libera/#sbcl - IRC Chatlog
22:51:45
hayley
I'm fairly sure SIMD would be effective (it's more or less finding a particular byte in a haystack). It's more that, if I have to use it, am I doing something wrong in the design?
22:54:09
hayley
Arguably it's quite silly to be walking a vector 1/128 the size of the heap all the time, and needing SIMD acceleration is going to make the performance fix unportable between architectures. (Could still use SWAR, like there is for card marks.)
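A minimal sketch of the SWAR idea mentioned above, assuming the table is a plain array of bytes: it tests eight bytes at a time with the classic zero-byte trick on a 64-bit word, then falls back to a scalar scan to locate the exact position. The names `swar_find_byte` and `table` are hypothetical and not taken from SBCL.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ONES  UINT64_C(0x0101010101010101)
#define HIGHS UINT64_C(0x8080808080808080)

/* Nonzero if and only if at least one byte of v is zero. */
static inline uint64_t has_zero_byte(uint64_t v) {
    return (v - ONES) & ~v & HIGHS;
}

/* Hypothetical helper: index of the first byte equal to target in
 * table[0..n), or n if there is none. */
size_t swar_find_byte(const uint8_t *table, size_t n, uint8_t target) {
    const uint64_t pattern = ONES * target;  /* target copied into every byte */
    size_t i = 0;

    /* Word-at-a-time pass: stop at the first word containing a match. */
    for (; i + 8 <= n; i += 8) {
        uint64_t word;
        memcpy(&word, table + i, sizeof word);  /* safe unaligned load */
        if (has_zero_byte(word ^ pattern))      /* matching bytes XOR to zero */
            break;
    }

    /* Scalar pass: pins down the match inside that word (no endianness
     * assumptions) and also handles the tail shorter than eight bytes. */
    for (; i < n; i++)
        if (table[i] == target)
            return i;
    return n;
}
```

This uses only standard C, so it should build unchanged on any architecture with a 64-bit integer type, which is the portability argument for SWAR over intrinsics.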
23:14:26
mfiano
I was mostly just pointing out that you could see which CPU resources are getting hit right now with the scalar version.
23:14:50
mfiano
It might make assessing the situation easier and give an overview of how well it can be vectorized.
23:50:10
hayley
Now that you mention it, but not quite. I have to check for two bytes, rather: the same gen and marked case, and the same gen and unmarked case. And then I do a sort of blending operation to sweep allocation bits, which would be nice with SIMD (though... I don't see why a compiler can't vectorise that, if I make the loop more obvious).
23:51:18
hayley
Perhaps better to separate my one big loop into a few to clue the compiler in. And one pass could just use memchr, sure.
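A minimal sketch of what a `memchr`-based pass could look like, assuming the pass only needs to visit every table entry equal to one byte value. The function `visit_matches`, its callback, and the `table` layout are hypothetical; only `memchr` itself is a real library call.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: calls visit(index, ctx) for every byte in
 * table[0..n) equal to target and returns the number of matches.
 * visit may be NULL to just count. */
size_t visit_matches(const uint8_t *table, size_t n, uint8_t target,
                     void (*visit)(size_t index, void *ctx), void *ctx) {
    size_t count = 0;
    const uint8_t *p = table;
    const uint8_t *end = table + n;

    while (p < end) {
        /* memchr does the searching; libc versions are usually tuned
         * per architecture, so the C code stays portable. */
        const uint8_t *hit = memchr(p, target, (size_t)(end - p));
        if (hit == NULL)
            break;
        if (visit != NULL)
            visit((size_t)(hit - table), ctx);
        count++;
        p = hit + 1;  /* resume just past the match */
    }
    return count;
}
```

This pays off when matches are sparse; if most entries match, the per-call overhead of `memchr` would likely outweigh the benefit.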
2:12:58
hayley
No luck with vectorising; see e.g. https://godbolt.org/z/PKrWz8bbK (Clang unrolls, but both compilers still operate on each byte).
2:54:30
mfiano
I know LLVM has a way to do branchless conditions, but I don't know how it's implemented.
3:41:44
hayley
Using if (...) a[n] = b[n] doesn't work either. Wrote a manually vectorised version of it, but it's much less pretty (due to intrinsic names, mostly) and of course less portable, which is what concerns me.
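One plausible reason the `if (...) a[n] = b[n]` form stays scalar is that the store is conditional: a compiler generally may not invent a write to `a[n]` on iterations where the source performs none, and byte-granular masked stores are not available on every target. A sketch of the same blend with an unconditional store follows, assuming the condition compares a separate key byte against `g`. The function name, the `key` array, and the `restrict` qualifiers are assumptions, not the code from the chat, and whether a given compiler vectorises it is worth checking on godbolt rather than assuming.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical blend: a[i] = (key[i] == g) ? b[i] : a[i], written so that
 * every iteration loads and stores unconditionally. restrict asserts that
 * the three arrays do not overlap. */
void blend_where_equal(uint8_t *restrict a, const uint8_t *restrict b,
                       const uint8_t *restrict key, uint8_t g, size_t n) {
    for (size_t i = 0; i < n; i++) {
        /* 0xFF where the key matches, 0x00 elsewhere. */
        uint8_t mask = (uint8_t)-(key[i] == g);
        /* Take b[i] under the mask, keep a[i] outside it. */
        a[i] = (uint8_t)((b[i] & mask) | (a[i] & (uint8_t)~mask));
    }
}
```

The trade-off is that every byte of `a` is written, including those that do not change, so this form is only valid when nothing else writes to `a` concurrently.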
8:00:03
hayley
Seems neither likes conditionals at all, even though vectorising them seems simple. "if (a[n] == g) a[n]++;" (which is part of my code for promoting generations) is also entirely scalar, and takes a lot of time.
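The conditional increment can also be written as plain arithmetic, since a comparison in C yields 0 or 1. A minimal sketch, assuming generations are stored one per byte; `promote_generation` and `gens` are hypothetical names. It computes the same values as the quoted `if`, but stores to every element, and it is not guaranteed that either compiler vectorises it.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical rewrite of "if (a[n] == g) a[n]++;" without a branch:
 * the comparison contributes 1 where the generation matches and 0
 * elsewhere, and the store happens on every iteration. */
void promote_generation(uint8_t *gens, size_t n, uint8_t g) {
    for (size_t i = 0; i < n; i++)
        gens[i] = (uint8_t)(gens[i] + (gens[i] == g));
}
```

As with the original `a[n]++` on an unsigned byte, a generation of 255 wraps to 0 when promoted.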