libera/#sbcl - IRC Chatlog

22:51:45 hayley I'm fairly sure SIMD would be effective (for, more or less, finding a particular byte in a haystack). It's more that, if I have to use it, am I doing something wrong in the design?

22:54:09 hayley Arguably it's quite silly to be walking a vector 1/128 the size of the heap all the time, and needing SIMD acceleration is going to make the performance fix unportable between architectures. (I could still use SWAR, as is done for card marks.)
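
For reference, the SWAR route looks roughly like this for a single-byte search. This is a minimal sketch, assuming a plain byte vector; `find_byte_swar` and `swar_has_zero_byte` are hypothetical names, not SBCL's card-mark code. It uses the well-known "has a zero byte" bit trick on 64-bit words, then falls back to a byte loop to pinpoint the match, so it needs no intrinsics and no endianness assumptions.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Nonzero iff some byte of x is zero (classic bit trick; the yes/no answer is exact). */
static inline uint64_t swar_has_zero_byte(uint64_t x) {
    return (x - 0x0101010101010101ULL) & ~x & 0x8080808080808080ULL;
}

/* Hypothetical helper: index of the first byte equal to `target` in
   buf[0..len), or len if there is none. Skips 8 bytes per step using
   ordinary 64-bit integer arithmetic. */
size_t find_byte_swar(const unsigned char *buf, size_t len, unsigned char target) {
    const uint64_t pattern = 0x0101010101010101ULL * target; /* target in every byte */
    size_t i = 0;
    for (; i + 8 <= len; i += 8) {
        uint64_t word;
        memcpy(&word, buf + i, sizeof word);     /* alignment-safe load */
        if (swar_has_zero_byte(word ^ pattern))  /* matching bytes become zero */
            break;                               /* a match is in these 8 bytes */
    }
    for (; i < len; i++)                         /* pinpoint the match or scan the tail */
        if (buf[i] == target)
            return i;
    return len;
}
```

This keeps the fast path portable to any target with 64-bit integers, at the cost of being roughly 8 bytes per step rather than 16 or 32.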

23:14:26 mfiano I was mostly just pointing out that you could see which CPU resources are getting hit right now with the scalar version.

23:14:50 mfiano It might make assessing the situation easier and give an overview of how well it can be vectorized.

23:14:59 hayley Okay then, thanks.

23:46:55 stassats don't you just want memchr then?

23:47:39 stassats i would assume the gc is too branchy for simd

23:50:10 hayley Now that you mention it, sort of, but not quite. Rather, I have to check for two byte values: the same-gen-and-marked case and the same-gen-and-unmarked case. Then I do a sort of blending operation to sweep allocation bits, which would be nice with SIMD (though... I don't see why a compiler can't vectorise that, if I make the loop more obvious).
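
A minimal sketch of the two-case check plus blend, written branch-free so an auto-vectoriser has a better chance. The layout is an assumption for illustration, not SBCL's actual encoding: `meta[i]` holds a generation number below 0x80 with a hypothetical `MARK_BIT` in bit 7, `alloc[i]` is one byte of allocation bits per granule, and `sweep_blend` is a hypothetical name.

```c
#include <stddef.h>
#include <stdint.h>

#define MARK_BIT 0x80u  /* hypothetical: high bit of a metadata byte means "marked" */

/* Hypothetical sweep for generation `gen` (gen must be < 0x80):
   - meta[i] == gen | MARK_BIT (same gen, marked):   survivor, clear its mark
   - meta[i] == gen            (same gen, unmarked): dead, clear its allocation bits
   - anything else: another generation, left untouched.
   Each case becomes a 0x00/0xFF byte mask and every store is a blend,
   so the loop body contains no branches. */
void sweep_blend(uint8_t *restrict meta, uint8_t *restrict alloc,
                 size_t n, uint8_t gen) {
    const uint8_t marked = (uint8_t)(gen | MARK_BIT);
    for (size_t i = 0; i < n; i++) {
        uint8_t m = meta[i];
        uint8_t live = (uint8_t)-(m == marked);             /* 0xFF if marked survivor */
        uint8_t dead = (uint8_t)-(m == gen);                /* 0xFF if unmarked (dead) */
        meta[i]  = (uint8_t)((m & ~live) | (gen & live));   /* survivor: drop mark bit */
        alloc[i] = (uint8_t)(alloc[i] & ~dead);             /* dead: free the granule */
    }
}
```

Whether GCC or Clang actually vectorise this depends on flags and target; checking the output on Compiler Explorer, or asking for missed-vectorisation reports (`-fopt-info-vec-missed` in GCC, `-Rpass-missed=loop-vectorize` in Clang), is more reliable than assuming.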

23:51:18 hayley Perhaps it's better to split my one big loop into a few to clue the compiler in. And one pass could just use memchr, sure.
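
A pass that only needs to find one byte value can lean on the C library, whose `memchr` is SIMD-optimised in common implementations such as glibc, so the architecture-specific code lives there instead of in the GC. A minimal sketch; `count_byte_memchr` is a hypothetical name, and a real pass would process each hit rather than just count it.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical single-purpose pass: visit every index whose byte equals
   `value`, letting memchr skip the runs in between. Returns the number
   of matches found. */
size_t count_byte_memchr(const unsigned char *buf, size_t len, unsigned char value) {
    size_t count = 0;
    const unsigned char *p = buf, *end = buf + len;
    while (p < end) {
        const unsigned char *hit = memchr(p, value, (size_t)(end - p));
        if (hit == NULL)
            break;
        count++;        /* a real pass would handle index (hit - buf) here */
        p = hit + 1;    /* resume just after this match */
    }
    return count;
}
```

This pays off when matches are sparse; if most bytes match, the per-call overhead of `memchr` dominates and a plain loop is likely better.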

23:51:19 stassats have you instructed the compiler to vectorize?

23:51:46 hayley I believe GCC is supposed to at -O3, but only for SSE2.

2:12:58 hayley No luck with vectorising, e.g. https://godbolt.org/z/PKrWz8bbK (Clang unrolls, but both compilers still operate on one byte at a time).

2:53:57 mfiano Yeah, that isn't going to work with a ternary.

2:54:30 mfiano I know LLVM has a way to do branchless conditions, but I don't know how it's implemented

3:41:44 hayley Using "if (...) a[n] = b[n];" doesn't work either. I wrote a manually vectorised version of it, but it's much less pretty (mostly due to the intrinsic names) and of course less portable, which is what concerns me.
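
hayley's manual version isn't in the log. For a rough idea of what one looks like, here is a hypothetical SSE2 sketch of a conditional copy. The condition `key[n] == g` and the name `copy_where_equal` are assumptions, since the chat elides the real condition. The `#ifdef __SSE2__` guard plus a scalar loop keeps it building on non-x86 targets, which addresses part of the portability worry.

```c
#include <stddef.h>
#include <stdint.h>
#ifdef __SSE2__
#include <emmintrin.h>
#endif

/* Hypothetical conditional copy: a[n] = b[n] wherever key[n] == g. */
void copy_where_equal(uint8_t *a, const uint8_t *b, const uint8_t *key,
                      size_t len, uint8_t g) {
    size_t n = 0;
#ifdef __SSE2__
    const __m128i vg = _mm_set1_epi8((char)g);
    for (; n + 16 <= len; n += 16) {
        __m128i va = _mm_loadu_si128((const __m128i *)(a + n));
        __m128i vb = _mm_loadu_si128((const __m128i *)(b + n));
        __m128i vk = _mm_loadu_si128((const __m128i *)(key + n));
        __m128i m  = _mm_cmpeq_epi8(vk, vg);   /* 0xFF in lanes where key == g */
        /* blend (b & m) | (a & ~m); SSE2 has no byte blend (pblendvb is SSE4.1) */
        __m128i r  = _mm_or_si128(_mm_and_si128(m, vb), _mm_andnot_si128(m, va));
        _mm_storeu_si128((__m128i *)(a + n), r);
    }
#endif
    for (; n < len; n++)   /* scalar tail, or the whole loop without SSE2 */
        if (key[n] == g)
            a[n] = b[n];
}
```

Note that the vector path rewrites every byte of `a`, matching or not; that is fine single-threaded, but it is exactly the kind of extra store a compiler is not allowed to invent on its own.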

4:36:25 hineios5 ** NICK hineios

8:00:03 hayley It seems neither compiler likes conditionals at all, even though vectorising them looks simple. "if (a[n] == g) a[n]++;" (part of my code for promoting generations) is also entirely scalar, and takes a lot of time.
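
One plausible reason the conditional form stays scalar: the C11 memory model doesn't let a compiler introduce stores the source didn't make, and baseline SSE2 has no cheap masked byte store, so it can't simply write every lane. Making the store unconditional in the source removes that obstacle. A minimal sketch; `promote_generation` is a hypothetical name. It adds the comparison result (0 or 1) to every element.

```c
#include <stddef.h>
#include <stdint.h>

/* Same effect as "if (a[n] == g) a[n]++;", but every element is stored
   unconditionally: non-matching bytes get +0. There is no branch and no
   conditional store, the shape auto-vectorisers generally handle best. */
void promote_generation(uint8_t *a, size_t len, uint8_t g) {
    for (size_t n = 0; n < len; n++)
        a[n] = (uint8_t)(a[n] + (a[n] == g));
}
```

By hand in SSE2 the same thing is a compare then a subtract: `_mm_cmpeq_epi8` yields 0xFF (that is, -1) in matching lanes, so `_mm_sub_epi8(v, mask)` adds 1 exactly there. As always, confirm the compiler's output rather than assume it vectorised.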