14:06:53 Bike: sub-byte arrays are kind of difficult because the gcbitarray constructor works with words, not elements... and that kind of ties into how all the gcarray constructors are... but that's deep stuff to change
14:21:45 Bike: using clz instead of checking each bit individually, for (position 1 simple-bit-vector), is like two hundred times faster, so that's kinda cool
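A minimal sketch of the clz idea, for illustration only. It assumes the bit vector is packed into 64-bit words with element 0 stored in the most significant bit of the first word; that layout, and the names `clz` and `position_of_one`, are hypothetical and not taken from the implementation being discussed. Python's `int.bit_length()` stands in for a hardware count-leading-zeros instruction.

```python
WORD_BITS = 64

def clz(word: int) -> int:
    """Count leading zeros of a 64-bit word (stand-in for a hardware clz)."""
    return WORD_BITS - word.bit_length()

def position_of_one(words, length):
    """Return the index of the first 1 bit, or None if there is none.

    Assumed layout: element 0 is the most significant bit of words[0].
    `length` is the number of valid bits; padding bits in the last word
    are ignored even if they happen to be set.
    """
    for i, word in enumerate(words):
        if word != 0:
            # Skip whole zero words, then locate the bit with one clz
            # instead of testing up to 64 bits one at a time.
            pos = i * WORD_BITS + clz(word)
            return pos if pos < length else None
    return None
```

The speedup comes from two places: zero words are rejected with a single comparison, and the bit inside the first non-zero word is found with one instruction rather than a loop.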
14:36:43 beach: Yes, sub-byte arrays are problematic, but tricks like that are worth the trouble.
14:40:39 Bike: If a bit vector happens to be a multiple of 64/whatever long, it might be easier to reverse, too. apparently some architectures have bit reverse instructions
14:40:54 Bike: not x86, but llvm can still put in a clever series of rotations and stuff for you
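A sketch of why whole-word lengths make reversal easy: reverse the order of the words, then reverse the bits inside each word. The per-word step below uses the well-known swap-adjacent-groups technique (swap neighboring bits, then pairs, then nibbles, and so on). This is one branch-free approach, not necessarily the exact sequence LLVM emits, and the function names are hypothetical.

```python
MASK64 = (1 << 64) - 1

def reverse_word(w: int) -> int:
    """Reverse the 64 bits of w by swapping progressively larger groups."""
    w = ((w >> 1) & 0x5555555555555555) | ((w & 0x5555555555555555) << 1)
    w = ((w >> 2) & 0x3333333333333333) | ((w & 0x3333333333333333) << 2)
    w = ((w >> 4) & 0x0F0F0F0F0F0F0F0F) | ((w & 0x0F0F0F0F0F0F0F0F) << 4)
    w = ((w >> 8) & 0x00FF00FF00FF00FF) | ((w & 0x00FF00FF00FF00FF) << 8)
    w = ((w >> 16) & 0x0000FFFF0000FFFF) | ((w & 0x0000FFFF0000FFFF) << 16)
    w = (w >> 32) | ((w & 0xFFFFFFFF) << 32)
    return w & MASK64

def reverse_bit_vector(words):
    """Reverse a bit vector whose length is an exact multiple of 64 bits."""
    return [reverse_word(w) for w in reversed(words)]
```

When the length is not a multiple of the word size, the reversed bits no longer line up on word boundaries and an extra shift across neighboring words is needed, which is the harder case alluded to above.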
14:48:47 beach: Interesting. I hadn't considered the reversal thing. In the past I would have done a 256-element table to apply for each byte, but nowadays I am not so sure.
14:54:40 Bike: yeah, gotta worry about the memory latency
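For comparison, a sketch of the 256-element table approach: each byte is bit-reversed by a lookup, and the byte order is reversed as well. The names `BYTE_REVERSE` and `reverse_word_with_table` are hypothetical. In a compiled implementation every lookup is a memory load, which is where the latency concern comes from.

```python
# Precomputed table: BYTE_REVERSE[b] is b with its 8 bits reversed.
BYTE_REVERSE = bytes(int(f"{b:08b}"[::-1], 2) for b in range(256))

def reverse_word_with_table(w: int) -> int:
    """Reverse a 64-bit word using one table lookup per byte."""
    data = w.to_bytes(8, "little")
    flipped = bytes(BYTE_REVERSE[b] for b in data)
    # Reading the bytes back in the opposite byte order swaps their positions,
    # which completes the full 64-bit reversal.
    return int.from_bytes(flipped, "big")
```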
15:29:25 Bike: The usual on x86 seems to be a loop of shifts. Shifting the source right puts its former LSB in the carry flag, and then you can use "adc x, x" to both shift the result left and put the carry flag in the LSB.
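A sketch that models a carry-flag reversal loop in Python, with the flag represented as an ordinary variable. This variant shifts the source right (`shr`) and feeds the carry into the result with add-with-carry (`adc x, x`). The mirror-image variant, shifting the source left and rotating the carry into the result from the top (`rcr`), would work as well. The function name is hypothetical.

```python
MASK64 = (1 << 64) - 1

def reverse_word_carry_loop(src: int) -> int:
    """Reverse a 64-bit word one bit per iteration, as a shift/adc loop would."""
    dst = 0
    for _ in range(64):
        carry = src & 1                      # shr src, 1: old LSB lands in the carry flag
        src >>= 1
        dst = ((dst << 1) | carry) & MASK64  # adc dst, dst: shift left, carry becomes new LSB
    return dst
```

The first bit taken from the bottom of the source is shifted left 63 more times, so it ends up at the top of the result, which is what makes this a reversal rather than a copy.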