libera/#sbcl - IRC Chatlog

13:37:44 hayley Am I likely digging myself into a hole if my next idea for speeding up sweeping in my GC is to use SIMD? I don't think intrinsics have been used much in the runtime before, which makes me nervous. And if scalar code performs poorly, perhaps that is a sign of a bad algorithm.
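For context, here is a minimal sketch of what "vectorizing a sweep" can look like. The log doesn't describe hayley's actual heap layout, so this assumes a hypothetical mark table with one byte per line, where 0 means unmarked (free). Instead of real SIMD intrinsics it uses SWAR (eight mark bytes packed into one 64-bit word), a portable stand-in that avoids the intrinsics question but is not the same thing as vector code. The function names are illustrative only.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical layout: one mark byte per line, 0 = unmarked (free). */

/* Scalar baseline: test one mark byte per iteration. */
size_t count_free_scalar(const uint8_t *marks, size_t n) {
    size_t free_lines = 0;
    for (size_t i = 0; i < n; i++)
        free_lines += (marks[i] == 0);
    return free_lines;
}

/* Count zero bytes in a 64-bit word without carries between bytes. */
static unsigned zero_bytes_in_word(uint64_t w) {
    const uint64_t low7 = 0x7F7F7F7F7F7F7F7FULL;
    /* The high bit of each byte ends up set iff that byte is nonzero. */
    uint64_t nonzero = (((w & low7) + low7) | w) & ~low7;
    /* Move each flag to bit 0 of its byte, then sum all bytes into the top byte. */
    unsigned nonzero_count =
        (unsigned)(((nonzero >> 7) * 0x0101010101010101ULL) >> 56);
    return 8u - nonzero_count;
}

/* SWAR version: examine 8 mark bytes per iteration, then finish the tail. */
size_t count_free_swar(const uint8_t *marks, size_t n) {
    size_t free_lines = 0, i = 0;
    for (; i + 8 <= n; i += 8) {
        uint64_t w;
        memcpy(&w, marks + i, sizeof w); /* alignment-safe load */
        free_lines += zero_bytes_in_word(w);
    }
    for (; i < n; i++)
        free_lines += (marks[i] == 0);
    return free_lines;
}
```

Both functions should agree on any input. Whether the wider version is actually faster is exactly the kind of question the tools suggested below can answer before committing to intrinsics.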

14:56:26 mfiano hayley: Have you tried running the machine code through llvm-mca? Or uiCA, which should be more accurate.

14:58:55 mfiano I found it very useful for debugging why things are slow. The last time I used it, I could see that the SIMD version actually uses more instructions and executes at fewer instructions per cycle than the scalar version.
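Why that combination is a clear loss: assuming both versions do the same total amount of work, the cycle count is the instruction count divided by the achieved IPC, so more instructions at a lower IPC can only mean more cycles.

$$
\text{cycles} = \frac{I}{\text{IPC}}, \qquad
I_{\text{SIMD}} > I_{\text{scalar}} \;\wedge\; \text{IPC}_{\text{SIMD}} < \text{IPC}_{\text{scalar}}
\;\Longrightarrow\;
\frac{I_{\text{SIMD}}}{\text{IPC}_{\text{SIMD}}} > \frac{I_{\text{scalar}}}{\text{IPC}_{\text{scalar}}}
$$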

14:59:24 mfiano Here's the assembly I got targeting skylake-avx512, but analyzed under the zen2 cost model: https://gist.github.com/mfiano/3377a7e7804f279eaa9478f88062e858

14:59:48 mfiano Basically, resource [8], the Zen2FPU0 (first Zen2 floating point unit), is hit extremely hard here. The tightest bottleneck determines the speed, so one resource being hit much harder than all the others is bad.
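The "tightest bottleneck" point can be put as a simple lower bound. This is a sketch under a simplified model (an assumption, not llvm-mca's exact algorithm): an iteration cannot finish faster than its busiest execution resource is occupied, nor faster than the front end can dispatch its micro-ops. The per-resource figures would come from something like llvm-mca's per-iteration resource pressure table; the function names and the dispatch-width parameter are hypothetical. Because it ignores latency and dependency chains, it only gives a lower bound.

```c
#include <stddef.h>

/* pressure[r]: cycles per iteration that resource r is busy.
   Returns the largest pressure and stores its index in *which. */
double busiest_resource(const double *pressure, size_t nres, size_t *which) {
    double worst = 0.0;
    *which = 0;
    for (size_t r = 0; r < nres; r++) {
        if (pressure[r] > worst) {
            worst = pressure[r];
            *which = r;
        }
    }
    return worst;
}

/* Lower bound on cycles per iteration: the larger of the resource bound
   and the dispatch bound (micro-ops divided by dispatch width). */
double cycles_lower_bound(const double *pressure, size_t nres,
                          double uops, double dispatch_width) {
    size_t which;
    double resource_bound = busiest_resource(pressure, nres, &which);
    double dispatch_bound = uops / dispatch_width;
    return resource_bound > dispatch_bound ? resource_bound : dispatch_bound;
}
```

If one resource's pressure is far above the rest, as with resource [8] here, that single number sets the bound, and spreading work onto other units (or using fewer instructions that need it) is what would help.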

15:04:49 NotThatRPG_away ** NICK NotThatRPG

15:54:10 NotThatRPG ** NICK NotThatRPG_away