7:31:13
terrorjack4
** NICK terrorjack
9:22:56
stassats
looks like i can test for signed-byte-32 by sign extending and comparing with the original value
9:23:17
stassats
which is just a single instruction on arm64, CMP NL2, NL2, SXTW
9:33:37
stassats
x86-64 needs two instructions, MOVSX and CMP, but it's still more compact than comparing against two numbers
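A minimal C sketch of the check described above, for comparison with the two-bound version. The function names are hypothetical and are not taken from SBCL. The narrowing cast is implementation-defined in C for out-of-range values; this assumes the usual wrap-around behaviour of gcc and clang.

```c
#include <stdbool.h>
#include <stdint.h>

/* The obvious range check: compare against both bounds. */
static bool fits_int32_bounds(int64_t x) {
    return x >= INT32_MIN && x <= INT32_MAX;
}

/* The trick from the log: sign-extend the low 32 bits and compare
   with the original value. Assumes the int64 -> int32 conversion
   keeps the low 32 bits (true for gcc and clang). */
static bool fits_int32_sext(int64_t x) {
    return (int64_t)(int32_t)x == x;
}
```

Both functions should agree for every 64-bit input; what instructions a given compiler emits for either one is not something this sketch guarantees.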
9:33:55
stassats
even gcc doesn't know about that trick
9:34:01
stassats
i hope i didn't mess up the math, though
9:36:50
stassats
can be extended to 8-bit and 16-bit, but i wonder if any bit can be used by shifting first
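One possible reading of the shifting idea, sketched in C: shift left so the candidate sign bit lands in bit 63, shift back arithmetically, and compare. The name `fits_signed_bits` is hypothetical, and the sketch assumes that right-shifting a negative `int64_t` is an arithmetic shift (as on gcc and clang). It says nothing about whether this is faster than SXTB or SXTH.

```c
#include <stdbool.h>
#include <stdint.h>

/* Returns true if x is representable as a signed n-bit integer.
   n must be in 1..64; n == 0 would shift by 64, which is undefined. */
static bool fits_signed_bits(int64_t x, unsigned n) {
    unsigned shift = 64 - n;
    /* Left shift on the unsigned value to avoid signed overflow,
       then an arithmetic right shift to sign-extend from bit n-1. */
    int64_t extended = (int64_t)((uint64_t)x << shift) >> shift;
    return extended == x;
}
```

With `n` set to 8, 16, or 32 this gives the same answer as the sign-extend-and-compare test at those widths.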
9:48:16
stassats`
except sxtb seems to be slower than sxtw
10:11:47
stassats`
huh, it becomes slow when the input exceeds signed-byte-32
10:11:56
stassats`
like it can't predict a branch
10:13:05
stassats`
(loop for i from from to to do (setf z (= j 0))) seems to be twice as fast when J is 1
10:18:35
stassats`
true on another arm64 cpu
10:18:54
stassats`
slight difference on a POWER9 cpu, no difference on x86-64
10:45:29
stassats`
i guess that's because of CMOV
10:45:43
stassats`
gotta implement CMOV for arm64 then
15:33:10
stassats
it was pretty easy, probably around an hour
15:33:19
stassats
having three operand instructions is nice
15:41:01
stassats
gcc/clang are pretty aggressive with cmov, like computing two different things and then doing a cmov
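To illustrate the "compute both, then select" pattern, here is a small C sketch that does the selection with a mask instead of a branch. It only emulates what a conditional move does; whether a compiler turns this or a plain `cond ? a : b` into CMOV or CSEL is its own decision. The function name is hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Both a and b are already computed by the caller; pick one
   without branching. */
static int64_t select_branchless(bool cond, int64_t a, int64_t b) {
    /* mask is all ones when cond is true, zero otherwise. */
    uint64_t mask = -(uint64_t)cond;
    return (int64_t)(((uint64_t)a & mask) | ((uint64_t)b & ~mask));
}
```

The trade-off is the one mentioned in the log: both inputs must be evaluated up front, so this only pays off when the work for the unused side is cheap.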
15:44:11
stassats
clang simultaneously computes div and mul, but not two divs; gcc doesn't with one div, but two muls are ok
15:44:47
stassats
now implementing a thing like that is tricky, and tracking when and where it's a good idea is annoying