libera/#sbcl - IRC Chatlog

17:35:04 z0ltan Hello folks, running sbcl 2.1.9 (locally built) on macOS 10.14 (Mojave). Invoking `sbcl` on the terminal takes almost 30 seconds to load (used to be a couple of seconds some time back). Is this a known thing, or is there something I can check locally?

17:47:06 mishugana Hello folks, using SBCL 2.1.9.83-c0a8e0749 on macOS Mojave (10.14). Starting sbcl on the terminal takes around 15-20 seconds now (used to take just a couple before, last tested around a year back). Is this a known issue, or is there something I can test out locally? Anybody else face the same issue?

17:47:24 mishugana (sbcl is built from source locally)

17:48:37 yitzi Is that loading quicklisp included? Great nick btw.

17:49:00 mishugana yitzi: Yes, quicklisp is included. Also, thanks for the compliment - shows my state of mind of late :D

17:49:33 mishugana This also probably (maybe?) explains why loading SLIME takes around 5-10 seconds.

17:49:40 yitzi Did you try with `--no-userinit`?

17:49:47 mishugana yitzi: lemme try that

17:49:59 yitzi Just to see how long it takes without quicklisp

17:50:06 mishugana yup, instantaneous!

17:50:19 mishugana is it an issue with quicklisp then?

17:51:13 yitzi No idea. I know it takes a while, but I am not on a Mac so haven't got a clue there. Maybe check that the quicklisp client and libraries are up to date?

17:51:58 mishugana yitzi: I could try that out. This alone helps out massively though, so thank you so much! :-)

17:52:15 mishugana Will try installing quicklisp again and try ... cheers!

17:52:39 yitzi I think you can update from inside SBCL without reinstalling.

17:53:13 yitzi `(ql:update-client)` and then `(ql:update-all-dists)`

17:53:33 mishugana yitzi: Okay, I'll try that out first then. Thank you! :-)
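
Note: since startup is instantaneous with `--no-userinit`, one way to see how much of the delay comes from Quicklisp itself is to start `sbcl --no-userinit` and time the load by hand. The sketch below assumes Quicklisp lives in its default location, `~/quicklisp/setup.lisp`; the variable name `*quicklisp-setup*` is made up for illustration, and nothing here was run in the channel.

```lisp
;; Run inside `sbcl --no-userinit`, so the init file has not loaded Quicklisp yet.
;; Assumption: Quicklisp is installed in its default directory, ~/quicklisp/.
(defparameter *quicklisp-setup*
  (merge-pathnames "quicklisp/setup.lisp" (user-homedir-pathname)))

(if (probe-file *quicklisp-setup*)
    ;; TIME reports how long loading the Quicklisp setup file takes.
    (time (load *quicklisp-setup*))
    (format t "No Quicklisp setup file found at ~a~%" *quicklisp-setup*))
```

If the load is fast here but startup is still slow with the init file, the delay probably comes from something else in the init file rather than from Quicklisp.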

18:00:43 stassats the first time?

18:02:00 stassats on the first launch it might need to recompile everything

18:02:16 stassats also macos does some hashing on the first launch (although maybe not mojave)

21:08:06 stassats how come non-tail calls are significantly slower

21:12:18 stassats i barely see any difference between not calling anything and tail calling on M1, but a normal call is really slow

21:12:33 stassats similar things on x86-64, although i do see a difference between not calling and tail-call

21:12:54 stassats so, what prediction hardware are we defeating
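
Note: the comparison described above could be set up roughly as follows. This is only a sketch, not the test stassats ran: the names `callee`, `tail-caller`, `non-tail-caller` and `run-loop` are hypothetical, and wrapping the call in `values` is merely one way that is intended to keep the compiler from turning it into a tail call. Whether it actually does so can be checked with `disassemble`.

```lisp
;; Keep the callee out of line so that a real call is made.
(declaim (notinline callee))
(defun callee (x) x)

;; The call is in tail position, so the compiler may emit a jump.
(defun tail-caller (x)
  (callee x))

;; VALUES has to cut the result down to a single value after the call
;; returns, so the call is intended to be a normal (non-tail) call.
(defun non-tail-caller (x)
  (values (callee x)))

;; Call FN N times with the loop index and return the last result.
(defun run-loop (fn n)
  (let ((acc 0))
    (dotimes (i n acc)
      (setf acc (funcall fn i)))))

(time (run-loop #'tail-caller 10000000))
(time (run-loop #'non-tail-caller 10000000))
```

The loop itself makes a normal call on every iteration, so the two timings include that overhead as well.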

21:14:12 edgar-rft ** NICK all

21:14:21 all ** NICK Guest5617

21:14:46 Guest5617 ** NICK edgar-rft

21:21:55 stassats i guess i need to check what clang does before chasing geese

22:26:27 stassats having STR CFP, [CSP]; LDR CFP, [CSP] seems to slow things down

22:36:14 john-a-carroll stassats: thanks for looking at function calls on M1! A little while ago I tried the tak benchmark and was surprised that M1 was slower than x86-64. (However I thought nothing further of it when I found that the M1 version of my system ran so much faster overall)

22:36:42 stassats john-a-carroll: how little?

22:36:58 stassats because m1 function calls are faster for me

22:38:11 stassats anyway, the thing i'm seeing is equally slow on x86-64 and arm64

22:40:03 stassats maybe LDR CFP, [CSP]; LDR CFP, [CFP] is the slow bit, it can't prefetch

22:40:26 stassats although x86-64 uses the same calling convention as C, surely it should be able to

22:40:31 stassats or it's not really the same then

22:43:42 john-a-carroll this was in one of the first m1 releases: 2.1.5 comparing the m1 native version to x86_64 under Rosetta 2. (time (dotimes (n 10000) (tak 18 12 6))) -> 2.4 secs in emulated x86_64, 4.7 seconds in native m1

22:43:57 stassats oh yeah, that's too old

22:44:29 john-a-carroll Ah, OK

22:47:21 stassats currently, rosetta 7.835, m1 5.588
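
Note: the log does not show the definition of `tak` that was timed. The usual textbook form of the Takeuchi benchmark function is given below as an assumption; the version john-a-carroll and stassats used may differ, for example in type declarations or optimization settings.

```lisp
;; Textbook definition of the tak benchmark function (assumed, not from the log).
;; Almost all of its work is function calls, which is why it is
;; sensitive to call and return overhead.
(defun tak (x y z)
  (if (not (< y x))
      z
      (tak (tak (1- x) y z)
           (tak (1- y) z x)
           (tak (1- z) x y))))

(print (tak 18 12 6)) ; prints 7
```

With this definition loaded, the form quoted above, `(time (dotimes (n 10000) (tak 18 12 6)))`, reports the time for 10000 repetitions.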

22:51:59 stassats using a different stack is probably not going to help the hardware

22:52:25 stassats i need to concoct a test that uses the C stack

22:53:23 john-a-carroll looks good. I'm still with 2.1.5, and for my system's main benchmark the figures are rosetta 16m46s, m1 15m24s

23:03:18 aeth_ ** NICK aeth

1:10:48 stassats loading and storing the same register on the stack in quick succession doesn't seem to be great

1:11:17 stassats i guess there's no way around that, except for making leaf functions not touch the stack

1:12:08 stassats and supposedly in normal code there's something between function calls and returns

1:14:12 stassats so even spilling an iteration variable onto the stack isn't great

1:19:47 stassats if i insert five division instructions between str/ldr, then they do not matter

1:19:57 stassats so, small functions are bad

1:20:15 stassats they -- ldr/str

1:25:15 stassats and it means it's hard to measure performance between different routines in a loop

1:26:35 stassats i mean, with different out of order paths it's always difficult, but here the call/return/save the iteration variable just dominate the computation being measured