libera/#clasp - IRC Chatlog
10:47:46
karlosz
just changing int32_t to int16_t was enough to convince the compiler to generate better code for read_s6
12:59:34
drmeister
yitzi: Can you get a new homebrew build running that will work? I need to install it on some machines today - this is for the main branch
13:04:34
drmeister
Last night I realized that if we measure the count of instructions for a call then we can use the `reverse-stepi <num-instructions>` and `stepi <num-instructions>` to quickly determine if that is still the case. Then you can systematically adjust <num-instructions> until you land exactly where you want to and get a new count of instructions.
13:25:17
yitzi
I think I can simplify clasp-builder by making the system list a list of plists, i.e `((:file #P"file" :position 20 :load-time 23) ...)` versus `(#P"file" ...)` and a whole bunch of hash tables.
13:30:32
drmeister
Uh huh. I changed the name of header stamp stuff to make it more sane. Did you run `./koga --clean`?
13:34:06
drmeister
I changed the name of some Header_s fields to better reflect what is stored in them. I added stuff for the badge. I think you have guards on?
13:35:15
drmeister
Object headers are a 64-bit value that contains a badge, a stamp, a wtag and an mtag.
13:36:38
drmeister
I implemented the idea of deferring the badge calculation until an object is put in a hash table - it appears to work fine.
13:37:33
drmeister
The badge value is now initialized to 0x1 everywhere and if you put something into a hash table it notices that and only then does it calculate a badge that is used for hashing from then on.
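A minimal sketch of the deferred-badge idea described above, not Clasp's actual code. The names `Header`, `kNoBadge` and `badge_for_hashing`, the 32/32-bit field split, and the use of a seeded random generator are all assumptions made for illustration; the log only says the badge starts at 0x1 and is computed the first time an object goes into a hash table.

```cpp
#include <cstdint>
#include <random>

// Hypothetical header: NOT Clasp's real Header_s layout. The log only says the
// 64-bit header holds a badge, a stamp, a wtag and an mtag; here the badge is
// assumed to take 32 bits and the remaining fields are lumped together.
struct Header {
    static constexpr std::uint32_t kNoBadge = 0x1;  // sentinel: badge not computed yet
    std::uint32_t badge = kNoBadge;
    std::uint32_t stamp_and_tags = 0;
};
static_assert(sizeof(Header) == 8, "header is one 64-bit word");

// Called only when an object is first used as a hash-table key.
// Returns the existing badge, or assigns one on first use and keeps it.
inline std::uint32_t badge_for_hashing(Header& h) {
    if (h.badge == Header::kNoBadge) {
        static std::mt19937 rng{12345};  // fixed seed so the sketch is reproducible
        std::uint32_t b;
        do {
            b = static_cast<std::uint32_t>(rng());
        } while (b == Header::kNoBadge);  // never hand out the sentinel itself
        h.badge = b;
    }
    return h.badge;
}
```

Objects that never become hash keys keep the sentinel and never pay for badge generation; once assigned, the badge is stable, which is what lets a saved table be reloaded without rehashing.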
13:38:54
drmeister
Snapshot save/load saves and loads these badges and when you load a snapshot it loads the hash tables and it doesn't need to rehash because the badges determine the hash.
13:39:25
drmeister
I could save a more compact hash-table in the snapshot and set a `dirty` flag and then when we load from snapshot it rehashes...
14:56:13
drmeister
yitzi: When you get back - the latest homebrew build of cando has just jumped to llvm15
14:58:02
drmeister
I don't know if cando builds against llvm15 yet. It should be a lot easier than the llvm14 transition
14:59:20
Bike
oh, i just realized what the problem is, the stack pointer still needs to be reset when we return, obviously
14:59:31
karlosz
im trying to basically load the pc value on entry to bytecode vm, stashing it before bytecode_vm calls and restoring it right before any C++ return statements
15:00:08
Bike
maybe the stack pointer could also be function local, but failing that we can just reset it before returning, maybe
15:01:39
Bike
the strategy is pretty simple other than this, basically taking the fp out of the vm state and making it a parameter/auto variable everywhere instead
15:03:09
karlosz
im approaching it somewhat differently - just using the vm._pc as a passing location
15:10:09
Bike
ech, and it does need to reset in an unwind-protect-y way, since we could unwind from a native function through a bytecode function to a native function
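One way to get the "unwind-protect-y" reset in C++ is an RAII guard whose destructor restores the stack pointer on both normal return and stack unwinding. This is only a sketch: the `VM` struct, `StackPointerGuard` and `run_bytecode_frame` are hypothetical names, and it assumes the non-local exit is implemented with C++ exceptions.

```cpp
#include <cstddef>
#include <stdexcept>

// Hypothetical VM state; the real Clasp VM struct is not shown in the log.
struct VM {
    std::size_t sp = 0;  // index of the top of the VM stack
};

// Restores vm.sp to its value at construction when the scope exits,
// whether by a normal return or by C++ stack unwinding.
class StackPointerGuard {
public:
    explicit StackPointerGuard(VM& vm) : vm_(vm), saved_sp_(vm.sp) {}
    ~StackPointerGuard() { vm_.sp = saved_sp_; }
    StackPointerGuard(const StackPointerGuard&) = delete;
    StackPointerGuard& operator=(const StackPointerGuard&) = delete;

private:
    VM& vm_;
    std::size_t saved_sp_;
};

// Stand-in for one bytecode function call that pushes a frame and may
// be unwound through (modelled here with a C++ exception).
inline void run_bytecode_frame(VM& vm, std::size_t frame_size, bool unwind) {
    StackPointerGuard guard(vm);
    vm.sp += frame_size;
    if (unwind) throw std::runtime_error("non-local exit");
}
```

The guard costs one load on entry and one store on exit, so it does not need a separate cleanup path for each `return` statement.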
15:31:57
yitzi
drmeister: the build is pinned to llvm@14 so it should download llvm14 and not build against llvm@15
15:33:26
drmeister
It looks like llvm@15 landed this morning - and that just proves the universe hates me.
15:35:17
yitzi
drmeister: Maybe try `brew install llvm@14`. If that doesn't work I can force it to rebuild. Maybe they symlink llvm14 in 15.
15:39:03
yitzi
drmeister: Yes, they pushed about 10 hours ago. That means we probably need to rebuild the homebrew bottles since 14 was the default before.
15:41:04
drmeister
That is the list of libraries that cando expects. It's using /usr/local/opt/llvm - that is going to be symlinked to whatever homebrew thinks is the latest.
15:43:19
drmeister
yitzi: I think this `/usr/local/opt/llvm/lib/libLLVM.dylib (compatibility version 1.0.0, current version 14.0.6)` means it was built against 14.0.6?
15:43:41
drmeister
And now my /usr/local/opt/llvm points to llvm@15 and that is missing a symbol that 14.0.6 had.
15:45:39
drmeister
If we hardwire to use `/usr/local/opt/llvm@14` and rebuild then it will always go for llvm@14 on our systems.
15:46:01
yitzi
When I rerun the build on a system with llvm15 present it will link to llvm14's actual location
15:46:41
yitzi
the location is gotten from llvm-config so it will return the correct value for the updated system with both versions present.
15:47:50
frgo
This seems not to be flexible enough. On my ARM-based Mac I had to install homebrew from scratch and homebrew now sits in /opt/homebrew - with llvm being at /opt/homebrew/opt/llvm ...
15:48:38
drmeister
yitzi: I tested the idea using: `install_name_tool -change /usr/local/opt/llvm/lib/libLLVM.dylib /usr/local/opt/llvm@14/lib/libLLVM.dylib /usr/local/Cellar/cando/1.0.0-529-gc6c8ae3c6-g7371ab99/bin/iclasp-boehmprecise`
15:49:07
drmeister
That changes the path in the executable to hardwire it to `/usr/local/opt/llvm@14` - now it works.
15:49:31
drmeister
frgo: On an M1 Mac you need to use x86 terminal/shell and install homebrew from there.
15:50:00
drmeister
frgo: Sorry about that - but llvm is still missing features that would allow us to build M1 native clasp.
15:50:27
drmeister
frgo: Have you heard from kpoeck lately? I've been emailing him but have gotten no response. I'm a bit worried.
15:51:33
drmeister
frgo: Clasp is totally ready to support ARM/M1 - the problem is the llvm LLJIT doesn't handle unwinding properly yet.
15:53:05
drmeister
frgo: I just asked on the discord server for llvm/#jit what the status is of M1 support for LLJIT.
15:54:34
drmeister
frgo: The Apple engineer that is developing LLJIT is someone I know. He's been working on it off and on over the last couple of months.
15:56:25
drmeister
Buuuuut - any fixes like that usually need to wait for an official llvm release. Unless he fixed it weeks ago and it's in llvm@15 it's going to be a wait until llvm@16
15:57:04
drmeister
I've been doing everything under x86 emulation with Rosetta2 - it's totally painless once you set things up so that you only use x86 terminals.
15:57:42
drmeister
Running under x86/Rosetta2 is a contagion. Anything you execute from an x86 shell/terminal is x86.
15:59:10
drmeister
There will probably ALSO be a problem with the fmt library: `-L/usr/local/Cellar/fmt/9.1.0/lib`
15:59:54
drmeister
Nope, that will probably be ok. That was a problem yesterday - but not today for some reason.
16:00:27
drmeister
yitzi: Did you suggest a while ago that we LOAD everything with bytecode compilation?
16:01:51
Bike
we could have bytecode call god knows what, which then calls bytecode, and that will need to know where the stack pointer is so it can orient itself, so i guess the stack pointer does sort of need to be "global"
16:02:18
Bike
saving the stack pointer around calls doesn't seem to obviously be terribly slow, either... i guess it might be fine
16:03:11
Bike
so then making it a local would just be like the pc stuff, in that it's kind of a microoptimization... might still matter of course
16:11:16
yitzi
I did suggest loading via bytecode. It might be useful to control via a dynamic variable.
16:12:24
drmeister
I think it might be a good idea - there may be issues with the inlining code in cleavir.
16:23:53
Bike
the fancy locally notinline thing in the error compiler macro (a) works in the bytecode compiler now and (b) is unnecessary if error has a properly defined type, so i'm gonna edit and reenable it
16:24:16
Bike
then i'm thinking of moving the type proclamations in inline.lisp to a new file and putting that back into the build
16:28:12
drmeister
I think I figured out how to profile code running in the VM - to put that code on the same level as native code.
16:29:13
drmeister
Then when profiling we would need to swap backtrace IP addresses that are in the bytecode_vm function with VM._pc addresses.
16:30:54
drmeister
If the VM is perfectly efficient then we wouldn't need to profile it. We would want to profile the stuff running in it.
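A rough sketch of the address swap described above: any sampled return address that falls inside the interpreter function is replaced with the bytecode pc that frame was executing. `CodeRange`, `attribute_to_bytecode` and the idea of a per-frame list of saved pcs are assumptions for illustration; how a real profiler would obtain those pcs is not covered in the log.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical description of where the interpreter loop lives in memory.
struct CodeRange {
    std::uintptr_t start;
    std::uintptr_t end;  // one past the last byte
    bool contains(std::uintptr_t ip) const { return ip >= start && ip < end; }
};

// Rewrites a sampled backtrace (innermost frame first). Every address inside
// the interpreter is replaced by the next saved bytecode pc; `saved_pcs` holds
// one pc per active interpreter frame, innermost first. Addresses outside the
// interpreter, or beyond the recorded pcs, are passed through unchanged.
inline std::vector<std::uintptr_t> attribute_to_bytecode(
    const std::vector<std::uintptr_t>& backtrace,
    const CodeRange& interpreter,
    const std::vector<std::uintptr_t>& saved_pcs) {
    std::vector<std::uintptr_t> out;
    out.reserve(backtrace.size());
    std::size_t next_pc = 0;
    for (std::uintptr_t ip : backtrace) {
        if (interpreter.contains(ip) && next_pc < saved_pcs.size()) {
            out.push_back(saved_pcs[next_pc++]);
        } else {
            out.push_back(ip);
        }
    }
    return out;
}
```

After the rewrite, samples land on bytecode addresses, so time is charged to the bytecode function rather than to the interpreter loop itself.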
16:35:50
drmeister
I learned something about thermodynamics yesterday and the "ergodic hypothesis". Fruit flies are fundamentally different from gas molecules.
16:36:46
drmeister
This device can trap fruit flies - but if it could trap gas molecules then it would violate laws of thermodynamics.
16:43:36
Bike
if we make the virtual machine use enough instructions, maybe it will hit the thermodynamic limit and we can just use statistical physics instead of all this "computer science" nonsense
16:58:23
drmeister
DEBUG_DRAG_CXX_CALLS, DEBUG_DRAG_NATIVE_CALLS, DEBUG_DRAG_INTERPRET_DTREE, DEBUG_DRAG_CONS_ALLOCATION, DEBUG_DRAG_GENERAL_ALLOCATION
16:58:41
yitzi
Bike: we could make a thermodynamic ensemble with 10,000 virtual machines in the same box with a different state that can't see each other.
16:59:27
drmeister
DEBUG_DRAG_NATIVE_CALLS isn't implemented yet - we need to insert a call to `extern "C" void drag_native_calls();` in every xep function.
17:00:12
drmeister
If you add one unit of DRAG to one of these things then it adds about 10 machine instructions/unit-drag.
17:00:36
drmeister
If adding one unit of DRAG doesn't impact something important - then it's not worth optimizing out 10 instructions.
17:00:39
karlosz
drmeister: what exactly are the instructions inserted? not all machine instructions are created equal (in terms of cycles etc)
17:01:37
drmeister
The idea is whatever I do here - it's only a rough comparison to anything we might optimize away.
17:05:18
drmeister
If adding one unit of DRAG to a thing doesn't impact performance then optimizing away the equivalent amount of machine instructions from that thing won't matter.
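A minimal sketch of what a drag hook could look like, assuming a global knob read at call time. Only the name `drag_native_calls` comes from the log; `g_drag_units`, `g_drag_calls` and `generated_function` are hypothetical, and the sketch makes no claim about how many machine instructions one loop iteration compiles to.

```cpp
#include <atomic>

// Hypothetical global knob: how many units of drag each call should add.
std::atomic<int> g_drag_units{0};
// Counts calls so the sketch can be checked; not part of the idea itself.
std::atomic<long> g_drag_calls{0};

// extern "C" so generated code could call it by an unmangled name.
extern "C" void drag_native_calls() {
    g_drag_calls.fetch_add(1, std::memory_order_relaxed);
    const int units = g_drag_units.load(std::memory_order_relaxed);
    volatile int sink = 0;  // volatile keeps the compiler from deleting the loop
    for (int i = 0; i < units; ++i) {
        sink = sink + 1;
    }
}

// Stand-in for a compiled function with the drag call at its entry.
int generated_function(int x) {
    drag_native_calls();
    return x + 1;
}
```

Timing a fixed workload at drag 0 and again at a higher setting then shows how sensitive the workload is to extra work at that site, without changing what the code computes.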
17:14:50
karlosz
Bike: i suppose it made sense to put the fp as a local variable because the fp is constant for the duration of a bytecode call frame
17:15:25
karlosz
but still it should be possible to do it function locally and pass it around through the vm struct at the boundaries
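The "local inside the loop, vm struct at the boundaries" approach can be sketched as follows. The toy opcodes, `BytecodeVM`, `call_out` and `run_bytecode` are hypothetical and stand in for the real interpreter; the point is only where the pc is written back.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical VM struct and opcodes, for illustration only.
struct BytecodeVM {
    std::size_t pc = 0;  // only kept up to date at call/return boundaries
};

enum Op : std::uint8_t { OP_INC = 0, OP_CALL_OUT = 1, OP_RETURN = 2 };

// Stand-in for a call that leaves the interpreter and may inspect vm.pc.
inline void call_out(BytecodeVM& vm, std::vector<std::size_t>& seen_pcs) {
    seen_pcs.push_back(vm.pc);
}

inline int run_bytecode(BytecodeVM& vm, const std::vector<std::uint8_t>& code,
                        std::vector<std::size_t>& seen_pcs) {
    std::size_t pc = vm.pc;  // load once on entry; can live in a register
    int acc = 0;
    for (;;) {
        switch (code[pc]) {
        case OP_INC:
            ++acc;
            ++pc;  // hot path touches only the local
            break;
        case OP_CALL_OUT:
            vm.pc = pc;  // stash before leaving the loop
            call_out(vm, seen_pcs);
            pc = vm.pc + 1;  // reload in case the callee moved it
            break;
        case OP_RETURN:
        default:
            vm.pc = pc;  // restore right before returning
            return acc;
        }
    }
}
```

Ordinary instructions never store to the struct, so the compiler is free to keep the pc in a register between boundary points.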
17:34:53
drmeister
We need to look carefully at the tagging stuff. I'm thinking that some of it is an artifact of how C++ debug info works.
17:35:40
drmeister
I added gc::As_assert<Foo_sp>(x) <- this is a cast if DEBUG_ASSERT is not defined and a type check if DEBUG_ASSERT is defined.
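The pattern behind `gc::As_assert` can be sketched with plain pointers. This is not Clasp's implementation: `Object`, `Fixnum`, `Symbol` and the lower-case `as_assert` are hypothetical, and the checked branch assumes `dynamic_cast` as the type test, whereas Clasp has its own tagged smart pointers.

```cpp
#include <stdexcept>

// Hypothetical object hierarchy; Clasp's real smart pointers (Foo_sp) and
// gc::As_assert are not reproduced here.
struct Object {
    virtual ~Object() = default;
};
struct Fixnum : Object {
    int value = 0;
};
struct Symbol : Object {};

// Checked cast when DEBUG_ASSERT is defined, plain cast otherwise.
template <typename To>
To* as_assert(Object* obj) {
#ifdef DEBUG_ASSERT
    To* result = dynamic_cast<To*>(obj);
    if (result == nullptr) throw std::runtime_error("as_assert: wrong type");
    return result;
#else
    return static_cast<To*>(obj);  // no runtime check in release builds
#endif
}
```

Call sites read the same in both configurations, for example `as_assert<Fixnum>(obj)->value`, so the safety check can be switched on for debugging without touching the callers.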
17:36:42
drmeister
Here is a video stepping through the 437 instructions in the forward direction...
17:36:42
drmeister
https://www.dropbox.com/s/ffcyzkx8rm23d02/Screen%20Recording%202022-09-19%20at%201.26.58%20PM.mov?dl=0
17:46:32
drmeister
Now, after compile-file-serial of predlib.lisp a couple of times to settle down the GF dispatch compilation ...
17:52:07
karlosz
drmeister, Bike: i got PC-as-a-local-variable to build, so that should shave another couple instructions
17:52:25
Bike
this might slow down build again, since it can insert type declarations in safe code... hopefully not too bad though...
18:02:17
Bike
what's the easiest way to see how long the overall build takes. can i just wrap ninja in time
18:09:44
Bike
not loading inline.lisp also means we won't save new inline definitions, so the compile time will also be affected by that, in addition to actually applying the inlining
18:17:27
drmeister
I have to compile-file-serial predlib.lisp a couple of times to get the timing to settle down.
18:19:54
Bike
i can put in something to force all the dispatchers to compile if you want. actually i feel like i already did that at some point but it went out of use.
18:22:09
drmeister
Just exposed cxx functions that have non-trivial lambda-lists - ones with bytecode wrappers
18:28:48
Bike
says that arithmetic functions return numbers, ERROR doesn't return, that kind of thing
18:29:15
drmeister
With auto-compile.lisp disabled cleavir won't run - or did you move that into proclamations?
18:32:06
Bike
oh, wait, do you mean that if it's before auto compile the type proclamations won't get saved
18:32:41
drmeister
I don't know about that. auto-compile.lisp is where cleavir turns on. inline-prep.lisp is where cleavir inlining gets prepared for.
18:51:24
drmeister
I'm building code and something to do with single stepping (I think) is messing up.
18:53:20
drmeister
https://github.com/clasp-developers/clasp/blob/main/src/lisp/kernel/cleavir/translate.lisp#L615
18:58:00
Bike
...and what exactly the index is of, like, what's the source position that's being dumped
18:58:49
drmeister
https://github.com/clasp-developers/clasp/blob/main/src/lisp/kernel/cleavir/translate.lisp#L616
19:05:29
Bike
that either means that the cst is completely messed up, or that we're inserting a step before a literal constant, which would be silly
19:05:46
drmeister
What should I trace to get to the bottom. I have a compile-file-serial that is reproducibly screwing this up.
19:22:49
Bike
what's happening here is we have a call instruction, and for some reason its source is a literal zero. that seems messed up.
19:23:07
Bike
maybe you can check the cst:source of the origin to see exactly which 0 it is? i don't know. this is pretty bizarre.
19:37:54
Bike
either declare debug to be something less than 3, or just (declare (optimize (clasp-cleavir::insert-step-conditions nil))) should do it i think
19:56:01
yitzi
At some point some of the cleavir files (auto-compile, inline, etc) got moved out of the clasp-cleavir.asd and put into the system list explicitly. At some point we should move them back to the asd and let koga grovel them as it does with the rest of clasp-cleavir.
20:45:41
drmeister
I tried to push like three times and each time got distracted by someone else's push and then a distraction here.
20:49:27
drmeister
Little things like As_assert give me a visceral thrill to use - it's a clever little thing that keeps the code clean and adds safety with no release runtime cost.
20:54:28
drmeister
yitzi: Commenting on something you said earlier on the phone. ECL does have (sys:install-bytecode-compiler) - that installs the bytecode compiler as the default compiler.
20:55:41
drmeister
We should be able to put it under control of a dynamic variable once the bytecode compiler can do anything that it is missing.
20:56:18
drmeister
There is some fussy stuff like some compiler macros that cleavir can use, the bytecode compiler cannot.
20:58:04
drmeister
I'd be all for LOADing everything with the bytecode compiler and then COMPILE-FILEing things with Cleavir.
20:58:37
drmeister
The only hitch is that the inlining code may get really bogged down if it's running in bytecode.
21:18:40
drmeister
Bike: I'm going to stick a call to extern "C" drag_native_calls() right at the top of every generated function IFF sys:*drag-native-calls* is T.
21:27:46
drmeister
That lets us slow all compiled functions by about 20 instructions for each drag unit.
21:39:10
drmeister
drag = 0. Time real(8.470 secs) run(8.470 secs) consed(2752278536 bytes) interps(2186) unwinds(0)
21:39:41
drmeister
drag = 100. Time real(12.079 secs) run(12.079 secs) consed(2749859640 bytes) interps(2182) unwinds(0)
21:40:19
drmeister
drag = 0. Time real(8.473 secs) run(8.473 secs) consed(2739291616 bytes) interps(2178) unwinds(0)
21:40:32
drmeister
drag = 100. Time real(10.581 secs) run(10.581 secs) consed(2754494432 bytes) interps(2182) unwinds(0)
21:41:16
drmeister
drag = 0. Time real(8.552 secs) run(8.552 secs) consed(2755525056 bytes) interps(2186) unwinds(0)
21:41:51
drmeister
drag = 10. Time real(14.150 secs) run(14.150 secs) consed(2754744056 bytes) interps(2186) unwinds(0)
21:42:50
drmeister
drag = 0. Time real(8.555 secs) run(8.555 secs) consed(2755360640 bytes) interps(2186) unwinds(0)
21:43:33
drmeister
drag = 100. Time real(24.954 secs) run(24.954 secs) consed(2740699464 bytes) interps(2178) unwinds(0)
21:44:36
drmeister
So in order of importance for speeding things up... cxx-call wrappers, general-allocation, cons-allocation, native-calls
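To compare the runs above on a common scale, the extra wall-clock time can be divided by the number of drag units. The sketch below copies the four baseline/drag pairs from the timings as pasted; the log does not say which `DEBUG_DRAG_*` setting each pair exercised, so the pairs are left unlabeled, and `DragRun`, `secs_per_drag_unit`, `slowdown` and `logged_runs` are hypothetical names.

```cpp
#include <vector>

// One baseline/drag pair copied from the timings in the log.
struct DragRun {
    double baseline_secs;
    int drag_units;
    double drag_secs;
};

// Extra seconds per unit of drag.
inline double secs_per_drag_unit(const DragRun& r) {
    return (r.drag_secs - r.baseline_secs) / r.drag_units;
}

// Ratio of the slowed run to its baseline.
inline double slowdown(const DragRun& r) {
    return r.drag_secs / r.baseline_secs;
}

// The four pairs, in the order they were pasted. Which knob each pair
// belongs to is not stated in the log, so no labels are attached.
inline std::vector<DragRun> logged_runs() {
    return {
        {8.470, 100, 12.079},
        {8.473, 100, 10.581},
        {8.552, 10, 14.150},
        {8.555, 100, 24.954},
    };
}
```

On these numbers the drag = 10 pair costs roughly 0.56 s per unit, while the three drag = 100 pairs cost between about 0.02 s and 0.16 s per unit, which is the kind of spread the ranking above rests on.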
21:46:09
drmeister
Bike: What was the bytecompile that you tried to do early when clasp was starting up that gave you a mysterious problem?
21:47:19
drmeister
I want to look at that now - because it feeds into my idea to compile the wrappers.
21:47:22
Bike
cmprepl-bytecode does (setq *implicit-compile-hook* 'bytecode-implicit-compile-repl-form)
21:47:37
Bike
i got the error when i changed it to (setq *implicit-compile-hook* 'bytecode-implicit-compile-form)