freenode/#clasp - IRC Chatlog
15:43:54
Bike
i have ext:source-location returning what it ought to, but if i try M-. i get "Containing expression ends prematurely"...
16:17:36
Bike
drmeister: doing peek-char seems to update the result of core:input-stream-source-pos-info. I am confused.
17:16:52
drmeister
Continuing the conversation from yesterday (because I need to tell the Ravenbrook people something).
17:17:25
drmeister
I can't figure out how we use core dumping to build Clasp. We don't GC code - so we can't dump the core.
17:22:50
drmeister
So we JIT code into particular memory locations and then dump that along with the memory? For cclasp - we dump the core at the end of loading inline.lisp rather than compile-filing everything.
17:24:08
drmeister
We have C++ memory that we would need to fix up when we reload that core - but this is a clear plan forward.
17:25:53
drmeister
The Ravenbrook folks do have a way of GC'ing JITted code - they developed it for their other commercial client.
17:34:56
drmeister
So - currently the way we build clasp - compile-filing everything and linking it together into a single fasl/executable has problems that hamper our ability to implement inlining.
17:35:31
drmeister
At startup, these fasl/executables evaluate each toplevel form one at a time and build the environment up that way. It's very modular - but the modularity brings problems.
17:36:25
drmeister
An alternative is to essentially load a full environment into memory and then "dump core".
17:36:51
drmeister
We need GC support for that. We can get it from Ravenbrook. We probably can't get it from Boehm.
17:37:32
Bike
i don't think it's really "modular", the whole problem with inlining is interdependencies
17:38:46
cracauermob
The only quick way here without original research is to switch the GC to collect into one contiguous region from now on
17:39:56
drmeister
Well, the Ravenbrook folks could give us something more sophisticated - we could specify the pools and then they create a serialized version of those pools. At load time the pools could be reconstituted from the serialized version.
17:40:53
drmeister
cracauermob: What you are proposing is straightforward from the point of view of writing the data out to disk.
17:41:36
cracauermob
Unless it is with the sole purpose of adjusting pointers so that you can mmap anywhere
17:42:54
cracauermob
After restart you can then GC again and split objects by pools as they move, if that is considered important.
17:43:35
cracauermob
Or that one mapped corefile has annotations to tell it which part was which region
17:44:25
drmeister
The memory layout in Clasp is pretty rich. Cons cells are in their own pool, objects without internal pointers are in their own pool.
17:45:06
cracauermob
So on that gc-tosave-core you would write regions one by one, also dumping their metadata
17:47:08
drmeister
Yes - let's say we ask them to handle the serialization of their MPS memory structures to a single contiguous block of memory. The gc-tosave-core function would take all of the MPS memory in an Arena and turn it into a contiguous block of memory.
17:47:38
drmeister
Then at startup we mmap that block into memory somewhere and we call another MPS function that recreates the Arena.
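A toy model of the serialize/reconstitute round trip described above might help pin down what is being asked for. This is a minimal sketch under stated assumptions, not MPS code: `Ptr`, `serialize` and `deserialize` are hypothetical names, addresses are abstract word indices, and every pointer is assumed to target the start of an object inside the heap being saved. Pointers are stored as block-relative offsets, and a per-object layout records which slots are pointers (the role an MPS object format plays when scanning), so the block can be loaded at any base address.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Ptr:
    addr: int  # absolute (word) address of the referenced object


def serialize(heap):
    """heap: {addr: [slot, ...]}, where a slot is an int or a Ptr.
    Returns (block, layout): block is a flat list of words; layout maps each
    object's offset in the block to (length, indices of pointer slots)."""
    offsets, pos = {}, 0
    for addr in sorted(heap):           # lay objects out contiguously
        offsets[addr] = pos
        pos += len(heap[addr])
    block, layout = [], {}
    for addr in sorted(heap):
        ptr_slots = []
        for i, slot in enumerate(heap[addr]):
            if isinstance(slot, Ptr):
                block.append(offsets[slot.addr])  # KeyError if heap is not closed
                ptr_slots.append(i)
            else:
                block.append(slot)
        layout[offsets[addr]] = (len(heap[addr]), ptr_slots)
    return block, layout


def deserialize(block, layout, new_base):
    """Rebuild the heap with every object and every pointer rebased to new_base."""
    heap = {}
    for off, (n, ptr_slots) in layout.items():
        slots = list(block[off:off + n])
        for i in ptr_slots:
            slots[i] = Ptr(new_base + slots[i])
        heap[new_base + off] = slots
    return heap
```

Without the layout, a stored offset and an integer immediate would be indistinguishable, which is why the pointer information has to travel with the block.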
17:49:42
cracauermob
I mean the source code is still there, but what about closures that had specific scope?
17:49:44
drmeister
They solved this problem for their other commercial client. They keep track of relocation information in the code and they can move the code around.
17:51:21
Bike
well, a dynamic linker is what a core load is, yeah? sbcl has to do fixups and stuff too
17:52:28
cracauermob
SBCL's image loading does no linking or fix-ups because it maps at the same address
17:52:58
drmeister
JITting happens at the llvm Module level. There is a table of roots inside the module that the code in the module refers to with IP-relative addressing. If the whole module is moved around, what in the code needs to be relocated?
17:56:36
drmeister
If Ravenbrook has a way to relocate llvm Modules - then Modules become just another llvm object that is part of the Arena.
18:02:42
drmeister
I don't think of it so much as doing relocations when we load a core - more like convert some blob of bytes into a working Arena. Relocations will need to be taken care of by the serialization/deserialization.
18:05:11
drmeister
cracauer: If we make the implementation as dumb as possible - we need to throw out the garbage collector.
18:05:52
drmeister
The point here is we have several experts on the garbage collector available and waiting for instructions on what we need.
18:06:16
cracauer
The way I would do it with what I know is to start a full-heap GC into the area I want to save.
18:07:28
drmeister
cracauer: That sounds like the serialization process I'm envisioning. I'd ask the Ravenbrook folks to estimate how much work it would take to implement.
18:08:06
cracauer
But the heap data is not changed in any way by the image saving/loading processes.
18:10:18
cracauer
Sorry I missed that SBCL implemented varying the base address. But that is a minor detail.
18:11:49
cracauer
I am very afraid of new code doing pointer adjustments. Debugging would be a nightmare.
18:12:47
drmeister
We need to incorporate whatever solution they developed for GC'ing llvm JITted code and then ask them to give us an estimate for what it would take to serialize an Arena to a single contiguous block of memory and then reconstruct that Arena from that contiguous block of memory.
18:15:30
cracauer
I suppose you could, instead of doing a final GC into one contiguous area at one address, just dump things where they are and re-map them at the same addresses. But that means lots of opportunities of clashes.
18:15:58
cracauer
You can do that final GC into one VM region with existing GC code. So it would be safe and likely to work.
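The "final GC into one VM region" idea is essentially a copying collection whose to-space is the region to be saved. Below is a minimal Cheney-style sketch (breadth-first copy, with an explicit queue standing in for Cheney's scan pointer) over a toy heap; `Ptr` and `compact` are hypothetical names, addresses are abstract word indices, and this is not how MPS pools actually implement copying.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Ptr:
    addr: int  # absolute (word) address of the referenced object


def compact(heap, roots, to_base):
    """Copy everything reachable from roots into one contiguous region
    starting at to_base. Unreachable objects are simply not copied.
    Returns (new_heap, new_roots)."""
    forward = {}      # old address -> new address (forwarding table)
    new_heap = {}
    free = to_base    # allocation pointer inside the to-space region
    queue = deque()   # copied objects whose slots still need scanning

    def copy(old):
        nonlocal free
        if old not in forward:
            forward[old] = free
            new_heap[free] = list(heap[old])
            free += len(heap[old])
            queue.append(forward[old])
        return forward[old]

    new_roots = [copy(r) for r in roots]
    while queue:
        slots = new_heap[queue.popleft()]
        for i, slot in enumerate(slots):
            if isinstance(slot, Ptr):
                slots[i] = Ptr(copy(slot.addr))  # redirect to the copy
    return new_heap, new_roots
```

After this pass the live data occupies one contiguous range starting at `to_base`, which is the property that makes dumping it as a single region straightforward.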
18:17:12
cracauer
stassats: I suppose the best way is to write the mechanism in a way that supports relocation, but happens to map at the same address at first for safety.
18:17:27
drmeister
The Ravenbrook folks want me to write this up in an email - I chafe at that because I don't really know what I'm asking for - or what is possible. I prefer the back and forth and immediacy of chats like this one.
18:17:32
cracauer
If you save regions individually you have more opportunity for VM mapping clashes.
18:18:34
drmeister
cracauer: It's hard to be specific because I don't know much about the underlying machinery in mps - I don't know what is hard and what is easy for them.
18:20:27
cracauer
Yeah, I just don't know about that. They have a GC spec, but that doesn't mention moving code, jit or otherwise.
18:22:44
cracauer
In fact, nothing forces us to pick final-gc versus original-address for any region.
18:23:21
drmeister
The Ravenbrook folks came up with another way. From the brief description they gave me they use a large code model and they fix up pointers.
18:24:00
cracauer
For any region, we can decide freely whether to re-map it at the original location or have it compacted into one VM region.
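The per-region choice above can be sketched as a mapping planner: keep a saved region at its original address when that range is free in the new process, otherwise place it past everything already in use and record the delta that pointers into it would need. This is a hypothetical illustration (`plan_mapping` and `overlaps` are made-up names); it assumes the saved regions are disjoint and ignores page alignment and the actual mmap calls.

```python
def overlaps(a_start, a_size, b_start, b_size):
    """True if [a_start, a_start+a_size) and [b_start, b_start+b_size) intersect."""
    return a_start < b_start + b_size and b_start < a_start + a_size


def plan_mapping(regions, occupied):
    """regions: (start, size) pairs saved in the core.
    occupied: (start, size) pairs already in use in the new process.
    Returns (old_start, new_start, delta) per region; delta == 0 means
    the region is re-mapped at its original address and needs no fix-ups."""
    taken = list(occupied)
    plan = []
    # Relocated regions go past the end of every saved and occupied range,
    # so they can never clash with a region kept at its original address.
    bump = max((s + n for s, n in list(regions) + list(occupied)), default=0)
    for start, size in regions:
        if not any(overlaps(start, size, s, n) for s, n in taken):
            new = start        # original address is free: map it there
        else:
            new = bump         # clash: relocate and remember the delta
            bump += size
        taken.append((new, size))
        plan.append((start, new, new - start))
    return plan
```

Every nonzero delta in the plan corresponds to exactly the kind of pointer adjustment that cracauer is wary of, so keeping most regions at delta 0 limits how much of that code actually runs.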
18:24:54
cracauer
Their spec says clearly that code might keep copies of pointers around that you don't know about.
18:27:03
cracauer
Whatever Ravenbrook does, or in fact we do right now, might only work because of not-yet-implemented optimizations in LLVM.
18:31:41
drmeister
We are puzzling over what we need from the MPS to solve a host of problems that clasp has in the way it starts up.
18:32:24
drmeister
One idea is to essentially "dump core" of clasp and then load that image at startup and proceed.
18:32:40
nickbarnes
so we collect a list of offsets to the references which are embedded in the code, and store it alongside the code, so we can scan the code.
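What nickbarnes describes, a list of offsets to references embedded in the code stored alongside the code, can be modelled as a code object carrying a reference-offset table that a collector uses both to scan the code and to rewrite references after their targets move. This is a toy sketch, not the Configura or MPS layout: `CodeObject`, `ref_offsets`, `refs` and `fix` are hypothetical, and embedded references are assumed to be 8-byte little-endian absolute addresses.

```python
import struct
from dataclasses import dataclass, field


@dataclass
class CodeObject:
    code: bytearray                                   # machine code bytes (toy)
    ref_offsets: list = field(default_factory=list)   # where 8-byte absolute refs live

    def refs(self):
        """Yield (offset, address) for each embedded reference, so a GC can scan it."""
        for off in self.ref_offsets:
            (addr,) = struct.unpack_from("<Q", self.code, off)
            yield off, addr

    def fix(self, forward):
        """Rewrite embedded references whose targets moved; forward maps old -> new."""
        for off, addr in self.refs():
            if addr in forward:
                struct.pack_into("<Q", self.code, off, forward[addr])
```

Only the bytes at the recorded offsets are ever touched, which is why the table has to be complete: a reference the code generator emitted but did not record would be invisible to the collector.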
18:33:17
drmeister
Say if we could serialize an entire MPS Arena into a contiguous block of data. We would need to also include the JITted code in the Arena - wouldn't we? That means we need to GC JITted code.
18:34:00
nickbarnes
All code in the Configura system is jitted, either by LLVM or by their pre-existing back-end.
18:34:56
drmeister
Our JITted code is in llvm Modules. The code within the Module refers to data within the Module using RIP-relative addressing - is that a problem?
18:35:05
cracauer
Doesn't LLVM JIT code hold hardcoded pointers to other code, not expecting the target to move?
18:36:01
drmeister
Yes - and right now might not be the best time to ask you these questions nickbarnes
18:36:25
nickbarnes
Some time when I have more time I can show you our whole system for dealing with this.
18:36:34
drmeister
Perhaps we should plan a time to chat about these questions - so that we can draft a more formal request to Ravenbrook.
18:37:18
nickbarnes
there are some LLVM code-generation options which we had to tinker with a bit, but less than it could have been.
18:37:43
nickbarnes
RIP-relative is fine as long as you keep all that code together in the same code object on the heap.
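The reason RIP-relative addressing survives a move of the whole code object is arithmetic: on x86-64 the operand encodes the target minus the address of the next instruction, so if both move by the same delta the displacement is unchanged, while a reference to something outside the object would need a fix-up. The sketch below just checks that arithmetic; the function names are hypothetical, and the 32-bit range check reflects the signed 32-bit displacement field, which is one reason code models matter here.

```python
def rip_displacement(next_insn_addr: int, target_addr: int) -> int:
    """x86-64 RIP-relative operand: target minus the address of the next instruction."""
    return target_addr - next_insn_addr


def displacement_after_move(next_insn_addr, target_addr, code_delta, target_delta):
    """Displacement needed after the code moves by code_delta and the target by target_delta."""
    return rip_displacement(next_insn_addr + code_delta, target_addr + target_delta)


def fits_rel32(disp: int) -> bool:
    """The displacement field is a signed 32-bit integer."""
    return -2**31 <= disp < 2**31
```

For example, with an instruction ending at 0x1010 that reads a roots table at 0x1800 in the same Module, moving the whole Module by any delta leaves the displacement at 0x7F0; if the table lived outside the Module and stayed put, the displacement would change and the code would have to be patched.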
18:39:52
nickbarnes
(although the Module may contain several LLVM functions, mainly to support the exception semantics of the Configura language).
18:40:09
drmeister
nickbarnes: Would you have time later today to chat? Or I can send out a Doodle poll (multiple time zones, ugh), or we wait until I get back to Philadelphia after the 20th.
18:41:07
drmeister
At present Clasp also JITs each top level function as a separate Module. Now, each top level function generates multiple llvm functions - that may complicate things, or they may all just be internal pointers within the Module.
18:43:39
drmeister
nickbarnes: David and you seemed interested to get something from me quickly. I'm juggling a bunch of things here (family, work, coding, timezones). I also really need some back and forth discussion to sort out what exactly we would be asking for. So a freeform chat would be very, very helpful to me.