freenode/#clasp - IRC Chatlog

12:42:56 Nephromancer hey y'all

12:45:05 Bike good morning.

12:45:25 beach Hello Nephromancer. Hello Bike.

15:43:54 Bike i have ext:source-location returning what it ought to, but if i try M-. i get "Containing expression ends prematurely"...

15:44:52 Bike hrm. the offset is off by like two

15:44:54 Bike that's weird

16:17:36 Bike drmeister: doing peek-char seems to update the result of core:input-stream-source-pos-info. I am confused.

17:16:00 drmeister Hello

17:16:52 drmeister Continuing the conversation from yesterday (because I need to tell the Ravenbrook people something).

17:17:25 drmeister I can't figure out how we use core dumping to build Clasp. We don't GC code - so we can't dump the core.

17:18:42 drmeister What exactly would we be dumping and when would we be dumping it?

17:20:51 Bike the jit stuff lets you control what memory functions are put into

17:22:50 drmeister So we JIT code into particular memory locations and then dump that along with the memory? For cclasp - we dump the core at the end of loading inline.lisp rather than compile-filing everything.

17:23:40 Bike at the end, yeah.

17:24:08 drmeister We have C++ memory that we would need to fix up when we reload that core - but this is a clear plan forward.

17:24:47 drmeister It would certainly speed things up.

17:25:00 drmeister Because it would eliminate compile-file'ing everything.

17:25:53 drmeister The Ravenbrook folks do have a way of GC'ing JITted code - they developed it for their other commercial client.

17:26:40 drmeister If we used that - then code becomes like other data.

17:27:20 Bike well, we want to do that in the future too.

17:31:50 drmeister cracauer is coming - I'll wait 'till he gets here.

17:33:18 cracauermob Hello

17:33:34 drmeister Hello

17:34:56 drmeister So - currently the way we build clasp - compile-filing everything and linking it together into a single fasl/executable has problems that hamper our ability to implement inlining.

17:35:31 drmeister At startup, these fasl/executables evaluate each toplevel form one at a time and build the environment up that way. It's very modular - but the modularity brings problems.

17:35:53 drmeister So I've been casting about for a way to deal with the problems.

17:36:25 drmeister An alternative is to essentially load a full environment into memory and then "dump core".

17:36:51 drmeister We need GC support for that. We can get it from Ravenbrook. We probably can't get it from Boehm.

17:37:32 Bike i don't think it's really "modular", the whole problem with inlining is interdependencies

17:37:38 drmeister That's not a show stopper - we could eliminate Boehm or deal with that later.

17:38:32 drmeister I think it is modular - and the modularity gives us problems.

17:38:46 cracauermob The only quick way here without original research is to switch the GC to collect, from now on, into one contiguous region

17:39:07 cracauermob Then write that region as an image

17:39:37 cracauermob At first requiring that it be mapped at the same virtual address

17:39:48 cracauermob When restarted

17:39:56 drmeister Well, the Ravenbrook folks could give us something more sophisticated - we could specify the pools and then they create a serialized version of those pools. At load time the pools could be reconstituted from the serialized version.

17:40:33 Bike for mps the "contiguous region" would be the "arena"

17:40:46 Bike we're allowed to give mps an arena, from mmap or whatever, if i recall correctly

17:40:48 cracauermob I don't think that serialization would be easy or quick

17:40:53 drmeister cracauermob: What you are proposing is straightforward from the point of view of writing the data out to disk.

17:41:30 drmeister But we lose all of the special pools and the way that MPS works.

17:41:36 cracauermob Unless it is with the sole purpose of adjusting pointers so that you can mmap anywhere

17:42:21 drmeister I'm asking about what we ask the MPS people to research.

17:42:54 cracauermob After restart you can then GC again and split objects by pools as they move, if that is considered important.

17:43:27 Bike the pools include information about how gc is done

17:43:35 cracauermob Or that one mapped corefile has annotations to tell it which parts were which region

17:43:38 Bike like where pointers are in objects and stuff

17:43:45 Bike so it's probably not optional

17:44:07 cracauermob Right, bike

17:44:25 drmeister The memory layout in Clasp is pretty rich. Cons cells are in their own pool, objects without internal pointers are in their own pool.

17:44:39 drmeister I'd rather not rewrite all of that.

17:45:06 cracauermob So on that gc-tosave-core you would write regions one by one, also dumping their metadata

17:45:59 cracauermob As long as it ends up mmappable

17:46:21 cracauermob At the same location in virtual memory

17:47:08 drmeister Yes - let's say we ask them to handle the serialization of their MPS memory structures to a single contiguous block of memory. The gc-tosave-core function would take all of the MPS memory in an Arena and turn it into a contiguous block of memory.

17:47:38 drmeister Then at startup we mmap that block into memory somewhere and we call another MPS function that recreates the Arena.

17:48:31 drmeister We would also need to ask for their approach to GC'ing JITted code.

17:48:56 cracauermob Un-jit it?

17:49:42 cracauermob I mean the source code is still there, but what about closures that had specific scope?

17:49:44 drmeister They solved this problem for their other commercial client. They keep track of relocation information in the code and they can move the code around.

17:50:38 cracauermob This is reimplementing a lot of a dynamic linker

17:50:47 drmeister I don't know what unjit it could mean.

17:51:21 Bike well, a dynamic linker is what a core load is, yeah? sbcl has to do fixups and stuff too

17:51:34 cracauermob I don't think we'd need it anyway.

17:52:28 cracauermob Sbcl's image loading does no linking or fix-up because it maps at the same address

17:52:37 cracauermob After restart

17:52:58 drmeister jitting happens at an llvm Module level. There is a table of roots inside of the module that the code in the module refers to with IP relative addressing. If the whole module is moved around - what in the code needs to be relocated?

17:53:02 cracauermob Only addresses of c code

17:53:20 drmeister Right.

17:54:04 cracauermob I suspect that llvm might put more restrictions on moving code.

17:54:19 Bike i thought sbcl started doing fixups a while ago

17:54:47 cracauermob That they might feel free to copy addresses as values into code.

17:55:11 cracauermob Sbcl wanted to move to a relocatable core

17:55:35 cracauermob I don't think they implemented that?

17:56:34 cracauermob I think hardcoring the core image address is fine on 64 bits

17:56:36 drmeister If Ravenbrook has a way to relocate llvm Modules - then Modules become just another llvm object that is part of the Arena.

17:56:48 cracauermob Hardcode

17:56:54 Bike https://github.com/sbcl/sbcl/blob/master/src/runtime/coreparse.c#L417 hm hm

17:57:40 drmeister And it should be serialized with gc-tosave-core

17:58:24 drmeister Bike: What is that - could you translate?

17:59:49 drmeister If it's really relevant. These conversations have a tendency to get derailed.

18:00:11 Bike i think it's sbcl doing relocations when it loads a core

18:00:20 cracauermob Yes

18:00:34 cracauermob I did notice

18:00:57 cracauermob Does matter much for our discussion

18:01:44 cracauermob Does not

18:01:58 stassats cracauermob: sbcl is relocatable now

18:02:42 drmeister I don't think of it so much as doing relocations when we load a core - more like convert some blob of bytes into a working Arena. Relocations will need to be taken care of by the serialization/deserialization.

18:02:52 cracauer My point is that the first implementation should be as dumb as possible.

18:03:38 cracauer If you do deserialization of any kind you wreck memory sharing of processes.

18:03:47 cracauer And it could be a nightmare to debug.

18:05:11 drmeister cracauer: If we make the implementation as dumb as possible - we need to throw out the garbage collector.

18:05:25 cracauer Well not that dumb :-)

18:05:32 Bike fine dumbness balance

18:05:52 cracauer I think that keeping the regions does not require serialization.

18:05:52 Bike it's not like fixing an address makes preserving gc impossible, i don't think

18:05:52 drmeister The point here is we have several experts on the garbage collector available and waiting for instructions on what we need.

18:06:00 drmeister I'm trying to figure out what we need.

18:06:03 Bike there is a question of how much mps needs to be changed by dumbness level, ofc

18:06:16 cracauer The way I would do it with what I know is that I start a full-heap GC into the area I want to save.

18:06:26 cracauer one region after another.

18:06:43 cracauer Then dump the region metadata into that contiguous region, too.

18:07:17 cracauer So one mmaped area, a single one, at one address, contains everything.

18:07:28 drmeister cracauer: That sounds like the serialization process that I'm envisioning asking the Ravenbrook folks to figure out how much work it would require to implement.

18:07:33 cracauer With the pointers all set up for that VM location.

18:07:49 cracauer OK, it is serialization of the metadata.

18:08:06 cracauer But the heap data is not changed in any way by the image saving/loading processes.

18:08:23 cracauer Only on that final GC into one VM area. No new GC code.

18:09:09 cracauer Existing GC code takes care of all pointer adjustments.

18:10:18 cracauer Sorry I missed that SBCL implemented varying the base address. But that is a minor detail.

18:10:54 Bike yeah

18:10:58 drmeister I will ask for that then.

18:11:49 cracauer I am very afraid of new code doing pointer adjustments. Debugging would be a nightmare.

18:11:49 stassats well, you can just avoid doing relocation

18:12:00 stassats disable heap randomization

18:12:11 cracauer stassats: right, and if we want it we do it later, like SBCL did.

18:12:27 stassats on the other hand, it's better to keep relocation in mind

18:12:34 cracauer yes.

18:12:34 stassats easier to do the right thing from the outset

18:12:47 drmeister We need to incorporate whatever solution they developed for GC'ing llvm JITted code and then ask them to give us an estimate for what it would take to serialize an Arena to a single contiguous block of memory and then reconstruct that Arena from that contiguous block of memory.

18:13:09 stassats it would take a memcpy

18:13:59 cracauer stassats: why would it only take a copy?

18:14:18 cracauer drmeister: that is still not very specific.

18:15:30 cracauer I suppose you could, instead of doing a final GC into one contiguous area at one address, just dump things where they are and re-map them at the same addresses. But that means lots of opportunities of clashes.

18:15:58 cracauer You can do that final GC into one VM region with existing GC code. So it would be safe and likely to work.

18:16:42 stassats cracauer: or two copies

18:16:48 stassats or whatever the number of regions

18:17:12 cracauer stassats: I suppose the best way is to write the mechanism in a way that supports relocation, but happens to map at the same address at first for safety.

18:17:27 drmeister The Ravenbrook folks want me to write this up in an email - I chafe at that because I don't really know what I'm asking for - or what is possible. I prefer the back and forth and immediacy of chats like this one.

18:17:32 cracauer If you save regions individually you have more opportunity for VM mapping clashes.

18:17:37 stassats relocation is just combing through the heap and updating pointers
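
A minimal sketch of what "combing through the heap and updating pointers" could look like, written in Python purely for illustration. It assumes 64-bit little-endian pointers and assumes the loader already knows which offsets hold pointers; the function name `relocate` and the toy image are hypothetical and are not taken from SBCL, Clasp, or MPS.

```python
import struct

WORD = 8  # assumption: 64-bit little-endian pointers


def relocate(image: bytearray, pointer_offsets, old_base: int, new_base: int) -> None:
    """Rewrite in place every pointer slot that points into the old image range."""
    delta = new_base - old_base
    old_end = old_base + len(image)
    for off in pointer_offsets:
        (ptr,) = struct.unpack_from("<Q", image, off)
        # Pointers outside the image (e.g. into C/C++ code) are left alone here.
        if old_base <= ptr < old_end:
            struct.pack_into("<Q", image, off, ptr + delta)


# A toy 3-word image saved at 0x1000: word 0 points at word 2,
# word 1 points outside the image, word 2 is plain data.
image = bytearray(struct.pack("<QQQ", 0x1010, 0xDEAD0000, 42))
relocate(image, pointer_offsets=[0, 1 * WORD], old_base=0x1000, new_base=0x5000)
words = struct.unpack("<QQQ", image)
```

If the image is mapped back at its original address, the delta is zero and this pass does nothing, which is the "avoid doing relocation" case stassats mentions.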

18:17:59 cracauer drmeister: did they give you more about the JIT problem?

18:18:08 stassats cracauer: i'd say the chances are the same as one big blob

18:18:19 cracauer I have this nagging feeling that LLVM might not guarantee relocatability.

18:18:34 drmeister cracauer: It's hard to be specific because I don't know much about the underlying machinery in mps - I don't know what is hard and what is easy for them.

18:18:38 stassats well, it can produce shared libraries, can't it?

18:18:51 Bike it can, but we're talking about the jit, which is kind of its own beast

18:20:27 cracauer Yeah, I just don't know about that. They have a GC spec, but that doesn't mention moving code, jit or otherwise.

18:21:29 stassats you can make just the code region pinned down

18:21:39 stassats if that's what it'll take

18:22:40 drmeister The LLVM GC specification is not useful to us. We don't use it at all.

18:22:44 cracauer In fact, nothing forces us to pick final-gc versus original-address for any region.

18:23:21 drmeister The Ravenbrook folks came up with another way. From the brief description they gave me they use a large code model and they fix up pointers.

18:24:00 cracauer For any region, we can decide freely whether to re-map it at the original location or have it compacted into one VM region.

18:24:24 cracauer I don't think you can, strictly speaking, fix up pointers in LLVM code.

18:24:45 drmeister They told me they came up with a way to do it.

18:24:52 drmeister Hang on.

18:24:54 cracauer Their spec says clearly that code might keep copies of pointers around that you don't know about.

18:25:22 cracauer The only way to control that is to implement what they have in the GC notes.

18:27:03 cracauer Whatever ravenbrook does, or in fact we do right now, might only work because of not-yet-implemented optimizations in LLVM.

18:28:38 cracauer brb

18:30:19 nickbarnes Hello there CLASP people.

18:30:39 cracauer hello

18:30:50 drmeister Hi nickbarnes

18:31:10 drmeister nickbarnes: You guys figured out how to GC llvm jitted code - yes?

18:31:17 nickbarnes Yes.

18:31:26 nickbarnes We do it routinely for Configura.

18:31:41 drmeister We are puzzling over what we need from the MPS to solve a host of problems that clasp has in the way it starts up.

18:32:03 nickbarnes We use our own linker

18:32:24 drmeister One idea is to essentially "dump core" of clasp and then load that image at startup and proceed.

18:32:40 nickbarnes so we collect a list of offsets to the references which are embedded in the code, and store it alongside the code, so we can scan the code.
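
A rough Python sketch of the idea described here: keep a list of offsets next to the code so that a scanner can find and update the references embedded in it. This is not Ravenbrook's or Configura's implementation; `CodeObject`, `scan`, and the forwarding table are hypothetical names, and 64-bit little-endian references are assumed.

```python
import struct
from dataclasses import dataclass, field


@dataclass
class CodeObject:
    """Hypothetical heap object: machine code plus the offsets of embedded references."""
    code: bytearray
    ref_offsets: list = field(default_factory=list)


def scan(obj: CodeObject, fix) -> None:
    """Visit each embedded 64-bit reference and store back whatever fix() returns."""
    for off in obj.ref_offsets:
        (ref,) = struct.unpack_from("<Q", obj.code, off)
        struct.pack_into("<Q", obj.code, off, fix(ref))


# Pretend the collector moved one object from 0x2000 to 0x9000.
forwarding = {0x2000: 0x9000}

# 4 bytes of "instructions", an 8-byte reference, 4 more bytes of "instructions".
blob = bytearray(b"\x90" * 4 + struct.pack("<Q", 0x2000) + b"\x90" * 4)
obj = CodeObject(code=blob, ref_offsets=[4])
scan(obj, lambda ref: forwarding.get(ref, ref))
(updated,) = struct.unpack_from("<Q", obj.code, 4)
```

Only the bytes named by the offset list are touched, so the surrounding instruction bytes never need to be decoded.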

18:33:17 drmeister Say if we could serialize an entire MPS Arena into a contiguous block of data. We would need to also include the JITted code in the Arena - wouldn't we? That means we need to GC JITted code.

18:33:32 nickbarnes yes.

18:34:00 nickbarnes All code in Configura system is jitted, either by LLVM or by their pre-existing back-end.

18:34:21 nickbarnes (which is quicker than LLVM but has lower code quality).

18:34:52 nickbarnes and it all lives in objects in their main AMC pool, moves around, etc.

18:34:56 drmeister Our JITted code is in llvm Modules. The code within the Module refers to data within the Module using RIP relative addressing - is that a problem?

18:35:05 cracauer Doesn't LLVM jit code hold pointers to other code that are hardcoded, not expecting the target to move?

18:35:34 nickbarnes These are good questions.

18:35:46 drmeister The Module also contains a table of roots that need to be fixed.

18:36:01 drmeister Yes - and right now might not be the best time to ask you these questions nickbarnes

18:36:25 nickbarnes Some time when I have more time I can show you our whole system for dealing with this.

18:36:34 drmeister Perhaps we should plan a time to chat about these questions - so that we can draft a more formal request to Ravenbrook.

18:36:53 nickbarnes Basically the LLVM module has all the linking info

18:37:18 nickbarnes there's some LLVM code-generation options which we had to tinker with a bit, but less than it could have been.

18:37:43 nickbarnes RIP-relative is fine as long as you keep all that code together in the same code object on the heap.

18:38:23 drmeister I am certain that we will be moving entire Modules as single MPS objects.

18:39:01 nickbarnes at present we JIT each Configura function (or method) as a separate Module.

18:39:52 nickbarnes (although the Module may contain several LLVM functions, mainly to support the exception semantics of the Configura language).

18:40:09 drmeister nickbarnes: Would you have time later today to chat - or I can send out a doodle poll (multi time-zone Ugh) Or we wait until I get back to Philadelphia after the 20th.

18:41:07 drmeister At present Clasp also JIT's each top level function as a separate Module. Now - each top level function generates multiple llvm functions - that may complicate things - or they are all just internal pointers to the Module.

18:43:26 cracauer I still suspect Clasp's MPS mode might only work by accident right now.

18:43:39 drmeister nickbarnes: David and you seemed interested to get something from me quickly. I'm juggling a bunch of things here (family, work, coding, timezones). I also really need some back and forth discussion to sort out what exactly we would be asking for. So a freeform chat would be very, very helpful to me.

18:43:43 cracauer That there just doesn't happen to be pointer aliasing in the LLVM code.

18:44:17 drmeister cracauer: You have put your finger on my deepest, darkest fear.

18:44:20 nickbarnes Not really today, I'm afraid.

18:44:48 drmeister nickbarnes: Ok, could a more formal request wait until after July 20th?

18:44:58 nickbarnes Yes, it can wait a bit.

18:45:24 nickbarnes We are pretty busy working on MPS visualisation tools at the moment.

18:45:25 drmeister Ok, I appreciate that.

18:45:46 drmeister Excellent - and they sound very exciting and useful.

18:46:00 nickbarnes (and also I'm away this Thu and Fri, and have my daughter knocking about until 6th August so it's nice to have a bit more free time).

18:47:07 cracauer It is good to know LLVM jit modules control all pointers centrally.

18:47:08 drmeister Good - we will discuss what we can here off and on as we do. If you have time - drop in and maybe we can push the thinking forward a bit.

18:47:17 cracauer At least there won't be aliasing problems there.

18:49:18 drmeister cracauer: We haven't tested MPS in production for long periods (we haven't tested anything in production for that matter).

18:50:35 cracauer Myself I only run boehm, which doesn't have the problem.

18:50:56 cracauer I mean the LLVM GC spec isn't too difficult to implement.

18:51:14 cracauer It just has a bunch of annoying restrictions forcing some design decisions.

18:57:29 cracauer In the end, the image-saving could just be done entirely primitive: save and restore each mps region to the same vm address. Keep the region metadata around.
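
One way to picture the "entirely primitive" scheme is a file of records, each holding a region's base address, its length, and its raw bytes. The Python sketch below only shows that record format; the header layout and the names `dump_regions` and `load_regions` are assumptions for illustration, and a real loader would map each payload back at its recorded address instead of returning it.

```python
import struct

HEADER = struct.Struct("<QQ")  # assumed header: (base address, length in bytes)


def dump_regions(regions):
    """Serialize [(base, data), ...] as header+payload records, in order."""
    out = bytearray()
    for base, data in regions:
        out += HEADER.pack(base, len(data))
        out += data
    return bytes(out)


def load_regions(blob):
    """Recover [(base, data), ...]; a real loader would map each payload at base."""
    regions, pos = [], 0
    while pos < len(blob):
        base, length = HEADER.unpack_from(blob, pos)
        pos += HEADER.size
        regions.append((base, blob[pos:pos + length]))
        pos += length
    return regions


# Two made-up regions standing in for separate pools.
saved = [(0x10000000, b"cons cells"), (0x20000000, b"leaf objects")]
restored = load_regions(dump_regions(saved))
```

Because the payload bytes are written and read back unchanged, no pointer inside a region needs adjusting as long as every region returns to its recorded base.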

18:57:51 cracauer drmeister: how do you tell mps about where jited code lives?

18:58:10 drmeister I don't tell mps anything about the jited code at the moment.

18:58:19 cracauer ah, right.

18:58:25 drmeister Oh wait - that isn't correct

18:58:26 cracauer No code collection.

18:59:08 drmeister There is one table of roots in every Module. I register those roots by telling the mps where that table starts and how long it is.

18:59:35 cracauer Right, for global variables in "object" files.

19:02:09 drmeister Every MPS object is a block of data that contains from zero to some number of MPS fixable pointers at fixed offsets from the start of the block of data. If we relocate Modules then this table of roots could be treated as a vector of fixable pointers.

19:03:36 drmeister The code in the module refers to the table entries using RIP-relative addressing - so moving the entire Module will preserve that.
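
The claim can be checked with a little arithmetic. The Python sketch below models a Module as a base address plus two offsets (hypothetical values and names) and shows that a displacement computed before a move still lands on the same table slot afterwards, provided code and table move together.

```python
from dataclasses import dataclass


@dataclass
class Module:
    """Hypothetical layout: one instruction and one root-table slot, as offsets from base."""
    base: int
    next_instr_offset: int   # offset of the byte after the referencing instruction
    table_slot_offset: int   # offset of the root-table entry it reads

    def displacement(self) -> int:
        # The value encoded in the instruction: target minus the next instruction address.
        return (self.base + self.table_slot_offset) - (self.base + self.next_instr_offset)

    def effective_address(self, disp: int) -> int:
        # What the CPU computes at run time: RIP (next instruction) plus displacement.
        return self.base + self.next_instr_offset + disp


before = Module(base=0x10000, next_instr_offset=0x27, table_slot_offset=0x400)
disp = before.displacement()          # fixed when the code was emitted

# Same Module moved as a unit to a new base address.
after = Module(base=0x7F0000, next_instr_offset=0x27, table_slot_offset=0x400)
still_valid = after.effective_address(disp) == after.base + after.table_slot_offset
```

The base cancels out of the displacement, so the instruction bytes need no patching. The absolute pointers stored in the table itself are a separate matter and would still have to be fixed by the collector.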

19:04:20 drmeister It's not clear to me what other pointers int he Module need to be updated.

19:04:33 drmeister ... pointers in the Module ...

19:06:26 cracauer I need to hunt down some lunch.

19:06:51 Bike what is source-debug-use-lineno-p? it's in function descriptions but i'm not sure i see anything actually using it

19:12:08 Bike guess i'll try deleting it

19:13:19 nickbarnes Note: support for zero-cost exception mechanisms is tricky.

19:13:26 nickbarnes (especially on Windows).

19:16:55 cracauer why more difficult on Windows

19:17:00 cracauer ?

19:20:31 Bike they have totally different mechanisms... the itanium way seemed harder to arrange, though

19:24:55 nickbarnes The OS needs to figure out what the stack frame layout is, for any given RIP, so it can unwind. On Windows you register a helper function to tell it this information. Unfortunately, everything in the data structure and API for that helper function is 32 bits, so …. For starters, you have to register it separately for each 4GB of heap.

19:25:01 drmeister Bike: The source-debug-XXX is something I added for slime and C-c C-c so that the temporary file that is generated can spoof the entire source file. I'm a little unsure of how it all works - but it does appear to work - so if I could leave it alone right now - that would be appreciated.

19:26:34 Bike swank doesn't use this one.

19:28:23 drmeister Right - I think I had an idea that I needed to support the offset being a line number or an absolute character offset - that's what I think source-debug-use-lineno-p controls. I think it currently only uses one or the other - I don't recall which and I don't recall if source-debug-use-lineno-p actually does anything.

19:28:47 Bike it doesn't, i'm pretty sure. i'm trying it now.

19:28:57 Bike which "the offset" do you mean?

19:28:59 drmeister The offset being the start of the snippet compiled with C-c C-c relative to the larger source file.

19:29:06 Bike ah.

19:29:26 Bike i'm not sure i understand how that could possibly be part of the description of a function.

19:29:27 Bike eh.

19:29:34 Bike i just want to cut down on things that work but we don't know why they work.

19:30:07 drmeister I don't recall right now either. I have have only partially understood what was necessary to implement C-c C-c properly - but this worked and I stopped once I had something working.

19:30:28 drmeister I MAY have only partially understood ...

19:30:57 drmeister It's only used for C-c C-c. If we don't need the source-debug-xxx stuff - I can remove it.

19:31:53 drmeister When I was implementing FunctionDescriptions I thought "I don't recall how this all worked - should I strip it out? .... No - I had a good reason for doing it back then - and I don't want to break slime right now - so I'll just carry the info into FunctionDescriptions and deal with it later if we need to".

19:32:04 Bike i'll try to figure it out.

19:32:09 drmeister I got a whole two weeks out of that before you started asking me questions about it (sigh)

19:33:33 Bike there's some weird stuff going on with the source pathnames that i'm looking at for the source tracking interface. particularly, if you use COMPILE, the function's source pathname is the string "repl-code", and various things treat that like a pathname

19:33:39 Bike i can probably work it out without pestering you too much, though

19:44:07 drmeister Bike: I was messing around with FUNCALL in bclasp. I noticed that it was really slow because it's going through the C++ machinery that implements FUNCALL.

19:44:35 Bike like, it calls the FUNCALL function? yeah

19:45:02 drmeister I changed the code generator so that (funcall 'foo ...) compiles as (foo ...) and (funcall <something> ...) as (<evaluate something> ...)

19:45:47 drmeister Meaning I generate code to evaluate the first argument and then call the result with the remaining arguments. That's all I need to do - right?

19:45:50 Bike are you sure that works if it's (funcall foo ...) and foo evaluates to a symbol?

19:46:08 drmeister No - that won't work then.

19:48:05 drmeister Thank you. So at runtime I need to check the result returned by the code that evaluates the function and what are the possibilities? (1) It could be a closure - in which event I can just call it. (2) it can be a symbol - in which case I call the symbol-function. Anything else?

19:48:28 Bike no, that's it.

19:48:37 Bike you could just look at the compiler macro cclasp uses

19:49:07 Bike https://github.com/clasp-developers/clasp/blob/dev/src/lisp/kernel/cleavir/inline.lisp#L802-L811

19:49:33 Bike i suppose the type error is something else
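
The runtime check discussed above can be sketched as follows. This is Python standing in for the generated code, not the bclasp code generator or the cclasp compiler macro; the `Symbol` class and its `function` slot are hypothetical stand-ins for a Lisp symbol and its function cell.

```python
class Symbol:
    """Hypothetical stand-in for a Lisp symbol with a function cell."""
    def __init__(self, name, function=None):
        self.name = name
        self.function = function


def funcall(designator, *args):
    """Call a function designator: a callable directly, a symbol via its function cell."""
    if isinstance(designator, Symbol):
        if designator.function is None:
            raise NameError(f"undefined function {designator.name}")
        return designator.function(*args)
    if callable(designator):
        return designator(*args)
    # Anything else is neither a function nor a symbol.
    raise TypeError(f"{designator!r} is not a function designator")


plus = Symbol("+", lambda *xs: sum(xs))
via_symbol = funcall(plus, 1, 2, 3)
via_closure = funcall(lambda x: x * 2, 21)
```

The two successful branches match the two cases listed in the conversation, and the final `raise` corresponds to the type error Bike mentions.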

19:55:01 Bike deleting the lineno thing didn't break the build, and slime still works, and C-c C-c still works, and the source location is hooked up. so i guess it's fine.

19:57:51 Bike and i see, the source debug pathname business sets it up so that slime's temporary file is like the actual file for source pos reasons. right right right. that makes sense. maybe we can avoid storing it in the function though.

19:57:52 drmeister I've been writing some code to time microbenchmarks to see if I can spot things that are really slow.

19:58:51 drmeister It figures out how many operations can be run per second and summarizes them in terms of the log base 10 of the number of operations that can be run per second.
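
A small Python sketch of this kind of measurement, assuming nothing about drmeister's actual benchmark code: run an operation in batches for a minimum wall-clock time, then report the base-10 logarithm of operations per second. The function name and the batch size of 1000 are arbitrary choices for illustration.

```python
import math
import time


def log10_ops_per_second(operation, min_seconds=0.05):
    """Run operation repeatedly for at least min_seconds; return log10(ops/sec)."""
    count = 0
    start = time.perf_counter()
    elapsed = 0.0
    while elapsed < min_seconds:
        # Run in batches so the clock is not read on every single call.
        for _ in range(1000):
            operation()
        count += 1000
        elapsed = time.perf_counter() - start
    return math.log10(count / elapsed)


score = log10_ops_per_second(lambda: None)
```

On this scale a difference of 1.0 between two scores means one operation is ten times slower than the other, which makes order-of-magnitude outliers easy to spot. The figure includes loop and call overhead, so it is best used for comparing operations against each other.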

19:59:05 drmeister The code needs to use funcall and funcall turned out to be a bottleneck in bclasp.

20:00:53 drmeister I recalled that you wrote a compiler macro for cclasp - but that we can't use it because of ... startup/circularity/bootstrapping issues.

20:01:23 drmeister So I thought since I control the bclasp compiler - I can implement the optimization in the bclasp compiler directly.

20:05:26 Bike well, i just meant you'd be doing the same thing

20:05:44 Bike cleavir-primop:funcall is what you referred to as the "(<evaluate something> ...)" case

20:08:46 drmeister Understood.

20:13:04 Bike i see. we write in the *actual* source pathname for functions, so for slime, the tempfiles

20:29:50 Bike good that i learn this now, because it means i set up source locations for classes wrong