libera/#clasp - IRC Chatlog
13:04:14
Colleen
Bike: drmeister said 7 hours, 49 minutes ago: I got clasp to look in the /opt/clasp/lib directory using -rpath
13:23:51
drmeister
I'll check and see if this libunwind works on macOS and if it does then we can just use that.
13:25:10
drmeister
Anywho - I build llvm libunwind in the deploy script and we link it on linux and I set the path to it using -rpath. This all works on linux.
13:25:32
drmeister
Lang is talking to folks at Apple that develop libunwind to fix the problem that we saw with it.
13:27:18
drmeister
Can you change the backtrace code to use libunwind to get the RBP for frames where it is available? Get rid of the frame pointer dereferencing.
13:28:34
drmeister
We are probably the only compiled dynamic language that supports that. I looked at the Julia code - they don't appear to do the work to get arguments.
13:30:26
drmeister
Also, I was reading the llvm.experimental.patchpoint documentation again. It seems to hint at beach's call site approach.
13:30:30
drmeister
"The llvm.experimental.patchpoint intrinsic also lowers a specified number of arguments according to its calling convention. This allows patched code to make in-place function calls without marshaling."
13:32:16
drmeister
"A special calling convention has been introduced for use with stack maps, anyregcc, which forces the arguments to be loaded into registers but allows those register to be dynamically allocated. These argument registers will have their register locations recorded in the stack map in addition to the remaining live values."
13:34:52
drmeister
So let's get the backtrace and debugging stuff nailed down and document a few more things and do a release.
13:40:07
drmeister
Yes - for now - until the libunwind folks and Lang come up with a better solution. They are working on it.
13:42:45
drmeister
There was a different problem with gnu libunwind and llvm libunwind and JITted code on linux. With llvm libunwind it was that the JITted code is hard wired to expect gnu libunwind.
13:43:32
drmeister
With gnu libunwind there is a bug. gnu libunwind takes over unwinding when you link with it and it's busted.
13:45:23
drmeister
Yeah - it was when we first throw an exception from JITted code - it skips the catch in the JITted code.
13:46:07
drmeister
I can bring it up under udb and I can see the different behavior - but it would take some more work to figure out what the problem is.
13:46:51
drmeister
I'd like to have debug versions of gnu libunwind, libgcc_s and whatever defines __gxx_personality_v0 to debug it.
13:48:17
drmeister
You can see differences in the search phase. The gnu libgcc_s calls __gxx_personality_v0 only 4 or 5 times and then jumps to the catch. gnu libunwind calls __gxx_personality_v0 many, many times and then ends up in the wrong catch.
13:50:45
drmeister
Lang will be on in a few hours - I'll ask about the libunwind fix timing. Or if there is any other way around this.
13:53:19
drmeister
I have the source code for gnu libunwind and _Unwind_RaiseException and I can see the call to __gxx_personality_v0 and the arguments and the return values.
13:54:16
drmeister
I can do the same thing with libgcc_s - which works - but again, staring at disassembled code.
13:55:40
drmeister
That would be convenient - maybe it is. I can ask cracauer to keep working on installing a debug version of libgcc_s and gnu libunwind. We can LD_PRELOAD to use those libraries and debug with symbols.
13:55:55
Bike
https://github.com/gcc-mirror/gcc/blob/master/libstdc++-v3/libsupc++/eh_personality.cc#L347-L731
13:59:47
drmeister
Is there another way around this? We want the CFA or RBP for each of our frames. If libgcc_s could give this to us that would do it.
14:00:27
Bike
i don't think there is any way to get the general CFA short of a full blown dwarf interpreter
14:09:51
drmeister
I can totally believe that there is a stripped down dwarf interpreter in libgcc_s but they didn't add a function call that lets us use it to get CFA's
14:10:38
drmeister
But there is that thing where you can call a callback for each stack frame. Maybe that can give us what we want?
14:11:05
Bike
i'm pretty sure the callback just receives the context as the argument representing the stack frame
14:11:19
drmeister
I'm not saying it can - I don't see anything that leads me to believe that. But it would be super convenient if it did. We wouldn't need libunwind.
14:11:52
Bike
libunwind does have a partial dwarf interpreter, which lives here https://github.com/libunwind/libunwind/blob/master/src/dwarf/Gexpr.c
14:14:42
drmeister
Also, what's the deal with macOS vs ELF unwinding - there appear to be differences. That's the elephant in the room that we haven't talked about.
14:15:40
drmeister
That's the unspoken reason why I think we don't want to incorporate our own DWARF interpreter - because we'd probably need one for ELF and some other solution (libunwind probably) on macOS.
14:16:02
Bike
https://github.com/gcc-mirror/gcc/blob/master/libgcc/unwind-dw2.c#L522-L942 here's libgcc's dwarf interpreter, i think
14:16:18
Bike
well, the dwarf format is probably the same between elf and macho or whatever, the problem would just be finding the dwarf in the first place
14:18:08
drmeister
I'll ask Lang when he comes on. It's a good time to learn this stuff because I feel like I'm starting to get it.
14:18:53
drmeister
If unwinding is still slow for us there is that really tantalizing prospect of compiling the DWARF to native code. Crazy.
14:19:38
drmeister
But I'd like to (1) release (2) incorporate MMtk (3) get call site optimization implemented.
14:20:47
Bike
how much did you say it sped things up by? because i would think it would still be relatively slow given we'd still have to read a file or something
14:21:27
drmeister
I'm not sure about the reading file thing. I think they embedded the code in the ELF file and they had a modified libunwind that used it.
14:22:19
drmeister
Look - it's crazy now - but compilation is an option for fast unwinding in a unix/C++ world. I thought that was always going to be a deficiency.
14:23:12
drmeister
Also, I think I see a way to use the parallel JIT facility to speed up linking at startup time and loading fasls.
14:24:50
Bike
well i mean like, wasn't the big slowdown from the lock in dl_iterate_phdr or whatever
14:25:20
drmeister
This sequence analysis thing I wrote screams - it's basically bound by the C++ library now that I removed more of the consing in the loops. We've got a nice way of getting compiler performance with dynamic programming and using C++ libraries.
14:26:14
drmeister
That big slowdown is when you unwind in multiple threads - yes. By the time we are compiling DWARF - I bet we can get around using a lock.
14:26:44
Bike
how? whether it's dwarf VM or machine code, we have to get it in the first place, right?
14:34:08
drmeister
I don't know enough about it at this time to discuss it intelligently. I understand the DWARF interpreter thing a little bit more and I see how to get rid of that overhead.
14:34:51
drmeister
And of course - if dwarf interpretation isn't the rate limiting step - then we wouldn't optimize that first.
14:35:28
drmeister
We should figure out if we can recognize the dwarf interpretation in the profiling data.
14:36:13
Bike
also, was that 25x speedup with frame pointer omission? because if RBP is valid, the dwarf expression will just be a simple command to read that register
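[Editor's illustration of Bike's point, not something from the channel.] With the standard x86-64 prologue (push rbp; mov rbp, rsp), the unwind rule for the CFA is just "RBP + 16": 8 bytes for the saved RBP and 8 for the return address. A minimal Python sketch of evaluating such a "register + offset" rule follows; `compute_cfa` and the register values are hypothetical, and this is a toy model rather than a DWARF parser.

```python
# Toy model of a DWARF "CFA = register + offset" rule (not a real DWARF parser).
# With frame pointers kept on x86-64, the rule for a frame is CFA = RBP + 16.

def compute_cfa(registers, rule):
    """Evaluate a (register_name, offset) CFA rule against a register snapshot."""
    reg_name, offset = rule
    return registers[reg_name] + offset

# Hypothetical register snapshot for one frame (addresses are made up).
frame_registers = {"rbp": 0x7FFFFFFFE000, "rsp": 0x7FFFFFFFDFC0}
cfa = compute_cfa(frame_registers, ("rbp", 16))
print(hex(cfa))
```

When the frame pointer is omitted, the rule has to be expressed relative to RSP and changes as the stack pointer moves, which is where a table-driven or interpreted lookup becomes more involved.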
14:40:44
drmeister
I don't know what the 25x speedup is from - they said they compiled the eh_frame dwarf bytecode to C and then assembly. It's in that paper and in that talk I posted.
14:46:16
drmeister
I thought unwinding was always going to be slow - because nobody cares about C++ exception handling.
14:48:34
drmeister
Anyway - let's get this libunwind thing sorted out. It's an embarrassment (and not ours) that this doesn't work properly.
15:22:40
Bike
so the problem with telling dwarf about the parameters/register save area is that in bclasp, source position info is in a dynamic variable, while for cclasp, it's a property of the intermediate representation
15:22:52
Bike
and the llvm.dbg calls are generated in a place that i think is not amenable to passing information in
15:23:48
Bike
unlike with other source position info stuff, we can't just generate with abandon and let dbg attachments be generated or not depending on the initialization
15:25:30
drmeister
There's a problem building on linux - we still need to use LD_LIBRARY_PATH=/opt/clasp/lib. Gotta figure out how to get rid of this.
15:30:25
Bike
i know i already said this, but i have some serious concerns about adding more hoops for the build process
15:34:55
drmeister
Understood - libunwind is another dependency - but how else do we get crash proof backtraces with arguments?
15:35:56
Bike
well, ideally i suppose we'd get gnu libunwind fixed, so that we don't need to go out of our way to link it
15:56:28
Bike
i think i can generate dwarf info for the parameters without things crashing... might wanna test it a little more thoroughly though
15:56:38
drmeister
I think we should barrel forward with llvm libunwind to test backtrace code and make sure all that stuff works.
15:57:18
drmeister
Meanwhile I'll try and figure out what's going on with gnu libunwind on linux and if we can fix it or if it's an llvm problem and we can get that fixed. Then we can drop the dependency for llvm libunwind on linux.
15:57:46
Bike
I mean I don't mind doing this temporarily or anything, I just think if we release and are like "by the way, on linux you also need to build llvm with this patch" it won't go over well
15:58:20
drmeister
Ok, if you give me the register-save-area DWARF stuff then I can improve the python debugger extension to get backtraces with arguments in lldb and gdb/udb.
15:58:49
drmeister
That will be soooo helpful for low level problems like the ones we are going to have later when we incorporate MMtk.
15:59:17
drmeister
Having an external debugger that can introspect clasp is extremely useful. I wish I'd put more time into it earlier.
16:00:12
drmeister
Right - absolutely - patching llvm for libunwind is currently a blocker for release. I posted on Discord#jit to Lang about it about an hour ago.
16:01:22
drmeister
I want a performant, dynamic language that interoperates with C++. That has not changed as a goal and nothing has appeared to make this irrelevant in the years that we've been working on this.
16:01:47
drmeister
Bulletproof backtraces with arguments are very helpful for debugging. I want that in there.
16:02:00
Bike
i pushed the thing to put parameters in the dwarf. register save area might be a little dicier
16:02:37
drmeister
What is it that makes the register save area dicier? How did you solve the !dbg metadata issue?
16:03:04
Bike
well like i mentioned, at the point where we generate dbg calls now, we don't have source info passed in
16:03:53
Bike
the parameter code is already set up to use *dbg-current-function-lineno*, which i think should be ok, but i might have to do some rewriting to make things work similarly with the RSA
16:03:57
drmeister
The buildbot fails when starting iclasp-boehmprecise but it doesn't say why - I'm guessing it fails because it's not finding libunwind because LD_LIBRARY_PATH isn't set. I'm going to link libunwind statically into clasp on linux.
16:04:44
drmeister
I see - source code info is not available - what you were saying is starting to sink in now.
16:05:26
Bike
it might be best to just move the generation somewhere else. if this doesn't work i'll try that
16:05:27
drmeister
*current-source-pos-info* and *dbg-current-function-lineno* - are they redundant or overlapping functionality?
16:06:20
Bike
current-source-pos-info is set by the reader, so it will be wherever a top level form starts, rather than the particular definition
16:06:36
Bike
dbg-current-function-lineno is bound by with-dbg-function, so cclasp can bind it more precisely
16:06:47
drmeister
My guess is that the register-save-area won't be optimized away because it's 6 words and it has a stackmap entry - it should stay firmly in the stack frame.
16:08:02
drmeister
Well - it would be good to have this work in aclasp+bclasp+cclasp because I tend to do a lot of low level debugging in the early stages of startup.
16:08:49
drmeister
Maybe we can spoof things so that we stick a register_save_area in the interpreted entry point and then we can get arguments for that as well.
16:11:53
Bike
totally unrelated, but about the scraper - as part of stamp generation, the analyzer refers to gctools:get-stamp-name-map
16:12:44
Bike
and dumping it seems like it would make the stamp generator be begging the question, but i don't really understand it anyway
16:48:29
drmeister
I think if gctools:get-stamp-name-map is used it's for a small set of stamps that we hard coded at one time but I don't think we do that anymore.
17:04:19
drmeister
Bike: I think I decided to reuse the stamps that the scraper generated and then build on top of it.
17:08:22
drmeister
So the scraper is generating stamps for the classes that it knows. Then the static analyzer is taking those stamps and assigning additional ones for the new classes that it knows about.
17:10:02
drmeister
So this isn't a problem. We can remove it from the static analyzer. The scraper will generate all the stamps at build time going forward.
17:10:40
drmeister
Dang it - waf keeps adding libunwind as a dynamic library despite my best efforts so far.
17:30:32
Bike
since the analyzer generates more stamps, all that code needs to make its way into the scraper
18:33:22
drmeister
Right. I changed how stamps were generated several times until I settled on what we do now. For a while there were some stamps that were hard coded. I don't think that's the case anymore. Checking...
18:35:28
drmeister
No - I don't see anything in the scraper about hard coded stamps. I'd expect to find a small table containing General_O but I don't.
19:25:14
drmeister
It does generate stamps for things like General_O - we probably don't need to. But it simplifies the code if the IsA test for General_O checks the range covering STAMPWTAG_core__General_O... STAMPWTAG_xxx - maybe?
19:26:30
drmeister
I did a depth first walk of the class tree and assign stamps as I go. So everything gets a stamp - even things like General_O and Number_O although you never see those in the header of any object.
19:27:09
drmeister
There are plenty of stamps available - so I didn't see a reason to make the code more complicated.
19:27:53
drmeister
If you wanted to make the code more complicated you would walk the class hierarchy the same way but only assign stamps for objects that really show up in memory.
19:29:01
drmeister
Hmm, there is a reason to not allocate stamps for these classes that don't get allocated - it would keep the vector of fixable pointer masks smaller and more cache friendly during GC.
19:31:04
drmeister
This is a good time to switch to this approach of only assigning stamps for concrete classes that get allocated. Then when you build the IsA tests you would use stamp ranges that start on the first concrete class stamp and end on the last concrete class stamp of all the subclasses of the class you are creating the IsA test for.
19:33:06
drmeister
Do it in two passes. First pass walks the hierarchy and assigns stamps only to allocatable classes. Second pass walks the hierarchy and at every class you search for the min/max stamp under it.
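[Editor's illustration, not something from the channel.] The two passes described above can be sketched in Python roughly as follows. The `Node` class, the function names, and the leaf class names are hypothetical; only General_O and Number_O come from the discussion. The sketch assumes a pre-order depth-first walk, so the stamps under any class stay contiguous.

```python
class Node:
    """Hypothetical class-tree node; `allocated` marks classes that get instantiated."""
    def __init__(self, name, allocated, children=()):
        self.name = name
        self.allocated = allocated
        self.children = list(children)
        self.stamp = None        # set in pass 1, allocatable classes only
        self.stamp_range = None  # (min, max) set in pass 2, None if nothing below has a stamp

def assign_stamps(root, first_stamp=1):
    """Pass 1: pre-order depth-first walk, numbering only allocatable classes."""
    next_stamp = first_stamp
    stack = [root]
    while stack:
        node = stack.pop()
        if node.allocated:
            node.stamp = next_stamp
            next_stamp += 1
        # reversed() so children are visited in their listed order
        stack.extend(reversed(node.children))
    return next_stamp

def compute_ranges(node):
    """Pass 2: record the min/max stamp found at or under each class."""
    stamps = [node.stamp] if node.stamp is not None else []
    for child in node.children:
        child_range = compute_ranges(child)
        if child_range is not None:
            stamps.extend(child_range)
    node.stamp_range = (min(stamps), max(stamps)) if stamps else None
    return node.stamp_range

# Hypothetical hierarchy: abstract General_O and Number_O, three concrete leaves.
num_a = Node("NumA_O", True)
num_b = Node("NumB_O", True)
number = Node("Number_O", False, [num_a, num_b])
other = Node("Other_O", True)
general = Node("General_O", False, [number, other])

next_free = assign_stamps(general)
compute_ranges(general)
print(number.stamp, number.stamp_range, general.stamp_range)
```

Here Number_O gets no stamp of its own, but its IsA test can still use the range recorded for it in the second pass.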
19:37:02
Bike
sounds like we need to build up a class hierarchy in the first place. i guess the sif files do list parent classes
20:06:55
drmeister
There is a way to tell in the static analyzer - it may not yet be in the sif files.
20:11:53
drmeister
I'm trying to recall how the static analyzer finds classes that aren't allocated.
20:12:46
drmeister
The problem was always - there's a million classes in any C++ AST because of all the transitive includes - how do we find the ones we are exposing.
20:17:18
drmeister
I think in the 'project' class there's a 'classes' slot - that's EVERYTHING, every class.
20:17:42
drmeister
Then you have lispallocs, classallocs, rootclassallocs, containerallocs. Those are all the different kinds of allocated classes.
20:18:08
drmeister
So maybe we want to write out a sif tag that says if the class is in one of those hash tables.
20:19:55
drmeister
The static analyzer cares because it's looking for specific template classes involved in allocating those different kinds of objects.
20:20:44
drmeister
I'm not sure the entire taxonomy is useful. I'm pretty sure lispallocs and containerallocs are useful but not rootclassallocs and classallocs.
20:21:32
drmeister
So we could create a tag: allocation-taxonomy and then take a look at the output and maybe it will tell us we don't use one or two of these and we could remove them.
20:22:40
drmeister
There are different template classes responsible for allocating these different kinds of things.
20:23:09
drmeister
https://github.com/clasp-developers/clasp/blob/main/src/lisp/modules/clasp-analyzer/clasp-analyzer.lisp#L1834
20:23:46
drmeister
https://github.com/clasp-developers/clasp/blob/main/include/clasp/gctools/gcalloc.h#L864
20:24:12
drmeister
https://github.com/clasp-developers/clasp/blob/main/src/lisp/modules/clasp-analyzer/clasp-analyzer.lisp#L1991
20:25:55
drmeister
Those aren't allocated the same way. They are just instantiated using say: Vec0<Cons_O> ... _KeysValues;
20:27:29
drmeister
Ok, so let me restate that. The static analyzer looks for every class whose instances absolutely need a stamp.
20:28:05
drmeister
It also gathers up info on every class. I think it then uses the classes that need stamps to find their parent classes to build a complete hierarchy. Currently everything in the hierarchy gets a stamp.
20:28:24
drmeister
But I think we are set up to only assign stamps to the classes that absolutely need them.
20:28:57
drmeister
So in a way, limiting stamps to the classes that absolutely need them kind of fulfills the purpose of the static analyzer.
20:29:25
drmeister
We just need to get the info into the sif file so that the scraper can know what classes need stamps.
20:31:59
drmeister
Now, if we don't use the static analyzer - then we only have the info provided by scraping the C++ code and we don't know exactly what classes will be allocated - so everything gets a stamp in build_cboehm.
20:33:12
drmeister
It may not be much of an optimization. How many classes do we have that we allocate and how many do we have that we don't? It may be that 95% of classes need stamps.
20:47:17
Bike
no, i just mean, i don't understand the details of it and how it affects the stamp generation
20:47:22
drmeister
The idea of the WTAG is you get those two bits and you know where to get the stamp for dispatch.
20:47:53
drmeister
You generate stamps and then shift the stamp two bits to the left and then stick in the WTAG.
20:49:40
drmeister
When you do the IsA tests on ranges you deal with the WTAG by taking the min_stamp and max_stamp and testing the range (min_stamp<<2|#b00) ... (max_stamp<<2|#b11)
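[Editor's illustration, not something from the channel.] A minimal Python sketch of the shift-and-tag step and the range test described above. The function names are hypothetical; the two-bit WTAG width and the #b00 / #b11 bounds are taken from the messages.

```python
WTAG_BITS = 2  # two where-tag bits sit below the stamp

def stamp_wtag(stamp, wtag):
    """Shift the stamp left two bits and put the WTAG in the low bits."""
    assert 0 <= wtag < (1 << WTAG_BITS)
    return (stamp << WTAG_BITS) | wtag

def isa_range(min_stamp, max_stamp):
    """Inclusive range of stamp|wtag values that covers every possible WTAG."""
    return stamp_wtag(min_stamp, 0b00), stamp_wtag(max_stamp, 0b11)

def is_a(value, min_stamp, max_stamp):
    """Range-based IsA test on a combined stamp|wtag value."""
    low, high = isa_range(min_stamp, max_stamp)
    return low <= value <= high

print(isa_range(5, 7))
print(is_a(stamp_wtag(6, 0b10), 5, 7))
```

Padding the lower bound with #b00 and the upper bound with #b11 means the test never has to mask the WTAG off before comparing.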
20:52:52
drmeister
Because of this every stamp|ww|mm value looks like a FIXNUM in Common Lisp with the integer value stamp|ww
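[Editor's illustration, not something from the channel.] One way to read the remark above, sketched in Python. The width of the "mm" field and its value are assumptions made only for this example (two bits, equal to a fixnum tag of #b00); the log does not state them, and all names here are hypothetical.

```python
WTAG_BITS = 2
MTAG_BITS = 2      # assumption: width of the "mm" field
FIXNUM_TAG = 0b00  # assumption: "mm" equals the fixnum tag

def header_value(stamp, wtag):
    """Build stamp|ww|mm with the mm bits set to the assumed fixnum tag."""
    stamp_ww = (stamp << WTAG_BITS) | wtag
    return (stamp_ww << MTAG_BITS) | FIXNUM_TAG

def as_fixnum(value):
    """Decode a tagged word as a fixnum, or return None if the tag differs."""
    if value & ((1 << MTAG_BITS) - 1) != FIXNUM_TAG:
        return None
    return value >> MTAG_BITS

word = header_value(6, 0b10)
print(word, as_fixnum(word))
```

Under these assumptions the decoded integer is exactly stamp|ww, so header values can be handled as ordinary integers on the Lisp side.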
20:54:47
Bike
okay, so what else do we need to worry about? the root? we wanted c++ classes to show up before lisp classes in the numbering, right?
20:55:13
drmeister
Now - we've discussed getting rid of the WTAG - because we could just use the header stamp values to figure out where the dispatch stamp value lives. But WTAGs are pretty deeply ingrained now and I don't think they hurt us too much and they might actually be useful. They save us several stamp comparisons at runtime vs one WTAG comparison.
20:56:20
drmeister
Just start numbering stamps at 1 and they can go up to 65536 (I think) or 16384 - one of those. You won't hit that limit.
20:57:22
drmeister
Currently the stamps go 1...65536 (C++ classes) 65537...128K (clbind classes) 128K+... Lisp classes.
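[Editor's illustration, not something from the channel.] The numbering described above, as a small Python lookup. The names are hypothetical, and treating 128K (131072) as the last clbind stamp is an assumption, since the message leaves that boundary loose.

```python
CXX_MAX = 65536           # 1..65536: C++ classes
CLBIND_MAX = 128 * 1024   # 65537..128K: clbind classes (inclusive upper bound is an assumption)

def stamp_kind(stamp):
    """Classify a stamp by the range it falls in."""
    if stamp < 1:
        raise ValueError("stamps start at 1")
    if stamp <= CXX_MAX:
        return "c++"
    if stamp <= CLBIND_MAX:
        return "clbind"
    return "lisp"

print(stamp_kind(1), stamp_kind(65537), stamp_kind(200000))
```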
21:00:14
drmeister
Assume there is a TAGS:ALLOCATED . 1 and you will get our current behavior. By the end of today I should be able to give you sif files with TAGS:ALLOCATED . 0 for things like Number_O and General_O
21:03:05
drmeister
There's no way I could have planned this all out from the start. There's way too much experience here.
23:11:19
drmeister
Bike: It looks like I was wrong about the allocators - everything is showing up as allocated. I must have some default allocators in there. Checking...
23:12:16
drmeister
Yeah : https://github.com/clasp-developers/clasp/blob/main/include/clasp/core/object.h#L366
23:13:46
drmeister
That applies to the lispallocs category - every class under General_O will be marked as allocated.