libera/#clasp - IRC Chatlog
12:47:10
Colleen
Bike: drmeister said 7 hours, 5 minutes ago: How did you get it to build? I can't get it to build.
13:00:53
drmeister
I can build the interpreter and start compiling code but I get a weird runtime error that other people report.
13:01:33
drmeister
If we got this working we could build libunwind statically and link it statically - I think
13:02:13
Bike
i've been wondering if we can't just skip some of this and use the unw_ functions in whatever libunwind libgcc uses, if they exist
13:02:59
Bike
it's still a hassle. the build process is probably the worst part of clasp right now and i don't want to make it even worse
13:03:34
drmeister
It's not going to be worse. It's a few configuration switches. Let's just see if we can get this working and then decide.
13:04:07
drmeister
From the discord discussion there appears to be a bug in mixing clang with gcc libunwind - do you agree?
13:04:58
drmeister
Going all the way with llvm libunwind works on the mac and it almost works on linux except for this new problem.
13:05:45
drmeister
If the new problem is as bad as the bug with gcc libunwind/clang then we might be effed with clang and libunwind on linux.
13:37:29
drmeister
https://github.com/llvm/llvm-project/blob//llvm/lib/ExecutionEngine/RuntimeDyld/RTDyldMemoryManager.cpp#L70
13:38:15
drmeister
Am I reading this correctly? llvm is compiled to use one libunwind on APPLE and another on linux.
13:42:27
drmeister
I can apply the patch to llvm and use the option and then see if that fixes the issue. That would be a PITA in the deploy script and we'd have to get a better solution in place for llvm13
13:50:26
drmeister
I made the changes to deploy to hack llvm to do this. If this works then we have options.
13:51:34
drmeister
It sounds like Di Bella is working on fixing the problem that clang doesn't like gnu libunwind? Is that your read? That would be the best thing.
13:55:02
drmeister
I'm building deploy on linux with code that patches llvm to add that option and then setting that option. Also, llvm libunwind is incorporated into /opt/clasp. Let's see if that works.
15:49:59
drmeister
Rebuilt deploy twice. I think the patch is active - but there's no difference in program behavior.
15:50:16
drmeister
Putting breakpoints on __unw_add_dynamic_fde and processFDE provides no illumination.
15:54:00
drmeister
Hmm, I can go into the deploy build directory and hack llvm directly, build and then install and it goes into /opt/clasp/
16:07:34
drmeister
Dammit - all this work because of that damn frame pointer elimination optimization.
18:08:26
drmeister
Ok, I see. So we walk the linked list of RBP and this checks if any given RBP is valid.
18:08:58
Bike
yeah, it's comparing against the value in the exception, which we get from llvm.frameaddress(0) i'm pretty sure
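
A minimal sketch of the walk being described, run over a simulated chain rather than a live stack. It assumes frame pointers are retained, so each frame stores the caller's RBP and then the return address. `FrameRecord`, `frame_is_on_chain` and `collect_frame_pointers` are hypothetical names for illustration, not clasp's code.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical model of the two words a frame-pointer prologue leaves at RBP:
// the caller's saved RBP, then the return address.
struct FrameRecord {
    const FrameRecord* saved_rbp;  // link to the caller's record, nullptr at the outermost frame
    const void* return_address;
};

// Follow the saved-RBP links from `start` and report whether `target` is one
// of the frames on the chain. `max_depth` guards against a corrupt or cyclic chain.
bool frame_is_on_chain(const FrameRecord* start, const FrameRecord* target,
                       std::size_t max_depth = 1024) {
    std::size_t depth = 0;
    for (const FrameRecord* fp = start; fp != nullptr && depth < max_depth;
         fp = fp->saved_rbp, ++depth) {
        if (fp == target) return true;
    }
    return false;
}

// Collect every frame pointer on the chain, innermost first.
std::vector<const FrameRecord*> collect_frame_pointers(const FrameRecord* start,
                                                       std::size_t max_depth = 1024) {
    std::vector<const FrameRecord*> frames;
    for (const FrameRecord* fp = start; fp != nullptr && frames.size() < max_depth;
         fp = fp->saved_rbp) {
        frames.push_back(fp);
    }
    return frames;
}
```

On a real stack the starting pointer would come from something like the frame address intrinsic, and the walk is only safe when every frame on the chain actually kept its frame pointer.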
18:40:17
drmeister
So the only code that walks the stack to collect frame pointers is call_with_frame.
18:44:19
drmeister
If libunwind were working reliably I'd say we use it to walk the stack and get RBP for each frame within which it is available.
18:45:20
drmeister
What I mean is - do you see any problems with that idea other than the effed up libunwind situation that we talked about today.
18:45:46
Bike
i mean what i'd really ideally do is forget the stack pointer and get the local variable data from dwarf, but i guess llvm sucks at dwarf or something
18:46:01
drmeister
Right - dereferencing a builtin_frame_address or something that it points to is dangerous.
18:47:29
drmeister
The problem with using libunwind is that if we link with it on linux we run into the problems that we were talking about on discord: gnu libunwind doesn't work with clang code and llvm libunwind doesn't work with the JIT on linux.
18:50:25
drmeister
llvm optimizations wipe out a lot of DWARF info. function call arguments are the first to go.
18:50:52
drmeister
With a frame pointer and the stackmap entry for our register save area - we can get the call arguments.
18:52:33
Bike
it's essentially an extremely complex hack to work around llvm destroying things, which sucks
19:07:24
drmeister
I don't see it quite that way - they optimize the heck out of the code and values get lost. The function call arguments are of no use in optimized code.
19:09:48
Bike
we should be able to tell llvm to do a few optimizations without it destroying all information. it could spill them to the stack itself and put that information in dwarf, if worst comes to worst
19:11:49
drmeister
I'd love to be able to say - "the function argument variables - keep track of those throughout the function".
19:11:53
Bike
anyway, i'm trying to figure out how to generate debuginfo for parameters in the first place now
19:12:15
drmeister
Even if that worked we still need to know where the stack frame starts - we don't have that.
19:14:31
drmeister
Also - what happens if the parameter changes? int foo(int x, int y) { x = 10; y = 5; ... signal-error }
19:15:01
Bike
well generating the debug info is just a few llvm.dbg.value calls. i tried it last week (?) and it caused verification failures and i reverted it, if you remember
19:16:11
drmeister
You generated debug info for the register save area and that caused problems. There was already code for closure,nargs,farg0,farg1,farg2,farg3. I see those in every stack frame although they are optimized away if we aren't on the first instruction of the function.
19:17:37
Bike
so currently we don't tell dwarf about the parameters or the RSA at all, i don't think
19:18:10
drmeister
I thought you had the closure,nargs,farg0,farg1,farg2,farg3 working for a long time - and then you added something (register save area?) and that failed - so did you unwind just the register save area or all of these?
19:18:54
Bike
things show up in the backtrace because we manually extract them without going through dwarf
19:19:30
drmeister
https://github.com/clasp-developers/clasp/blob/main/src/lisp/kernel/cmp/cmpintrinsics.lsp#L687
19:19:59
Bike
yes, but if you look at dbg-register-parameter, you'll see it only generates an llvm.dbg.value call if *debug-register-parameter* is true
19:22:22
drmeister
Ok - but since we don't use them it's immaterial. The register-save-area is what I'd like to get access to using DWARF.
19:23:54
drmeister
Ok, so it's the register-save-area that causes the verification errors, not closure,nargs,farg0,farg1,farg2,farg3
19:24:20
Bike
the error was that there was no !dbg attachment, which would be true for either one, since they're both just generated with irc-intrinsic
19:24:23
drmeister
I'm saying I saw closure,nargs,farg0,farg1,farg2,farg3 for weeks without problems.
19:25:55
drmeister
I can't use closure,nargs,farg0,farg1,farg2,farg3 unless they are available throughout the function.
19:27:26
Bike
i think we have to figure out when we don't have source info at all, and then not use llvm.dbg.whatever at that point? maybe?
19:27:30
drmeister
Let's say we solve the !dbg attachment (I've fixed things like this in the past) how can we get the register save areas when we generate backtraces?
19:28:40
drmeister
It's going to be in call_with_frame - how would we get the register-save-areas for frames that provide it?
19:30:56
drmeister
I'd say we get the current frame pointer and then for each return address we lookup the size of its frame and then we subtract those sizes to break the stack up into frames.
19:31:16
drmeister
In a sane world without frame pointer elimination we would walk the linked list of frame pointers.
19:32:12
drmeister
If there is a facility to lookup the size of each frame - it will have to work with libraries and our jitted code. I know how to get DWARF info for our jitted code return addresses but libraries are a different matter.
19:34:30
drmeister
I'm also confused by libunwind. We can get RBP for each frame using a cursor - I see that. Can we get the start of each frame using a cursor?
19:37:18
drmeister
Presumably DWARF can still provide offsets of lexical variables when the frame pointer has been eliminated?
19:44:26
drmeister
So even if we have the DILocal info on a lexical variable using DWARF - we still have the problem of figuring out the start address of the stack frame.
19:49:40
drmeister
I get it. Unwinding the stack means figuring out where the return address is after you unwind. If we knew this we would know everything we need.
19:59:14
Bike
lldb where do you initialize the frame base expression. how the hell can this not be obvious
20:08:37
drmeister
So it's looking like the eh_frame contains a table that maps IP to offsets to the return address on the stack - right?
20:09:01
drmeister
The eh_frame is in memory and it's relocated so that it works with absolute IP addresses.
20:09:45
Bike
okay, no, eh_frame does have it in a DWARF encoded way. the paper calls the frame address the "CFA" and that threw me off
20:10:35
drmeister
The authors of the paper describe compiling the bytecode to assembly code and speeding up unwinding 20x.
20:13:29
drmeister
Back to backtraces. Suppose we had a DWARF bytecode interpreter and we interpreted the eh_frame. We would take a return address and calculate the CFA for the frame to get the next return address.
20:15:52
drmeister
Maybe the CFA is the address of (or some constant offset from the address of) the next return address? Checking if libunwind gives us the address of the next return address...
20:26:23
Bike
i think libunwind does this. lldb definitely does, although i can't find the code where it gets the CFA expression to begin with
20:28:54
drmeister
Understood "CFA is the frame base, not the return address" - the return address will be at some offset from the CFA - right?
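
A rough sketch of one unwind step under the x86-64 System V convention, where the CFA is the caller's RSP just before the call and the return address therefore sits at CFA - 8. The table here is a hand-written stand-in for what interpreting eh_frame would produce, reduced to the single rule CFA = RSP + offset. `UnwindRow`, `Registers`, `Memory` and `step` are hypothetical names, and the stack is simulated.

```cpp
#include <cassert>
#include <cstdint>
#include <map>

// Hypothetical, simplified row: for code starting at the key IP,
// CFA = RSP + cfa_offset. Real eh_frame rows carry far more than this.
struct UnwindRow {
    std::uint64_t cfa_offset;
};

struct Registers {
    std::uint64_t ip;
    std::uint64_t sp;
};

using UnwindTable = std::map<std::uint64_t, UnwindRow>;  // start IP -> rule
using Memory = std::map<std::uint64_t, std::uint64_t>;   // simulated stack: address -> word

// Compute the caller's IP and SP from the callee's. Returns false if no row
// covers the IP or the return-address slot is not in the simulated stack.
bool step(const Registers& regs, const UnwindTable& table, const Memory& stack,
          Registers& caller) {
    // Find the last row whose start IP is <= regs.ip.
    UnwindTable::const_iterator row = table.upper_bound(regs.ip);
    if (row == table.begin()) return false;
    --row;

    const std::uint64_t cfa = regs.sp + row->second.cfa_offset;

    // Assumption (x86-64 SysV): the return address is the word just below the CFA.
    Memory::const_iterator slot = stack.find(cfa - 8);
    if (slot == stack.end()) return false;

    caller.ip = slot->second;  // execution resumes at the return address
    caller.sp = cfa;           // after the return, the caller's RSP equals the CFA
    return true;
}
```

Repeating `step` until it fails yields each frame's CFA in turn, which is the frame boundary information the discussion is after.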
20:35:25
Bike
libunwind has a dwarf machine but it doesn't seem to be able to handle the fbreg instructions to get data from the frame
20:39:25
drmeister
Here is their DWARF compiler described in the paper: https://github.com/frdwarf/libunwind-eh_elf
20:40:14
Bike
that's interesting and all, but wouldn't we just be making things more complicated for ourselves?
20:40:51
drmeister
I'm not thinking about what to do moving forward yet. I'm just playing with ideas.
20:41:22
Bike
gnu libunwind does the cool thing where they use the preprocessor to construct function names, so it's hard to grep through
20:44:55
drmeister
What about this? https://refspecs.linuxbase.org/LSB_5.0.0/LSB-Core-generic/LSB-Core-generic/baselib--unwind-backtrace.html
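
For reference, a small sketch of how the `_Unwind_Backtrace` interface from that page can be driven. It uses `_Unwind_GetIP` and `_Unwind_GetCFA` from `<unwind.h>` to record the instruction pointer and CFA of each frame. `FrameInfo`, `record_frame` and `capture_frames` are hypothetical names, and whether this behaves correctly for clasp's JITted frames is exactly the open question in this discussion.

```cpp
#include <unwind.h>

#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical record of what the unwinder reports for one frame.
struct FrameInfo {
    std::uintptr_t ip;   // instruction pointer within the frame
    std::uintptr_t cfa;  // canonical frame address of the frame
};

// Callback invoked once per frame by _Unwind_Backtrace.
static _Unwind_Reason_Code record_frame(struct _Unwind_Context* ctx, void* arg) {
    assert(arg != nullptr);
    std::vector<FrameInfo>* frames = static_cast<std::vector<FrameInfo>*>(arg);
    FrameInfo info;
    info.ip = static_cast<std::uintptr_t>(_Unwind_GetIP(ctx));
    info.cfa = static_cast<std::uintptr_t>(_Unwind_GetCFA(ctx));
    frames->push_back(info);
    return _URC_NO_REASON;  // keep walking outward
}

// Walk the current thread's stack and return what the unwinder saw.
std::vector<FrameInfo> capture_frames() {
    std::vector<FrameInfo> frames;
    _Unwind_Backtrace(record_frame, &frames);
    return frames;
}
```

The walk depends on whichever unwinder the program ends up linked against finding unwind info for every frame, so it stops early at any frame it cannot describe.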
20:53:42
drmeister
The original problem was when we compile with clang and we link the gnu libunwind, JITted code that throws an exception fails.
20:55:36
drmeister
We throw an exception and a bunch of stuff happens where eh_frames are interpreted to generate tables that convert return addresses into the next return address to unwind the stack.
20:59:05
drmeister
What if we had the udb debugger connected to clasp with and without libunwind linked in?
20:59:38
drmeister
We could look at the difference between what happens with libunwind and without libunwind when the exception is thrown from the JITted code.
21:02:07
drmeister
https://github.com/clasp-developers/clasp/blob/main/src/lisp/kernel/lsp/setf.lsp#L628
21:02:55
Colleen
Clhs: macro defmacro http://www.lispworks.com/documentation/HyperSpec/Body/m_defmac.htm
21:03:20
Bike
and it's a nonlocal return since early multiple-value-bind is macroexpanded into multiple-value-call of a lambda
21:05:09
drmeister
I wonder what is going on. It's essentially a try {.... inner(...) ... } catch (ReturnFrom& returnFrom) { ...} ... void inner(...){ throw ReturnFrom(...); }
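
Spelled out as compilable C++, the shape being described looks roughly like the following. This is only an illustration of the pattern: `ReturnFrom` here is a stand-in struct with an assumed `value` field, not clasp's actual class, and `inner` and `outer` are hypothetical.

```cpp
#include <cassert>

// Stand-in for the exception type that carries a nonlocal return value.
struct ReturnFrom {
    int value;
};

// The callee performs the nonlocal exit by throwing.
void inner(int x) {
    throw ReturnFrom{x * 2};
}

// The caller establishes the landing pad and turns the exception back
// into an ordinary return value.
int outer(int x) {
    try {
        inner(x);
        return -1;  // not reached: inner always throws
    } catch (ReturnFrom& returnFrom) {
        return returnFrom.value;
    }
}
```

When `inner` is JITted code, the throw only reaches the catch block if the unwinder can find and interpret the eh_frame for every frame in between.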
21:08:49
Bike
trying to figure out what the personality is doing is probably more trouble than it's worth
21:09:49
drmeister
It's going to be running a virtual machine in here that is interpreting the bytecode in the eh_frame - and that virtual machine is not doing the right thing.