freenode/#clasp - IRC Chatlog
15:32:27
Bike
drmeister: https://github.com/clasp-developers/clasp/wiki/Manual is the manual - obviously it's sketchy and not much to look at
15:57:36
Bike
drmeister: i remember a few years ago (geez) you mentioned talking with people about how to use the lldb stuff to parse dwarf, and were told you kind of need an external debugger process?
18:03:08
Bike
the backtrace code is fundamentally kind of confusing in that it fills up our custom structure and then we sort of build things off that
18:03:20
Bike
it might be easier to have just a bare "list of return addresses" that we then process
18:03:38
Bike
if we want to support backtraces early this might involve having duplicate code in both C++ and lisp, though
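A minimal sketch of the "bare list of return addresses" idea, assuming a glibc or macOS libc that provides `backtrace()` in `<execinfo.h>`. `capture_return_addresses` is a hypothetical helper, not existing Clasp code. It only records raw program counters, and symbolization (library, object file, DWARF line info) would happen in a later pass over the returned list.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <execinfo.h>
#include <vector>

// Hypothetical helper: capture only the raw return addresses of the current
// call stack. No symbol, file, or line lookup happens here; a separate pass
// can map each address to a library, object file, or JIT-ed code later.
std::vector<std::uintptr_t> capture_return_addresses(int max_frames = 256) {
  assert(max_frames > 0);
  std::vector<void*> frames(static_cast<std::size_t>(max_frames));
  int depth = backtrace(frames.data(), max_frames);  // glibc / macOS libc
  std::vector<std::uintptr_t> addresses;
  addresses.reserve(static_cast<std::size_t>(depth));
  for (int i = 0; i < depth; ++i)
    addresses.push_back(reinterpret_cast<std::uintptr_t>(frames[i]));
  return addresses;
}
```

Keeping capture this dumb means the same address list could be produced early in startup and processed later, in C++ or in Lisp.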
18:45:16
Bike
"../../src/core/debugger.cc:1006:search_symbol_table I disabled symbol list searching for now - use DWARF instead I think" i see.
18:52:18
Bike
so just using the existing functions and a more basic operating_system_backtrace, it seems to get the function names and line numbers for most frames
18:52:40
Bike
though there are some mysterious elements, like having a "Line" of 999905 which is obviously a placeholder, but a "StartLine" of 891 which is correct
18:54:00
drmeister
Regarding finding object files for addresses - that's straightforward. I'll walk you through it in a moment.
18:55:06
drmeister
To be clear - there are "object files": these are the blobs of bytes that contain an ELF or Mach-O data structure.
18:55:45
Bike
what I mean is that M-. from lisp on a lisp function defined in C++ takes you to the wrapper code
18:56:59
drmeister
Then there are the new classes "Code_O" and "Library_O". A Code_O is a stretchy vector blob of bytes that CONTAINS ALL OF THE RELOCATED CODE
18:57:46
drmeister
Furthermore, a Code_O instance points to an ObjectFile_O instance that references an "object file".
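A simplified model of that relationship may help when mapping a backtrace address back to its object file. This is a hypothetical sketch: `CodeBlob`, `ObjectFileRef`, and `object_file_for_address` are invented names, not Clasp's `Code_O`/`ObjectFile_O` classes, and it assumes each code blob knows where its relocated bytes start and how many there are.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical stand-in for an "object file"; only its name is kept here.
struct ObjectFileRef {
  std::string name;
};

// Hypothetical stand-in for a block of relocated code: the address range its
// bytes occupy after relocation, plus the object file it was produced from.
struct CodeBlob {
  std::uintptr_t start;
  std::size_t size;
  const ObjectFileRef* object_file;

  bool contains(std::uintptr_t address) const {
    return address >= start && address - start < size;
  }
};

// Walk the registered code blobs and return the object file whose relocated
// code contains `address`, or nullptr if no blob covers it.
const ObjectFileRef* object_file_for_address(const std::vector<CodeBlob>& blobs,
                                             std::uintptr_t address) {
  for (const CodeBlob& blob : blobs) {
    if (blob.contains(address)) return blob.object_file;
  }
  return nullptr;
}
```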
18:59:37
drmeister
Here's Library_O: https://github.com/clasp-developers/clasp/blob/future/include/clasp/llvmo/code.h#L218
19:00:38
drmeister
https://github.com/clasp-developers/clasp/blob/future/include/clasp/llvmo/code.h#L157
19:01:02
drmeister
ObjectFile_O: https://github.com/clasp-developers/clasp/blob/future/include/clasp/llvmo/code.h#L41
19:01:28
Bike
okay, so just using the existing core:object-file-address-information function seems to return something coherent for jitted functions already
19:01:58
drmeister
The "object file" is referenced using this: https://github.com/clasp-developers/clasp/blob/future/include/clasp/llvmo/code.h#L45
19:03:12
drmeister
Now - FYI and for future reference, an llvm::MemoryBuffer is interesting - it can point to memory on disk.
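As an aside on that point: `llvm::MemoryBuffer::getFile` may hand back a buffer that is memory-mapped from the file instead of copied into the heap. The sketch below shows the same idea with plain POSIX `mmap`; it is not the LLVM API, and `MappedFile` is a hypothetical name.

```cpp
#include <cstddef>
#include <fcntl.h>
#include <string_view>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

// Read-only view of a file's bytes, backed by the file on disk via mmap.
class MappedFile {
 public:
  explicit MappedFile(const char* path) {
    int fd = open(path, O_RDONLY);
    if (fd < 0) return;
    struct stat st;
    if (fstat(fd, &st) == 0 && st.st_size > 0) {
      void* p = mmap(nullptr, static_cast<std::size_t>(st.st_size), PROT_READ,
                     MAP_PRIVATE, fd, 0);
      if (p != MAP_FAILED) {
        data_ = static_cast<const char*>(p);
        size_ = static_cast<std::size_t>(st.st_size);
      }
    }
    close(fd);  // the mapping stays valid after the descriptor is closed
  }
  ~MappedFile() {
    if (data_) munmap(const_cast<char*>(data_), size_);
  }
  MappedFile(const MappedFile&) = delete;
  MappedFile& operator=(const MappedFile&) = delete;

  bool ok() const { return data_ != nullptr; }
  std::string_view bytes() const { return {data_, size_}; }

 private:
  const char* data_ = nullptr;
  std::size_t size_ = 0;
};
```

Pages are only read from disk when touched, which is why a large object file or debug-info file can be "loaded" cheaply this way.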
19:06:02
drmeister
https://github.com/clasp-developers/clasp/blob/future/include/clasp/core/lisp.h#L324
19:07:56
drmeister
I am not removing anything from them - they are thread-safe - you can walk them and you can add things to them using CAS.
19:09:00
drmeister
The only time I remove anything from them is when I save a snapshot - I wipe out the _AllObjectFiles list and collect garbage a couple of times.
19:09:27
drmeister
Any Code_O, ObjectFile_O instance at that point that isn't rooted gets collected and does not get saved into the snapshot.
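A minimal sketch of the access pattern described above: readers walk the list while other threads push with compare-and-swap, nothing is unlinked individually, and the whole list can be detached at once (analogous to wiping `_AllObjectFiles` before a snapshot). `AtomicList` and `Node` are hypothetical names. In Clasp, unrooted entries are reclaimed by the garbage collector; in this sketch the caller owns the detached nodes.

```cpp
#include <atomic>
#include <string>
#include <utility>

struct Node {
  std::string name;
  Node* next;
};

// Hypothetical append-only registry.
class AtomicList {
 public:
  // Lock-free push onto the head using compare-and-swap.
  void push(std::string name) {
    Node* node = new Node{std::move(name), head_.load(std::memory_order_relaxed)};
    // On failure compare_exchange_weak stores the current head into
    // node->next, so the loop just retries until the node is installed.
    while (!head_.compare_exchange_weak(node->next, node,
                                        std::memory_order_release,
                                        std::memory_order_relaxed)) {
    }
  }

  // Walk the list starting from whatever the head is right now.
  template <typename Fn>
  void for_each(Fn fn) const {
    for (Node* n = head_.load(std::memory_order_acquire); n != nullptr; n = n->next)
      fn(n->name);
  }

  // Detach every entry at once; the caller takes ownership of the chain.
  Node* detach_all() { return head_.exchange(nullptr, std::memory_order_acq_rel); }

 private:
  std::atomic<Node*> head_{nullptr};
};
```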
19:10:27
Bike
i guess first i should just figure out what's screwing up the backtrace code, since it looks like the pieces work perfectly well already
19:11:18
drmeister
Oh - also, ObjectFile_O instances point to a Code_O instance after their "object file" is passed to the JIT.
19:12:16
drmeister
I suspect we are using the DWARF functions somewhere. I hollowed out the symbol table stuff in debugging.cc/debug_unix.cc/debug_macos.cc
19:13:03
drmeister
All this Code_O/ObjectFile_O/Library_O stuff - it applies all the way from the interpreter->aclasp->bclasp->cclasp
19:13:48
Bike
i guess there's some kind of two layer thing where when this doesn't work it searches object files, and that's failing. i see
19:15:51
Bike
the problematic search_symbol_table function gets called from elf_loaded_object_callback
19:16:33
drmeister
Compile-file generates faso files. They are just concatenated "object file"s with an index. The index bins the "object file"s into JITDylibs - I can tell you about how those fit in later. They aren't that big a deal.
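To illustrate "concatenated object files with an index", here is a sketch under an invented layout: each hypothetical `IndexEntry` gives an offset, a size, and a JITDylib number. This is not the actual faso format, which the log doesn't describe; it only shows how such an index lets you slice the blob and group the pieces per JITDylib.

```cpp
#include <cstddef>
#include <map>
#include <stdexcept>
#include <string_view>
#include <vector>

// Hypothetical index entry: where one object file sits inside the
// concatenated blob and which JITDylib it should be added to.
struct IndexEntry {
  std::size_t offset;
  std::size_t size;
  int jitdylib;
};

// Slice the blob into per-object-file views (no copying), grouped by the
// JITDylib each one is binned into. Entries are kept in index order.
std::map<int, std::vector<std::string_view>> bin_object_files(
    std::string_view blob, const std::vector<IndexEntry>& index) {
  std::map<int, std::vector<std::string_view>> bins;
  for (const IndexEntry& e : index) {
    if (e.offset > blob.size() || e.size > blob.size() - e.offset)
      throw std::out_of_range("index entry points outside the blob");
    bins[e.jitdylib].push_back(blob.substr(e.offset, e.size));
  }
  return bins;
}
```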
19:18:18
drmeister
The search_symbol_table needs a fresh approach. I think it needs to figure out if the address is in a library and then build a DwarfContext for that library.
19:36:58
drmeister
I think we have to dig into the llvm source code or lldb to see how these are created from executables (elf and dsymutil output)
20:01:57
Bike
it takes a DWARFObject, according to doxygen, but we pass it an object file and that seems to work
20:02:07
drmeister
https://llvm.org/doxygen/classllvm_1_1DWARFContext.html#ad4453328befe11c89bd0aff1df3ca77e
20:04:23
drmeister
They can also be directories - and those probably contain macho files, and those look like elf files.
20:05:04
Bike
"A dSYM file is an ELF file that contains DWARF (debugging with attributed record formats) debug information for your application. ", according to something i googled
20:06:40
drmeister
Maybe they just use the common features of macho and elf and you are left with an elf file?
20:08:13
drmeister
(3) You build a DWARFContext for that thing and use the (address - base_address) to look up the info in the DWARFContext.
20:09:27
drmeister
We can bin the addresses by library and object file so once we have a DWARFContext we look for all of the addresses within it in one loop.
20:09:52
drmeister
That will be valuable for libraries - and may be less important for object files.
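The binning idea from the last few messages can be sketched as follows, assuming each library's load base and size are known. `LibraryRange`, `bin_by_library`, and `resolve_binned` are hypothetical; `build_context` stands in for whatever expensive per-library setup (such as constructing an `llvm::DWARFContext`) should happen only once, and lookups use `address - base`.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Hypothetical description of one loaded library's address range.
struct LibraryRange {
  std::string name;
  std::uintptr_t base;  // load address
  std::size_t size;     // bytes mapped starting at base
};

using Bins = std::map<std::string, std::vector<std::uintptr_t>>;

// Group absolute addresses by containing library, converting each to a
// library-relative offset (address - base). Addresses outside every known
// library (for example JIT-ed code) are skipped in this sketch.
Bins bin_by_library(const std::vector<LibraryRange>& libraries,
                    const std::vector<std::uintptr_t>& addresses) {
  Bins bins;
  for (std::uintptr_t address : addresses) {
    for (const LibraryRange& lib : libraries) {
      if (address >= lib.base && address - lib.base < lib.size) {
        bins[lib.name].push_back(address - lib.base);
        break;
      }
    }
  }
  return bins;
}

// Build the expensive per-library context once, then resolve all of that
// library's offsets in one loop.
template <typename BuildContext, typename Lookup>
void resolve_binned(const Bins& bins, BuildContext build_context, Lookup lookup) {
  for (const auto& [library, offsets] : bins) {
    auto context = build_context(library);
    for (std::uintptr_t offset : offsets) lookup(context, offset);
  }
}
```

With N addresses spread over K libraries, the context is built K times instead of N times, which is the saving that matters for large shared libraries.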
20:10:22
drmeister
I built the symbol tables from libraries at startup and basically replicated what the DWARFContext gives you.
20:11:08
drmeister
But on macos I didn't put in the work to figure out the macho details. I used popen 'nm' and used that to look up symbol relative addresses.
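A sketch of the lookup side of the `nm` approach, assuming nm's default `<hex address> <type> <name>` output has already been captured as text (for example through `popen`). `parse_nm_output` and `symbol_for_offset` are hypothetical helpers; the offset passed in is the address relative to the library's load base, however that base is defined for the binary in question.

```cpp
#include <cstdint>
#include <iterator>
#include <map>
#include <optional>
#include <sstream>
#include <string>

// Build a sorted symbol table from nm-style text. Lines without an address,
// such as undefined symbols ("U _printf"), are skipped.
std::map<std::uint64_t, std::string> parse_nm_output(const std::string& text) {
  std::map<std::uint64_t, std::string> symbols;
  std::istringstream lines(text);
  std::string line;
  while (std::getline(lines, line)) {
    std::istringstream fields(line);
    std::string address, type, name;
    if (!(fields >> address >> type >> name)) continue;
    symbols[std::stoull(address, nullptr, 16)] = name;
  }
  return symbols;
}

// Return the symbol with the greatest start address <= offset. Symbol sizes
// are not known here, so offsets past the last symbol still map to it.
std::optional<std::string> symbol_for_offset(
    const std::map<std::uint64_t, std::string>& symbols, std::uint64_t offset) {
  auto it = symbols.upper_bound(offset);
  if (it == symbols.begin()) return std::nullopt;
  return std::prev(it)->second;
}
```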
20:11:54
Bike
so i mean... do we actually want to do this? can we not just use the actual object files?
20:51:08
Bike
i just tried getLocalsForAddress, seems to work at least to the extent of getting names
20:51:24
Bike
of course there aren't any interesting names, but i could merge in my debug vars branch
21:30:15
Bike
ok so the actual slime problem appears to be that *restart-clusters* is somehow bound to garbage
21:42:48
Bike
it just causes an infinite error crash because compute-restarts signals an error when it hits the weird list, and then the sldb debugger pops up and wants to compute the restarts, etc
21:59:41
Bike
is there a possibility something low level is going wrong so that we're reading garbage from a dynamic variable?
22:28:28
drmeister
I can show you my pretty useful debugging environment using udb and the python udb extension.
22:29:18
drmeister
If you are pulling garbage out of a dynamic variable we can inspect it to see if it is there and then watch the memory location to catch when it gets set to that value.
22:35:17
drmeister
Is the output "../../src/core/debugger.cc:1006:search_symbol_table I disabled symbol list searching for now - use DWARF instead I think"?
22:36:27
drmeister
How does slime get a backtrace? Write some functions that replicate that interactively
22:36:49
Bike
you can try (clasp-debug:print-backtrace) for a start, but more complicated things work fine too
22:37:46
Bike
(clasp-debug:with-stack (s) (clasp-debug:map-stack #'clasp-debug:prin1-frame-call s)) still no problem, though it's ugly to look at
22:50:54
drmeister
Will they confuse things - or should we use the "monitor" facility we added to write messages out to a separate stream.
22:51:54
Bike
i tried putting one in but it didn't fire, so maybe i misunderstood where the problem was
23:23:27
Bike
i just stuck (do ((r *restart-clusters* (cdr r))) ((null r)) (unless (listp (cdr r)) (print r) (print (core:object-address r)))) at the head of compute-restarts
23:50:51
drmeister
I'm setting up the compute-restarts change you propose above and then I'd like to show you how to find the problem quickly using udb and the low level debugging setup I've been working on.
23:51:53
drmeister
I need to get started on a grant proposal but I'd like to show you this and give you access to the machine that has udb so you could make more rapid progress with these runtime issues.
1:02:02
Bike
https://common-lisp.net/project/slime/doc/html/Communication-style.html#Communication-style
1:10:02
drmeister
The emacs that I ran slime-connect within just sits there. But when I hit control-C sldb comes up with the interrupt
1:15:31
drmeister
I connected the debugger without the -tui option - it starts up significantly faster
1:25:45
drmeister
Could you join me again so we can look at this? I have something - just not sure what.
1:31:23
drmeister
Now - in Boehm, similarly sized objects all get allocated together - so there are a bunch of cons cells here.
1:32:07
drmeister
The header/car/cdr of what SHOULD be a cons cell at 0x7fd3dbbc1f08 --> 0xc17df4400000002a 0x0000000006dcfb00 0x00007fd3e4a09451
1:35:10
drmeister
But this is followed by what looks like a valid cons cell: 0xdbbc1f2800000003 0x00007fd3dbbac631 0x000000000000006a
1:36:50
drmeister
Also the alignment is wrong - it looks like it should be a CONS cell every 4 words.
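The alignment argument can be checked with a little arithmetic. Assuming 8-byte words, 4-word (32-byte) cons slots, and a Boehm block that starts on a 32-byte boundary, a cell header should sit at offset 0 of its slot. The suspect cell at 0x7fd3dbbc1f08 is 8 bytes into its slot, while the following cell, if it starts right after the three dumped words at 0x7fd3dbbc1f20 (an inference, not stated in the log), is slot-aligned. `offset_in_slot` and `slot_start` are hypothetical helpers.

```cpp
#include <cstdint>

// Assumptions from the discussion: 64-bit words, conses in 4-word slots.
constexpr std::uint64_t kWordBytes = 8;
constexpr std::uint64_t kConsSlotBytes = 4 * kWordBytes;  // 32 bytes

// Byte offset of `address` within its 32-byte slot; a cons header is
// expected at offset 0 if slots are laid out from a 32-byte-aligned base.
constexpr std::uint64_t offset_in_slot(std::uint64_t address) {
  return address % kConsSlotBytes;
}

// Address of the slot that `address` falls into (slot size is a power of 2).
constexpr std::uint64_t slot_start(std::uint64_t address) {
  return address & ~(kConsSlotBytes - 1);
}
```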
1:38:07
drmeister
I'm going to modify the compute-restarts code so it dumps the previous whole list if your test is successful.