freenode/#clasp - IRC Chatlog

14:21:24 drmeister I got clasp building against llvm12 pulled last night with Lang’s new changes for jitlink and linux.

14:21:54 drmeister Testing it failed on Linux so I sent him the module and object file that broke it.

14:22:27 drmeister Currently in the future branch aclasp and bclasp build and run. Cclasp is broken.

14:22:38 drmeister Worse is that backtrace a are broken

14:22:49 drmeister Bactraces

14:22:55 drmeister Damnit

14:23:05 drmeister Backtraces.

14:23:32 drmeister The small code model works on macOS.

14:24:08 drmeister I’ll implement the new code gcing and then I’ll fix backtraces and then I’ll tackle cclasp

14:24:34 drmeister Lang will get back to me about jitlink on Linux

14:43:00 drmeister Bike - backtraces are broken. The names of the functions in the backtraces are wrong. I’ll dump an example.

14:43:08 drmeister What does it mean?

14:43:43 drmeister It means the function that takes a return address and returns a function name is broken - right?

14:43:58 Bike I don't see your example, but yeah, I guess?

14:44:10 drmeister Incoming...

14:49:03 drmeister https://www.irccloud.com/pastebin/6Juv1cPn/

14:49:38 drmeister Lots of boost::system::dummy_exported_function

14:50:07 Bike hm, yeah, that's screwy.

14:50:46 Bike this is in your branch, right? maybe the template stuff you put in involves actual pointers

14:50:51 Bike like actual functions

14:51:19 Bike it does get some C++ lisp functions like core::core__load_faso, i see

14:51:31 drmeister How do I get the raw return addresses from the backtrace facility?

14:51:48 drmeister Then what is the function that converts these return addresses to function names?

14:52:14 drmeister Yeah - it gets some stuff. The core::core__load_faso is in the executable.

14:52:52 drmeister Going forward we should have just two ways to search symbols. 1. executable/libraries 2. object files in memory from jitting, faso files and image save/load.

14:53:38 Bike i think most of the debugger logic is in src/core/debugger.cc.

14:53:46 Bike the lisp code doesn't do a lot

14:54:20 Bike hm, but where are the frames actually filled up

14:55:17 drmeister operating_system_backtrace calls the operating system 'backtrace(buffer,num)' facility.

14:55:19 Bike debug_macosx.cc and stuff probably

14:55:26 Bike oh. yeah ok.
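
The `backtrace(buffer,num)` facility mentioned at 14:55:17 is the call declared in `<execinfo.h>` on glibc and macOS. A minimal sketch of collecting raw return addresses with it might look like the following. `RawFrame` and `capture_backtrace` are hypothetical names for illustration only; they are not Clasp's `operating_system_backtrace` or `BacktraceEntry`.

```cpp
#include <cassert>
#include <cstdlib>
#include <execinfo.h>   // backtrace, backtrace_symbols (glibc / macOS)
#include <string>
#include <vector>

// Hypothetical record: one raw return address plus whatever name the
// C library can attach to it.
struct RawFrame {
    void* return_address;
    std::string symbol;
};

// Capture up to max_frames return addresses from the current call stack.
std::vector<RawFrame> capture_backtrace(int max_frames = 64) {
    assert(max_frames > 0);
    std::vector<void*> buffer(static_cast<std::size_t>(max_frames));
    int count = backtrace(buffer.data(), max_frames);

    // backtrace_symbols returns one malloc'ed block that the caller frees.
    char** names = backtrace_symbols(buffer.data(), count);

    std::vector<RawFrame> frames;
    for (int i = 0; i < count; ++i) {
        frames.push_back(RawFrame{buffer[i], names ? names[i] : ""});
    }
    std::free(names);
    return frames;
}
```

The `return_address` values are the raw addresses asked about at 14:51:31. The `symbol` strings come only from what the dynamic loader knows about the executable and shared libraries, so JIT-generated code would presumably need the separate object-file lookup discussed here.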

15:02:25 drmeister https://www.irccloud.com/pastebin/SHFlXPQg/

15:03:17 drmeister (core:call-with-backtrace (lambda (bt) (dolist (f bt) (print f))))

15:05:47 drmeister Right now that just prints 93 #<FRAME >

15:05:55 drmeister I'll add a __repr__ method for Frame_O

15:05:59 Bike i don't think frames... yeah.

15:09:55 drmeister We create a std::vector<BacktraceEntry> - BacktraceEntry is a low level C++ object, not exposed to Common Lisp. We fill it with everything that we can get.

15:10:34 drmeister Then we copy the info into Frame_O objects, cons up a list of them and pass them to a callback using (core:call-with-backtrace ...)

15:11:09 drmeister That way it's clear what is in the backtrace. Everything above the callback is in the backtrace and everything below it is not.

15:14:15 drmeister I broke up gc_interface.cc into 8 files and astExpose.cc into two. It shaves several minutes off the C++ build.

15:15:37 drmeister Now we are talking...

15:15:39 drmeister https://www.irccloud.com/pastebin/l8puxGet/

15:16:28 drmeister The return addresses don't match the function starts at all.

15:28:59 drmeister Object file maintenance is tied up in what I'm doing here - so it's not surprising that I broke something.

15:34:21 drmeister Ah - it's getting the name out of the FunctionDescription_O - and those are also broken.

15:35:52 drmeister We pull the closure address out of the stack frame.

15:36:08 drmeister Then we use that to look up the function description and pull the name out of that.

15:36:12 drmeister We don't use object files at all.

15:39:05 drmeister Then we have...

15:39:15 drmeister https://github.com/clasp-developers/clasp/blob/future/src/llvmo/debugInfoExpose.cc#L510

15:39:43 drmeister And that will take a return address and should return a bunch of info about the function from the very roundabout path through the object file.

15:41:04 drmeister https://www.irccloud.com/pastebin/OhD4WAiC/

15:41:10 drmeister Ha - that's helpful.

15:41:25 drmeister I have two ways to get a function name and they are both broken.

15:44:19 drmeister Ok - well, that's a start. I know where to start looking.

15:45:06 drmeister We need to make FunctionDescription_O objects literals so they can be saved and reconstituted by the literal compiler.

15:45:31 drmeister They are kind of like pathnames.

15:47:53 drmeister Bike: Also - we will need to add save-points/yield-points to all of our code.

15:48:02 Bike What?

15:48:11 Bike Like, savepoints?

15:48:13 Bike safepoints

15:50:01 drmeister These...

15:50:02 drmeister http://users.cecs.anu.edu.au/~steveb/pubs/papers/yieldpoint-ismm-2015.pdf

15:50:41 Bike "In the case of exact garbage collection, yieldpoints are known as GC-safe points" yeah, ok.

15:51:03 Bike What do we need to do that for? This is not really a small thing

15:51:16 drmeister Steve Blackburn just posted that link for me in the MMTk Zulip server.

15:51:37 drmeister LLVM has a pass that adds yieldpoints to llvm-IR - we should check it out.

15:55:13 Bike https://llvm.org/docs/Statepoints.html RewriteStatepointsForGC i guess?

15:55:24 drmeister That's one of them.

15:55:37 drmeister In the llvm 'opt' tool they also have options...

15:56:23 drmeister --verify-safepoint-ir , --rewrite-statepoints-for-gc , --strip-gc-relocates , --place-safepoints

15:57:08 drmeister There's a thing in llvm where you can use address-space(1) for certain pointers and the --rewrite-statepoints-for-gc does something to them with yieldpoints that makes them gc roots.

15:57:15 drmeister It's for precise GC on the stack.

15:57:29 drmeister Now - we can't get that for C++ code I don't think - but it would be good to understand it.

15:57:54 drmeister Oh - more 'opt' options...

15:58:06 drmeister --place-backedge-safepoints-impl

15:58:21 drmeister --safepoint-ir-verifier-print-only

15:59:18 drmeister There are a buttload of options for safepoints.

15:59:29 drmeister That's a real unit of measure, by the way.

16:00:36 drmeister Anyway, yieldpoints are what Steve Blackburn calls them and they are integral to MMTk.

16:00:45 drmeister We don't have them in boehm and MPS.

16:01:13 drmeister In MPS there is a complicated song and dance for allocations that avoids the problem of scanning incompletely initialized objects.

16:01:38 drmeister Boehm doesn't care - it will scan anything in the conservative mode and I still don't see how the precise mode is supposed to work.

16:02:07 drmeister And no one is left alive who can explain it - apparently.

16:08:28 drmeister I was surprised by this in the paper: "We find that Java benchmarks execute about 100 M yieldpoints per second,"

16:08:49 drmeister I was always afraid that the impact on performance would be high - but if Java can support this then it can't be that bad.

16:08:51 Bike if they're on every loop backedge and every function call that doesn't surprise me

16:10:07 drmeister Thinking about yieldpoints - they could enable other things like truly lock free data structures.

16:10:57 drmeister If I can pause each thread when I can guarantee that not one of them is in a certain block of code - then I could fix up data structures so that they could always be read lock free.

16:12:42 drmeister Things like the class name->class database. Currently it has a mutex.

16:13:27 drmeister But I could represent it as a hash-table and an atomic linked list of fixups that at certain yieldpoints would be applied to the hash-table.

16:16:08 drmeister Pause the world when we can be sure that certain data structures are not actively being altered, fix them up and then set the world in motion again.

16:34:06 docl drmeister: I know this is a little off topic here but just wondering your opinion, is artificial photosynthesis by spiroligomer photocatalyst decades away, or has the foundation already been laid? I note there are competing approaches in this realm already, e.g. nanowire photocatalysts https://patents.google.com/patent/US9528192B1/en

16:36:10 docl I'm a little skeptical of the nanowire approach being cost effective for large areas. ideally you'd have something that can go over large cropland areas or maybe big ponds akin to solar salt production facilities

16:58:04 Bike pausing the world seems kind of worse than using a lock?

17:09:02 drmeister Bike: I think with pausing the world you have a lot of control over how often you do it.

17:09:24 drmeister Say we apply this to hash-tables.

17:09:56 drmeister We would add an atomic linked list of hash-table 'fixups' like an a-list that gets searched alongside the hash-table.

17:10:36 drmeister So to write to the hash-table you push a #<hash-table-write key value> onto the atomic fixup list.

17:11:16 drmeister If you remove an element you push a #<hash-table-remove key>

17:11:41 drmeister GETHASH would then check the hash-table and the fixup list - this would be lock free.

17:12:08 drmeister You don't want the fixup list to get too long because it's linear time to check it.

17:12:42 drmeister Occasionally, when you know that no thread is reading the hash-table you pause, apply the fixups to the hash-table and clear the fixup list.

17:13:31 drmeister We can control how long we let the fixup list grow before you want to pause and fix hash-tables.
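
The scheme described from 17:09:56 to 17:13:31 can be sketched roughly as below. This is only an illustration under stated assumptions: `Fixup`, `FixupHashTable`, `put`, `remove`, `get`, `apply_fixups` and `pending` are hypothetical names, keys are strings and values are ints for brevity, and the caller is assumed to guarantee that `apply_fixups` runs only while every other thread is stopped.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>

// One pending change to the table (hypothetical layout).
struct Fixup {
    enum class Kind { Write, Remove };
    Kind kind;
    std::string key;
    int value;     // ignored for Remove
    Fixup* next;   // next older fixup, or nullptr
};

class FixupHashTable {
public:
    ~FixupHashTable() {
        for (Fixup* f = head_.exchange(nullptr); f != nullptr;) {
            Fixup* next = f->next;
            delete f;
            f = next;
        }
    }

    // Writers never touch table_; they only push a fixup record.
    void put(const std::string& key, int value) {
        push(new Fixup{Fixup::Kind::Write, key, value, nullptr});
    }
    void remove(const std::string& key) {
        push(new Fixup{Fixup::Kind::Remove, key, 0, nullptr});
    }

    // Lock-free read: newest fixup wins, then fall back to the table.
    bool get(const std::string& key, int& out) const {
        for (const Fixup* f = head_.load(std::memory_order_acquire); f; f = f->next) {
            if (f->key == key) {
                if (f->kind == Fixup::Kind::Remove) return false;
                out = f->value;
                return true;
            }
        }
        auto it = table_.find(key);
        if (it == table_.end()) return false;
        out = it->second;
        return true;
    }

    // Length of the fixup list; a linear walk, like the lookup itself.
    std::size_t pending() const {
        std::size_t n = 0;
        for (const Fixup* f = head_.load(std::memory_order_acquire); f; f = f->next) ++n;
        return n;
    }

    // ASSUMPTION: called only while the world is stopped, so no thread is
    // reading table_ or walking the list.
    void apply_fixups() {
        Fixup* f = head_.exchange(nullptr, std::memory_order_acquire);
        Fixup* oldest_first = nullptr;
        while (f) {                       // reverse the list: oldest first
            Fixup* next = f->next;
            f->next = oldest_first;
            oldest_first = f;
            f = next;
        }
        while (oldest_first) {
            Fixup* g = oldest_first;
            oldest_first = g->next;
            if (g->kind == Fixup::Kind::Write) table_[g->key] = g->value;
            else table_.erase(g->key);
            delete g;
        }
    }

private:
    void push(Fixup* f) {
        assert(f != nullptr);
        f->next = head_.load(std::memory_order_relaxed);
        // On failure compare_exchange_weak reloads f->next with the new head.
        while (!head_.compare_exchange_weak(f->next, f,
                                            std::memory_order_release,
                                            std::memory_order_relaxed)) {
        }
    }

    std::atomic<Fixup*> head_{nullptr};
    std::unordered_map<std::string, int> table_;
};
```

A caller could watch `pending()` and request a pause once the list passes some threshold, which is the knob described at 17:13:31.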

17:13:46 Bike "occasionally, when you know" is at a safepoint?

17:14:01 drmeister The implementation of the fixup list would probably be a vector for faster reads.

17:14:16 Bike what happens when it's full?

17:14:26 Bike also, consing for every (setf gethash) seems slow.

17:14:49 drmeister Oh - you know what - you can't use a vector - because that can't be updated atomically without a mutex.

17:14:57 drmeister It's gotta be an atomic linked list.

17:15:33 Bike you can update a vector atomically. you just atomically increment the fill pointer, then store whatever at the old index. right?

17:15:48 Bike assuming nothing is removing from the vector simultaneously

17:16:03 drmeister Huh - you might have something there.

17:16:23 Bike i guess there'd be another problem if another thread is simultaneously reading the vector up to the fill pointer, and gets the uninitialized entry
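
The fill-pointer idea from 17:15:33, together with the uninitialized-entry hazard raised at 17:16:23, can be sketched as follows. This is one possible arrangement, not something taken from Clasp: `AppendOnlyVector`, `push_back`, `read` and `reserved` are hypothetical names, the capacity is fixed, and a per-slot `ready` flag is an added assumption used to keep readers away from slots that have been reserved but not yet written.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>

template <typename T, std::size_t Capacity>
class AppendOnlyVector {
public:
    // Reserve a slot by bumping the fill pointer, then publish the value.
    // Returns false when the vector is full.
    bool push_back(const T& value) {
        std::size_t index = fill_.fetch_add(1, std::memory_order_relaxed);
        if (index >= Capacity) return false;
        assert(index < Capacity);
        slots_[index].value = value;
        // Release: a reader that sees ready == true also sees the value.
        slots_[index].ready.store(true, std::memory_order_release);
        return true;
    }

    // Number of slots handed out so far (clamped to the capacity).
    std::size_t reserved() const {
        std::size_t n = fill_.load(std::memory_order_acquire);
        return n < Capacity ? n : Capacity;
    }

    // Returns false for slots that are out of range or not yet published.
    bool read(std::size_t index, T& out) const {
        if (index >= Capacity) return false;
        if (!slots_[index].ready.load(std::memory_order_acquire)) return false;
        out = slots_[index].value;
        return true;
    }

private:
    struct Slot {
        T value{};
        std::atomic<bool> ready{false};
    };
    Slot slots_[Capacity];
    std::atomic<std::size_t> fill_{0};
};
```

A full vector simply rejects the write here; in the scheme under discussion that would be the moment to stop the world and apply the fixups.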

17:16:53 drmeister Nothing would be removed from the fixup list except when the world is stopped.

17:17:29 drmeister Hmm, I'll have to think about that. The atomic linked list is definitely safe and flexible.

17:18:06 drmeister You could keep a pool of cons cells to draw from to add to the atomic linked list.

17:19:07 drmeister So (setf gethash) would grab a cons cell and a fixup record from a pool and fill the fixup record and push it to the head of the atomic fixup list.

17:20:27 drmeister The yieldpoints would check a bitmap of different kinds of yields against a thread local bitmap of what yields are permitted.

17:21:04 drmeister Then we would have suppress-yield and allow-yield calls that would modify the thread-local bitmap of yields that are allowed.
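
The bitmap check described at 17:20:27 and 17:21:04 might be sketched like this. All names (`YieldKind`, `request_yield`, `yieldpoint` and the variables) are hypothetical apart from `suppress_yield` and `allow_yield`, which follow the names used above; a real yieldpoint would block or run a handler rather than just return the mask.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical kinds of yield a coordinator might request.
enum YieldKind : std::uint32_t {
    kYieldGC             = 1u << 0,
    kYieldHashTableFixup = 1u << 1,
    kYieldSignal         = 1u << 2,
};

// Global bitmap of yields currently being requested.
std::atomic<std::uint32_t> g_requested_yields{0};

// Per-thread bitmap of yields this thread is willing to take right now.
thread_local std::uint32_t t_allowed_yields =
    kYieldGC | kYieldHashTableFixup | kYieldSignal;

void suppress_yield(std::uint32_t kinds) { t_allowed_yields &= ~kinds; }
void allow_yield(std::uint32_t kinds)    { t_allowed_yields |= kinds; }

void request_yield(std::uint32_t kinds) {
    assert(kinds != 0);
    g_requested_yields.fetch_or(kinds, std::memory_order_release);
}

// The check a compiler would inject at loop backedges and calls: one load
// and one AND. Returns the yields this thread should honor, or 0.
inline std::uint32_t yieldpoint() {
    return g_requested_yields.load(std::memory_order_acquire) & t_allowed_yields;
}
```

Code that reads the hash-table would call `suppress_yield(kYieldHashTableFixup)` on entry and `allow_yield(kYieldHashTableFixup)` on exit, so a fixup pause could only begin while no thread is inside such a region.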

17:23:27 drmeister I'm sure if this is possible that someone has already published on it.

17:24:31 Bike those kind of worklist things are usually how wait-free algorithms work (i think) but they don't need safepoints

17:25:21 drmeister We already have safe points - we check a global variable to see if a signal handler has been activated.

17:26:30 drmeister If you don't have safepoints - you use signals? If you use signals - you have to poll occasionally (at a safepoint) to see if a signal came in.

17:26:49 drmeister Currently our safe-points are after every allocation I think.

17:27:12 Bike i am honestly not very confident that our use of signals is correct

17:27:25 Bike we still have that disturbing FPE issue

17:28:08 drmeister Well, adding yieldpoints/safepoints might give us a way to make it correct.

17:28:21 drmeister What is the disturbing FPE issue?

17:28:54 Bike https://github.com/clasp-developers/clasp/issues/961

17:29:21 Bike though this is probably not a thread safety thing

17:29:31 Bike but hey, who knows. i honestly have no idea what could be causing this

17:30:08 drmeister Ok - I think we can put this on a firmer foundation. Can you take a look at the safepoint/yieldpoint injection pass in llvm?

17:30:20 drmeister As in - what does it do to the code?

17:30:37 Bike sure.

17:30:50 Bike i would assume it basically puts some intrinsic calls at loop backedges and such

17:30:56 drmeister You could generate a .ll file using compile-file and then run the pass on it with opt --place-backedge-safepoints-impl

17:31:18 drmeister Then what does it look like in the new llvm-IR and what does it look like in the disassembled object file.

17:31:24 Bike not just --place-safepoints?

17:31:33 drmeister Sure - that one then. I'm getting real cozy with object files.

17:31:49 drmeister I added a facility to dump every JIT generated object file to /tmp/ one after the other.

17:32:19 drmeister I can do the same with modules and then absolutely everything the compiler generates can be inspected.

17:34:01 Bike "-impl" just says "don't use this directly" to me

17:39:17 drmeister Yeah

17:45:07 drmeister I'm trying to figure out function-description literals.

17:45:16 drmeister They are mostly straightforward - like pathname literals - but they have a twist.

17:45:42 drmeister They contain function entry points (currently only one) and a ObjectFile_sp tagged pointer.

17:45:50 drmeister The ObjectFile_sp will be the same for every FunctionDescription_O in the current object file.

17:46:32 drmeister I think I can get the ObjectFile_sp from a thread local slot that stores the current ObjectFile_sp that is being load-time evaluated.

17:46:37 drmeister The entry points...

17:46:51 drmeister They are pointers to functions within the object file.

17:47:15 drmeister Pointers to the relocated functions.

17:48:11 drmeister This is what a FunctionDescription_O looks like in the future branch.

17:48:45 drmeister https://github.com/clasp-developers/clasp/blob/future/include/clasp/core/functor.h#L82

17:49:48 drmeister I have all of the function entry points in a vector within the Module - so I can reference them using an integer.

17:51:47 drmeister I think I need to set up a call to a function like ltvc_make_function_description that is like ltvc_make_pathname

17:51:48 drmeister https://github.com/clasp-developers/clasp/blob/future/src/llvmo/link_intrinsics.cc#L396

17:52:49 drmeister It will take an integer argument for each entry point function and no argument for the ObjectFile_O object because it will get it from a thread local slot.

17:53:28 drmeister The rest of the info like sourcePathname, functionName, lambdaList, docstring, lineno, column, filepos - those will all be regular literals.

17:54:06 drmeister I could argue that I don't need some of these because they are made redundant by pulling the info out of the DWARF metadata.

17:54:50 drmeister multiple entry points will be handled in the future by adding more integer arguments, one for each entry point.

18:08:20 drmeister With the entry point I can get source file, function name string, lineno

18:20:17 Bike sorry, had to drop someone off

18:20:29 drmeister No worries

18:20:43 drmeister I'm mulling the ideas over

18:21:49 drmeister What do we use the file offset for? That's emacs - right?

18:21:59 drmeister For slime - so we can jump to the source position.

18:22:25 drmeister filepos in FunctionDescription_O above

18:22:51 Bike yeah.

18:23:04 drmeister Do we need it - or can we use lineno?

18:23:27 drmeister Is it redundant is the question.

18:23:52 Bike i think the problem was that sometimes we get an offset, and sometimes we get a lineno + column

18:24:38 drmeister Oh wait - it is redundant. This is the filepos of the function itself. Most of the time we need the filepos of a return address - that we only get from DWARF - and there all we get is what is in the line tables.

18:26:43 drmeister Is it the case that we get the offset only from the FunctionDescription objects? If we got rid of filepos maybe that would be best?

18:27:01 drmeister As in filepos is a complicating feature that doesn't bring a lot of value. Adds complexity with little value.

18:27:08 Bike mostly i remember that we really should have one single structure representing source pos info

18:27:20 Bike but we don't, and the reason is because of this offset/lineno+column crap

18:27:46 drmeister Ok and DWARF kind of makes our choice for us - lineno+column is what DWARF gives us.

18:28:58 Bike for example swank's compiler-condition class has a "location" that i think has to be the offset

18:29:50 Bike or, wait

18:30:02 Bike no, slime has some other weird format

18:30:16 Bike we give it an offset but maybe it could use a lineno instead

18:40:47 Bike https://github.com/slime/slime/blob/master/slime.el#L3353-L3358 i guess this is what we have. so lineno column is fine. maybe.

18:41:33 Bike i don't know why we use offsets so much, then

19:20:29 kpoeck drmeister Earlier today you said "- we check a global variable to see if a signal handler has been activated. ". Could you point me to that code? As Bike mentioned, at least fpe handling seems to be delayed and I would like to look into that.

19:24:08 drmeister kpoeck: I think it's in the gcalloc.h file - every call to handle_all_queued_interrupts();

19:24:56 drmeister That invokes this...

19:24:57 drmeister https://github.com/clasp-developers/clasp/blob/master/src/gctools/interrupt.cc#L209

19:25:14 drmeister It checks a thread-local list of pending interrupts

19:25:26 drmeister I think that is what passes for our safepoint/yieldpoints

19:27:19 kpoeck I am not sure a signal handler is invoked via that code, but will check.

19:27:55 drmeister I thought signal handlers were invoked asynchronously - from the outside.

19:29:05 drmeister The queue_signal_or_interrupt is what adds a handler to the thread local list of interrupts to handle.

19:29:56 drmeister Yeah - and that gets called from handle_or_queue_signal and that is a signal handler.

19:30:28 kpoeck ok, that sounds clear

19:31:18 drmeister So a thread gets a signal, handle_or_queue_signal is called. That may handle it or it pushes an interrupt onto the current thread's list. Then handle_all_queued_interrupts gets called after almost every allocation to handle any interrupts that came in asynchronously.
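
The queue-then-poll flow just described can be reduced to a small sketch. It is a deliberate simplification and not Clasp's code: the handler only records the signal number in a single global `sig_atomic_t` instead of pushing onto a thread-local list, and `on_signal`, `install_handler`, `poll_pending_interrupt` and `g_handled_count` are hypothetical names.

```cpp
#include <cassert>
#include <csignal>

// Set by the handler, cleared by the poll. volatile sig_atomic_t is the
// one object type the C++ standard lets a signal handler write safely.
volatile std::sig_atomic_t g_pending_signal = 0;
int g_handled_count = 0;

// The handler does the bare minimum: remember which signal arrived.
extern "C" void on_signal(int signo) {
    g_pending_signal = signo;
}

void install_handler() {
    auto previous = std::signal(SIGINT, on_signal);
    assert(previous != SIG_ERR);
    (void)previous;
}

// The safepoint: called from ordinary code, e.g. after an allocation.
// Returns the signal number that was pending, or 0 if none. Because this
// runs outside the handler, it is a place where signaling a condition or
// unwinding would be legitimate.
// NOTE: a signal arriving between the read and the clear would be lost in
// this sketch; a real queue would need to handle that.
int poll_pending_interrupt() {
    int signo = g_pending_signal;
    if (signo == 0) return 0;
    g_pending_signal = 0;
    ++g_handled_count;
    return signo;
}
```

Calling `std::raise(SIGINT)` after `install_handler()` runs the handler synchronously, and the next `poll_pending_interrupt()` picks the signal up.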

19:32:13 drmeister We have this - and it bothers me now that I look at it...

19:32:14 drmeister https://github.com/clasp-developers/clasp/blob/master/src/gctools/interrupt.cc#L252

19:33:07 drmeister I think this means we are invoking our code from a signal handler - I think this is dangerous. Maybe we do it properly - maybe we don't.

19:34:20 drmeister My understanding is that there is very little that we are allowed to do in a signal handler.

19:34:55 drmeister And we can't unwind out of it.

19:35:15 Bike yeah so the problem is we kind of have to unwind out of them to use them for conditions.

19:35:19 Bike it's not great.

19:36:21 drmeister We can't unwind out of them is my understanding. Unix signals and C++ exceptions are fundamentally incompatible.

19:37:00 drmeister What we can do is insert yieldpoints in the code and check if a signal came in and then unwind from there.

19:37:33 drmeister So, in the arithmetic code - insert checks to see if a FPE signal occurred.

19:38:00 drmeister Hmm then there are problems with threads. Does the thread that generated the FPE get the FPE signal?

19:42:42 Bike iirc yes

19:44:15 drmeister Then we should be able to add a thread-local flag that an FPE occurred and inject a test for that into the arithmetic code.

19:48:07 Bike if we wanted to check flags on every single floating point operation we could turn off the signal handling and do that

19:48:12 Bike but i figured we didn't want to do that

20:34:41 drmeister How about a check only after instructions that could fail - like divisions

20:45:37 Bike i figured we didn't want to do a branch after every single division operation.

20:45:55 Bike also if we leave NaNs around or want particular rounding i think it really is every operation

20:48:04 kpoeck for an fpe, we run directly this: https://github.com/clasp-developers/clasp/blob/master/src/gctools/interrupt.cc#L316

20:48:21 kpoeck Is that kosher?

20:48:37 Bike depends on whether we can unwind from a handler or not

20:48:49 Bike if we can't, it's not kosher, and there's nothing we can do really, and posix sucks

20:49:46 drmeister What throws and exception to unwind?

20:49:57 drmeister an exception - the macro - hang on getting it.

20:49:57 Bike NO_INITIALIZERS_ERROR is a macro that eventually calls cl:error

20:50:08 drmeister NO_INITIALIZERS_ERROR

20:50:13 drmeister Nope - we can't do that.

20:50:16 Bike which can handle lisp condition handlers and restarts and stuff

20:50:26 Bike can call*

20:50:34 drmeister Nope, nope, nope.

20:50:41 Bike i looked it over and as far as i can tell, sbcl does unwind out of signal handlers and that works fine for it

20:50:54 drmeister This is happening inside a Unix signal handler?

20:50:59 Bike yes

20:51:28 Bike now, C++ exception handling in particular might be dicey

20:51:39 Bike because the exceptions library does all kinds of goofy shit that's probably not reentrant

20:52:23 drmeister sbcl doesn't use C++ exception handling - it may be leaving something in the stack frames to let them unwind out of a signal handler but from everything that I've been told we cannot do this with C++ exception handling.

20:52:35 drmeister MSWindows has a way of unwinding out of signal handlers as well.

20:53:16 Bike i don't really understand why unwinding out of a signal handler in particular is not allowed. longjmp ought to work, i think

20:53:28 Bike might get weird with sigaltstack, but other than that, i don't see the problem

20:54:07 Bike assuming longjmp is reentrant i guess. maybe all the dumb shit stuffed into jmp_bufs makes that hard

20:58:33 drmeister I gotta watch this:

20:58:34 drmeister https://www.youtube.com/watch?v=_Ivd3qzgT7U

21:00:03 drmeister A signal can happen between any two instructions - right? Is the stack frame that the signal handler sets up able to synchronize with the Itanium unwinding machinery?

21:03:07 Bike my impression was that within a handler you still have, like, a normal stack with frames and everything

21:03:25 Bike above your handler function is whatever operating system machinery, and then above that is the frame of the function that was running when the signal interrupted

21:07:58 kpoeck apart from what we are discussing, I believe we have 1 more issue. Just try (mod 1 0) in clasp: sometimes we get the fpe, sometimes clasp hangs with 99.9% cpu in core::clasp_truncate. lldb knows that an exception happens, but handle_fpe is not called.

21:09:04 drmeister Bike! Check this talk out at 29m30s. https://www.youtube.com/watch?v=_Ivd3qzgT7U

21:09:32 drmeister He talks about the lock that slows down multithreaded unwinding.

21:10:43 Bike "the main problem with unwinding is dlopen and dlclose" i really hate the C++-brain on this topic, man

21:13:17 drmeister I don't have the mental bandwidth to follow the whole talk - but there may be something about signal handlers in there.

21:14:09 Bike well, doing all that dwarf stuff is almost certainly not signal safe in general

21:17:51 drmeister There is a question at 46min where signals are brought up.

21:19:46 Bike libunwind is safe, c++ exceptions are not. yeah, makes sense

21:19:55 drmeister What does that mean?

21:21:03 Bike looking through stack frames and stuff is no problem. so in a lisp-style unwinding setup, unwinding would be no issue, since all the cleanup and destination info is on the stack. on the other hand, getting the corresponding dwarf info from disc/wherever is not signal safe

21:21:30 Bike i guess if we like, receive a signal in the middle of dl_iterate_phdr, or something weird like that

21:28:03 Bike my understanding though is that this stuff should be less of a problem for _synchronous_ signals like sigfpe, which is after all signaled at only particular places (where we do a floating point operation)

21:28:15 Bike same with sigsegv i think

21:32:17 drmeister Ok - then can we get it to work reliably?

21:32:51 Bike i don't know. like i said, i don't even understand what the problem is in our bug 961

21:40:20 drmeister It might be a good candidate for debugging with the undo debugger.

21:40:32 drmeister Yeah - that's the tool for the job.

21:41:18 drmeister Does the unwinding happen with a library that we can get debug information for on linux?

21:42:38 Bike um, maybe? it'd involve libgcc and stuff

21:48:07 drmeister Is it possible to link with libgcc with debugging information?

21:48:08 drmeister https://wiki.osdev.org/Libgcc

22:01:51 drmeister Bike: Would you like to try debugging this with udb? I can set you up.

22:02:20 Bike i could try. at the moment it's just crashing outright instead of just signaling weird errors like it used to, though

22:08:41 drmeister As in...

22:08:43 drmeister https://www.irccloud.com/pastebin/MVJokR30/

22:09:03 drmeister The abort trap is not due to (abort ()) is it?

22:11:59 Bike i don't think so?

22:13:18 drmeister Tried it again:

22:13:20 drmeister https://www.irccloud.com/pastebin/7t3jAgFT/

22:13:23 drmeister That's different

22:14:19 Bike oh. yeah, that's the error i used to get

22:14:32 Bike and then if you try to abort you get the same out-of-extent-unwind so you're stuck in the debugger forever.

23:12:39 drmeister Do we need to be able to get the declarations for a function from the function-description?

23:16:38 Bike i don't think so?

23:20:09 drmeister I'm feeling bad about dropping it from the FunctionDescription.

23:20:16 drmeister it has the lambda-list

23:20:30 drmeister And the docstring.

23:20:38 drmeister I really should put the declares in there.

23:26:08 drmeister I tossed it in.

23:26:29 drmeister It's easy to add now - later I'll forget what I did.