freenode/#clasp - IRC Chatlog

17:12:42 drmeister Occasionally, when you know that no thread is reading the hash-table you pause, apply the fixups to the hash-table and clear the fixup list.

17:13:31 drmeister We can control how long we let the fixup list grow before you want to pause and fix hash-tables.

17:13:46 Bike "occasionally, when you know" is at a safepoint?

17:14:01 drmeister The implementation of the fixup list would probably be a vector for faster reads.

17:14:16 Bike what happens when it's full?

17:14:26 Bike also, consing for every (setf gethash) seems slow.

17:14:49 drmeister Oh - you know what - you can't use a vector - because that can't be updated atomically without a mutex.

17:14:57 drmeister It's gotta be an atomic linked list.

17:15:33 Bike you can update a vector atomically. you just atomically increment the fill pointer, then store whatever at the old index. right?

17:15:48 Bike assuming nothing is removing from the vector simultaneously

17:16:03 drmeister Huh - you might have something there.

17:16:23 Bike i guess there'd be another problem if another thread is simultaneously reading the vector up to the fill pointer, and gets the uninitialized entry
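
[Editor's sketch] The fill-pointer idea Bike describes could look roughly like the following. This is only an illustration under assumptions: `AppendOnlyVector`, its fixed `Capacity`, and the choice to have `push` return `false` when full are all hypothetical and not taken from Clasp. It also does nothing about the problem raised in the last message: a reader that scans up to the fill pointer while writers are active can see a slot that has been reserved but not yet stored.

```cpp
#include <array>
#include <atomic>
#include <cassert>
#include <cstddef>

// Hypothetical fixed-capacity, append-only buffer (not Clasp code).
template <typename T, std::size_t Capacity>
class AppendOnlyVector {
public:
    // Reserve a slot by atomically bumping the fill pointer, then store
    // into the reserved slot. Returns false when the buffer is full.
    bool push(const T& value) {
        std::size_t index = fill_.fetch_add(1, std::memory_order_relaxed);
        if (index >= Capacity) {
            return false;  // full: the caller has to fall back to something else
        }
        slots_[index] = value;
        return true;
    }

    // Number of slots in use. Only meaningful once all writers have
    // stopped (for example, when the world is paused); a concurrent
    // reader could otherwise observe a reserved but unwritten slot.
    std::size_t size() const {
        std::size_t n = fill_.load(std::memory_order_acquire);
        return n < Capacity ? n : Capacity;
    }

    const T& at(std::size_t i) const { return slots_[i]; }

private:
    std::atomic<std::size_t> fill_{0};
    std::array<T, Capacity> slots_{};
};
```

Two writers never receive the same index because `fetch_add` hands out each value once, which is why no mutex is needed for appends alone.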

17:16:53 drmeister Nothing would be removed from the fixup list except when the world is stopped.

17:17:29 drmeister Hmm, I'll have to think about that. The atomic linked list is definitely safe and flexible.

17:18:06 drmeister You could keep a pool of cons cells to draw from to add to the atomic linked list.

17:19:07 drmeister So (setf gethash) would grab a cons cell and a fixup record from a pool and fill the fixup record and push it to the head of the atomic fixup list.
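
[Editor's sketch] A push-only atomic linked list of fixup records, drained in one step while nothing else is running, might be sketched as below. The names `Fixup` and `FixupList` and the integer key/value fields are hypothetical stand-ins; in the scheme discussed above the nodes would come from a pool, whereas here the caller simply owns them.

```cpp
#include <atomic>
#include <cassert>
#include <vector>

// Hypothetical fixup record: "set key to value in the hash table".
struct Fixup {
    int key;
    int value;
    Fixup* next;
};

class FixupList {
public:
    // Lock-free push onto the head of the list; callable from any thread.
    void push(Fixup* node) {
        Fixup* old_head = head_.load(std::memory_order_relaxed);
        do {
            node->next = old_head;
            // On failure compare_exchange_weak reloads old_head, so the
            // loop relinks the node against the new head and retries.
        } while (!head_.compare_exchange_weak(old_head, node,
                                              std::memory_order_release,
                                              std::memory_order_relaxed));
    }

    // Detach the entire list in one atomic exchange and return the
    // records oldest-first. Intended to be called while the world is
    // stopped, when the fixups are applied to the table.
    std::vector<Fixup*> drain() {
        Fixup* node = head_.exchange(nullptr, std::memory_order_acquire);
        std::vector<Fixup*> newest_first;
        for (; node != nullptr; node = node->next) {
            newest_first.push_back(node);
        }
        return std::vector<Fixup*>(newest_first.rbegin(), newest_first.rend());
    }

private:
    std::atomic<Fixup*> head_{nullptr};
};
```

Because nodes are only ever pushed and the list is only ever taken wholesale, this sketch sidesteps the ABA problem that a concurrent pop operation would introduce.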

17:20:27 drmeister The yieldpoints would check a bitmap of different kinds of yields against a thread local bitmap of what yields are permitted.

17:21:04 drmeister Then we would have suppress-yield and allow-yield calls that would modify the thread-local bitmap of yields that are allowed.
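
[Editor's sketch] The bitmap check described here could be as small as one load and one AND. Everything named below (`YieldKind`, `g_requested_yields`, `t_permitted_yields`, `yieldpoint`) is hypothetical, and the three yield kinds are invented for illustration only.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical yield kinds, one bit each.
enum YieldKind : std::uint32_t {
    kYieldGC = 1u << 0,
    kYieldHashTableFixup = 1u << 1,
    kYieldSignal = 1u << 2,
};

// Global bitmap: which kinds of yield are currently being requested.
std::atomic<std::uint32_t> g_requested_yields{0};

// Thread-local bitmap: which kinds of yield this thread permits right now.
thread_local std::uint32_t t_permitted_yields =
    kYieldGC | kYieldHashTableFixup | kYieldSignal;

void suppress_yield(std::uint32_t kinds) { t_permitted_yields &= ~kinds; }
void allow_yield(std::uint32_t kinds) { t_permitted_yields |= kinds; }

// The check a yieldpoint would perform: returns the set of requested
// yields that this thread is currently willing to service (0 if none).
std::uint32_t yieldpoint() {
    return g_requested_yields.load(std::memory_order_acquire) & t_permitted_yields;
}
```

A thread in the middle of reading a hash table would call `suppress_yield(kYieldHashTableFixup)` before the read and `allow_yield(kYieldHashTableFixup)` after it, so a fixup pause could not start while the read is in progress.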

17:23:27 drmeister I'm sure if this is possible that someone has already published on it.

17:24:31 Bike those kind of worklist things are usually how wait-free algorithms work (i think) but they don't need safepoints

17:25:21 drmeister We already have safe points - we check a global variable to see if a signal handler has been activated.

17:26:30 drmeister If you don't have safepoints - you use signals? If you use signals - you have to poll occasionally (at a safepoint) to see if a signal came in.

17:26:49 drmeister Currently our safe-points are after every allocation I think.

17:27:12 Bike i am honestly not very confident that our use of signals is correct

17:27:25 Bike we still have that disturbing FPE issue

17:28:08 drmeister Well, adding yieldpoints/safepoints might give us a way to make it correct.

17:28:21 drmeister What is the disturbing FPE issue?

17:28:54 Bike https://github.com/clasp-developers/clasp/issues/961

17:29:21 Bike though this is probably not a thread safety thing

17:29:31 Bike but hey, who knows. i honestly have no idea what could be causing this

17:30:08 drmeister Ok - I think we can put this on a firmer foundation. Can you take a look at the safepoint/yieldpoint injection pass in llvm?

17:30:20 drmeister As in - what does it do to the code?

17:30:37 Bike sure.

17:30:50 Bike i would assume it basically puts some intrinsic calls at loop backedges and such

17:30:56 drmeister You could generate a .ll file using compile-file and then run the pass on it with opt --place-backedge-safepoints-impl

17:31:18 drmeister Then what does it look like in the new llvm-IR and what does it look like in the disassembled object file.

17:31:24 Bike not just --place-safepoints?

17:31:33 drmeister Sure - that one then. I'm getting real cozy with object files.

17:31:49 drmeister I added a facility to dump every JIT generated object file to /tmp/ one after the other.

17:32:19 drmeister I can do the same with modules and then absolutely everything the compiler generates can be inspected.

17:34:01 Bike "-impl" just says "don't use this directly" to me

17:39:17 drmeister Yeah

17:45:07 drmeister I'm trying to figure out function-description literals.

17:45:16 drmeister They are mostly straightforward - like pathname literals - but they have a twist.

17:45:42 drmeister They contain function entry points (currently only one) and an ObjectFile_sp tagged pointer.

17:45:50 drmeister The ObjectFile_sp will be the same for every FunctionDescription_O in the current object file.

17:46:32 drmeister I think I can get the ObjectFile_sp from a thread local slot that stores the current ObjectFile_sp that is being load-time evaluated.

17:46:37 drmeister The entry points...

17:46:51 drmeister They are pointers to functions within the object file.

17:47:15 drmeister Pointers to the relocated functions.

17:48:11 drmeister This is what a FunctionDescription_O looks like in the future branch.

17:48:45 drmeister https://github.com/clasp-developers/clasp/blob/future/include/clasp/core/functor.h#L82

17:49:48 drmeister I have all of the function entry points in a vector within the Module - so I can reference them using an integer.

17:51:47 drmeister I think I need to set up a call to a function like ltvc_make_function_description that is like ltvc_make_pathname

17:51:48 drmeister https://github.com/clasp-developers/clasp/blob/future/src/llvmo/link_intrinsics.cc#L396

17:52:49 drmeister It will take an integer argument for each entry point function and no argument for the ObjectFile_O object because it will get it from a thread local slot.

17:53:28 drmeister The rest of the info like sourcePathname, functionName, lambdaList, docstring, lineno, column, filepos - those will all be regular literals.

17:54:06 drmeister I could argue that I don't need some of these because they are made redundant by pulling the info out of the DWARF metadata.

17:54:50 drmeister multiple entry points will be handled in the future by adding more integer arguments, one for each entry point.
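
[Editor's sketch] The shape of the constructor being described (an integer index into the module's entry-point vector, with the object file taken from a thread-local slot rather than passed as an argument) might look something like this. None of these types or names are Clasp's: `ObjectFile`, `FunctionDescription`, `t_current_object_file`, `make_function_description` and `sample_entry` are simplified, hypothetical stand-ins, and the real `ltvc_` functions have a different signature.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

using EntryPoint = int (*)(int);  // stand-in for a compiled function pointer

struct ObjectFile {
    std::string name;
};

struct FunctionDescription {
    EntryPoint entry_point;
    const ObjectFile* object_file;
    std::string function_name;
    int lineno;
};

// Stand-in for the thread-local slot that holds the object file whose
// load-time forms are currently being evaluated.
thread_local const ObjectFile* t_current_object_file = nullptr;

// The entry point is named by its index in the module's entry-point
// vector; the object file is not an argument at all.
FunctionDescription make_function_description(
    const std::vector<EntryPoint>& entry_points, std::size_t entry_index,
    std::string function_name, int lineno) {
    if (t_current_object_file == nullptr) {
        throw std::logic_error("no object file is currently being loaded");
    }
    return FunctionDescription{entry_points.at(entry_index),
                               t_current_object_file,
                               std::move(function_name), lineno};
}

// A trivial function to act as an entry point in examples.
int sample_entry(int x) { return x + 1; }
```

Supporting several entry points would then amount to passing several indices, one per entry point.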

18:08:20 drmeister With the entry point I can get source file, function name string, lineno

18:20:17 Bike sorry, had to drop someone off

18:20:29 drmeister No worries

18:20:43 drmeister I'm mulling the ideas over

18:21:49 drmeister What do we use the file offset for? That's emacs - right?

18:21:59 drmeister For slime - so we can jump to the source position.

18:22:25 drmeister filepos in FunctionDescription_O above

18:22:51 Bike yeah.

18:23:04 drmeister Do we need it - or can we use lineno?

18:23:27 drmeister Is it redundant is the question.

18:23:52 Bike i think the problem was that sometimes we get an offset, and sometimes we get a lineno + column

18:24:38 drmeister Oh wait - it is redundant. This is the filepos of the function itself. Most of the time we need the filepos of a return address - that we only get from DWARF - and there all we get is what is in the line tables.

18:26:43 drmeister Is it the case that we get the offset only from the FunctionDescription objects? If we got rid of filepos maybe that would be best?

18:27:01 drmeister As in filepos is a complicating feature that doesn't bring a lot of value.

18:27:08 Bike mostly i remember that we really should have one single structure representing source pos info

18:27:20 Bike but we don't, and the reason is because of this offset/lineno+column crap

18:27:46 drmeister Ok and DWARF kind of makes our choice for us - lineno+column is what DWARF gives us.

18:28:58 Bike for example swank's compiler-condition class has a "location" that i think has to be the offset

18:29:50 Bike or, wait

18:30:02 Bike no, slime has some other weird format

18:30:16 Bike we give it an offset but maybe it could use a lineno instead

18:40:47 Bike https://github.com/slime/slime/blob/master/slime.el#L3353-L3358 i guess this is what we have. so lineno column is fine. maybe.

18:41:33 Bike i don't know why we use offsets so much, then

19:20:29 kpoeck drmeister Earlier today you said "- we check a global variable to see if a signal handler has been activated. ". Could you point me to that code? As Bike mentioned, at least fpe handling seems to be delayed and I would like to look into that.

19:24:08 drmeister kpoeck: I think it's in the gcalloc.h file - every call to handle_all_queued_interrupts();

19:24:56 drmeister That invokes this...

19:24:57 drmeister https://github.com/clasp-developers/clasp/blob/master/src/gctools/interrupt.cc#L209

19:25:14 drmeister It checks a thread-local list of pending interrupts

19:25:26 drmeister I think that is what passes for our safepoint/yieldpoints

19:27:19 kpoeck I am not sure a signal handler is invoked via that code, but will check.

19:27:55 drmeister I thought signal handlers were invoked asynchronously - from the outside.

19:29:05 drmeister The queue_signal_or_interrupt is what adds a handler to the thread local list of interrupts to handle.

19:29:56 drmeister Yeah - and that gets called from handle_or_queue_signal and that is a signal handler.

19:30:28 kpoeck ok, that sounds clear

19:31:18 drmeister So a thread gets a signal, handle_or_queue_signal is called. That may handle it or it pushes an interrupt onto the current threads list. Then handle_all_queued_interrupts gets called after almost every allocation to handle any interrupts that came in asynchronously.
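
[Editor's sketch] The handler-queues, safepoint-polls pattern summarized here can be reduced to a few lines. This is a deliberate simplification and not the code in interrupt.cc: it uses one process-wide flag instead of a thread-local list, and `record_signal` and `poll_pending_signal` are hypothetical names.

```cpp
#include <cassert>
#include <csignal>

// Written by the signal handler, read by ordinary code. A
// volatile sig_atomic_t is the one kind of object a handler can
// portably write.
volatile std::sig_atomic_t g_pending_signal = 0;

// The handler does nothing except record that the signal arrived.
extern "C" void record_signal(int signo) { g_pending_signal = signo; }

// Hypothetical safepoint check, called from ordinary code (for example
// after an allocation). Returns the pending signal number, or 0.
// Simplification: a second signal arriving between the read and the
// clear would be lost; a real implementation needs a proper queue.
int poll_pending_signal() {
    int signo = g_pending_signal;
    if (signo != 0) {
        g_pending_signal = 0;
        // This runs outside the handler, so calling into the runtime or
        // unwinding from here is not subject to signal-safety limits.
    }
    return signo;
}
```

Registering the handler with `std::signal(SIGTERM, record_signal)` and then calling `std::raise(SIGTERM)` leaves the signal number waiting for the next `poll_pending_signal()` call.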

19:32:13 drmeister We have this - and it bothers me now that I look at it...

19:32:14 drmeister https://github.com/clasp-developers/clasp/blob/master/src/gctools/interrupt.cc#L252

19:33:07 drmeister I think this means we are invoking our code from a signal handler - I think this is dangerous. Maybe we do it properly - maybe we don't.

19:34:20 drmeister My understanding is that there is very little that we are allowed to do in a signal handler.

19:34:55 drmeister And we can't unwind out of it.

19:35:15 Bike yeah so the problem is we kind of have to unwind out of them to use them for conditions.

19:35:19 Bike it's not great.

19:36:21 drmeister We can't unwind out of them is my understanding. Unix signals and C++ exceptions are fundamentally incompatible.

19:37:00 drmeister What we can do is insert yieldpoints in the code and check if a signal came in and then unwind from there.

19:37:33 drmeister So, in the arithmetic code - insert checks to see if a FPE signal occurred.

19:38:00 drmeister Hmm then there are problems with threads. Does the thread that generated the FPE get the FPE signal?

19:42:42 Bike iirc yes

19:44:15 drmeister Then we should be able to add a thread-local flag that an FPE occurred and inject a test for that into the arithmetic code.

19:48:07 Bike if we wanted to check flags on every single floating point operation we could turn off the signal handling and do that

19:48:12 Bike but i figured we didn't want to do that
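
[Editor's sketch] Checking the floating point status flags instead of taking a SIGFPE could look like the following, using the standard `<cfenv>` functions. `checked_divide` is a hypothetical helper, not anything in Clasp, and the sketch assumes floating point traps are masked (the usual default), so a division by zero sets a flag rather than raising a signal. The `volatile` locals are there to discourage the compiler from folding or moving the division relative to the flag calls; a production version would need more care on that point.

```cpp
#include <cassert>
#include <cfenv>

// Divide and report failure through the return value by inspecting the
// floating point exception flags. Returns true and stores the quotient
// in `out` on success; returns false if the division raised
// FE_DIVBYZERO or FE_INVALID.
bool checked_divide(double numerator, double denominator, double& out) {
    volatile double n = numerator;
    volatile double d = denominator;
    std::feclearexcept(FE_ALL_EXCEPT);
    volatile double q = n / d;
    if (std::fetestexcept(FE_DIVBYZERO | FE_INVALID) != 0) {
        return false;
    }
    out = q;
    return true;
}
```

The clear, the test and the branch on every operation are the cost being weighed in the exchange above.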

20:34:41 drmeister How about a check only after instructions that could fail - like divisions

20:45:37 Bike i figured we didn't want to do a branch after every single division operation.

20:45:55 Bike also if we leave NaNs around or want particular rounding i think it really is every operation

20:48:04 kpoeck for an fpe, we run directly this: https://github.com/clasp-developers/clasp/blob/master/src/gctools/interrupt.cc#L316

20:48:21 kpoeck Is that kosher?

20:48:37 Bike depends on whether we can unwind from a handler or not

20:48:49 Bike if we can't, it's not kosher, and there's nothing we can do really, and posix sucks

20:49:46 drmeister What throws and exception to unwind?

20:49:57 drmeister an exception - the macro - hang on getting it.

20:49:57 Bike NO_INITIALIZERS_ERROR is a macro that eventually calls cl:error

20:50:08 drmeister NO_INITIALIZERS_ERROR

20:50:13 drmeister Nope - we can't do that.

20:50:16 Bike which can handle lisp condition handlers and restarts and stuff

20:50:26 Bike can call*

20:50:34 drmeister Nope, nope, nope.

20:50:41 Bike i looked it over and as far as i can tell, sbcl does unwind out of signal handlers and that works fine for it

20:50:54 drmeister This is happening inside a Unix signal handler?

20:50:59 Bike yes

20:51:28 Bike now, C++ exception handling in particular might be dicey

20:51:39 Bike because the exceptions library does all kinds of goofy shit that's probably not reentrant

20:52:23 drmeister sbcl doesn't use C++ exception handling - it may be leaving something in the stack frames to let them unwind out of a signal handler but from everything that I've been told we cannot do this with C++ exception handling.

20:52:35 drmeister MSWindows has a way of unwinding out of signal handlers as well.

20:53:16 Bike i don't really understand why unwinding out of a signal handler in particular is not allowed. longjmp ought to work, i think

20:53:28 Bike might get weird with sigaltstack, but other than that, i don't see the problem

20:54:07 Bike assuming longjmp is reentrant i guess. maybe all the dumb shit stuffed into jmp_bufs makes that hard
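
[Editor's sketch] For reference, leaving a signal handler by a non-local jump is something POSIX does provide for, via `sigsetjmp` and `siglongjmp`. The sketch below assumes a POSIX system and uses a hypothetical function name, `recover_from_raised_sigfpe`. It delivers the signal synchronously with `raise` rather than with a real arithmetic fault, and it says nothing about whether C++ exceptions can be thrown through a handler frame, which is the harder question in this discussion.

```cpp
#include <cassert>
#include <setjmp.h>
#include <signal.h>

static sigjmp_buf recovery_point;

// The handler never returns normally: it jumps back to the saved context.
extern "C" void on_sigfpe(int) { siglongjmp(recovery_point, 1); }

// Installs the handler, raises SIGFPE, and reports whether control came
// back through the jump instead of through a normal handler return.
bool recover_from_raised_sigfpe() {
    struct sigaction action{};
    struct sigaction previous{};
    action.sa_handler = on_sigfpe;
    sigemptyset(&action.sa_mask);
    action.sa_flags = 0;
    if (sigaction(SIGFPE, &action, &previous) != 0) {
        return false;
    }

    volatile bool recovered = false;
    // The second argument asks sigsetjmp to save the signal mask, so the
    // jump also unblocks SIGFPE, which is blocked while the handler runs.
    if (sigsetjmp(recovery_point, 1) == 0) {
        raise(SIGFPE);
    } else {
        recovered = true;  // reached only via siglongjmp from the handler
    }

    sigaction(SIGFPE, &previous, nullptr);
    return recovered;
}
```

A jump like this skips any C++ destructors in the frames it discards, so it fits a runtime that keeps its own cleanup records on the stack better than one that relies on C++ exception unwinding.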

20:58:33 drmeister I gotta watch this:

20:58:34 drmeister https://www.youtube.com/watch?v=_Ivd3qzgT7U

21:00:03 drmeister A signal can happen between any two instructions - right? Is the stack frame that the signal handler sets up able to synchronize with the Itanium unwinding machinery?

21:03:07 Bike my impression was that within a handler you still have, like, a normal stack with frames and everything

21:03:25 Bike above your handler function is whatever operating system machinery, and then above that is the frame of the function that was running when the signal interrupted

21:07:58 kpoeck apart from what we are discussing, I believe we have 1 more issue. Just try (mod 1 0) in clasp: sometimes we get the fpe, sometimes clasp hangs with 99,9% cpu in core::clasp_truncate. lldb knows that an exception happens, but handle_fpe is not called.

21:09:04 drmeister Bike! Check this talk out at 29m30s. https://www.youtube.com/watch?v=_Ivd3qzgT7U

21:09:32 drmeister He talks about the lock that slows down multithreaded unwinding.

21:10:43 Bike "the main problem with unwinding is dlopen and dlclose" i really hate the C++-brain on this topic, man

21:13:17 drmeister I don't have the mental bandwidth to follow the whole talk - but there may be something about signal handlers in there.

21:14:09 Bike well, doing all that dwarf stuff is almost certainly not signal safe in general

21:17:51 drmeister There is a question at 46min where signals are brought up.

21:19:46 Bike libunwind is safe, c++ exceptions are not. yeah, makes sense

21:19:55 drmeister What does that mean?

21:21:03 Bike looking through stack frames and stuff is no problem. so in a lisp-style unwinding setup, unwinding would be no issue, since all the cleanup and destination info is on the stack. on the other hand, getting the corresponding dwarf info from disc/wherever is not signal safe

21:21:30 Bike i guess if we like, receive a signal in the middle of dl_iterate_phdr, or something weird like that

21:28:03 Bike my understanding though is that this stuff should be less of a problem for _synchronous_ signals like sigfpe, which is after all signaled at only particular places (where we do a floating point operation)

21:28:15 Bike same with sigsegv i think

21:32:17 drmeister Ok - then can we get it to work reliably?

21:32:51 Bike i don't know. like i said, i don't even understand what the problem is in our bug 961

21:40:20 drmeister It might be a good candidate for debugging with the undo debugger.

21:40:32 drmeister Yeah - that's the tool for the job.

21:41:18 drmeister Does the unwinding happen with a library that we can get debug information for on linux?

21:42:38 Bike um, maybe? it'd involve libgcc and stuff

21:48:07 drmeister Is it possible to link with libgcc with debugging information?

21:48:08 drmeister https://wiki.osdev.org/Libgcc

22:01:51 drmeister Bike: Would you like to try debugging this with udb? I can set you up.

22:02:20 Bike i could try. at the moment it's just crashing outright instead of just signaling weird errors like it used to, though

22:08:41 drmeister As in...

22:08:43 drmeister https://www.irccloud.com/pastebin/MVJokR30/

22:09:03 drmeister The abort trap is not due to (abort ()) is it?

22:11:59 Bike i don't think so?

22:13:18 drmeister Tried it again:

22:13:20 drmeister https://www.irccloud.com/pastebin/7t3jAgFT/

22:13:23 drmeister That's different

22:14:19 Bike oh. yeah, that's the error i used to get

22:14:32 Bike and then if you try to abort you get the same out-of-extent-unwind so you're stuck in the debugger forever.

23:12:39 drmeister Do we need to be able to get the declarations for a function from the function-description?

23:16:38 Bike i don't think so?

23:20:09 drmeister I'm feeling bad about dropping it from the FunctionDescription.

23:20:16 drmeister it has the lambda-list

23:20:30 drmeister And the docstring.

23:20:38 drmeister I really should put the declares in there.

23:26:08 drmeister I tossed it in.

23:26:29 drmeister It's easy to add now - later I'll forget what I did.

23:29:26 drmeister I have a bit of a code smell with what I'm doing.

23:29:54 drmeister I have made FunctionDescription_sp objects literals - but I don't set the entry point in the literal.

23:30:12 drmeister I create the FunctionDescription_sp object at load time and then I set the entry point using a function pointer.

23:30:38 drmeister It's a weird two-step procedure.

23:31:04 drmeister Once I get it working I'll look closely at it and see if I can set the entry point in the FunctionDescription_sp object directly.

23:38:54 drmeister It's kinda like set-funcallable-instance-function?

23:39:22 drmeister Maybe that elevates it too much. It's weird.

23:40:54 drmeister The solution is going to involve the ltv/function-description - I need to write out a reference to the entry-point.

23:41:03 drmeister https://www.irccloud.com/pastebin/6dksw0Lo/

23:42:10 drmeister Hoooo - thank goodness for sanity checks.

23:42:24 drmeister https://www.irccloud.com/pastebin/gvJp6ag3/

23:42:47 drmeister I almost got my %function-descriptions% out of sync with FunctionDescription_O

23:42:51 drmeister That would be a nightmare.

23:44:17 drmeister NiGhTmArE

23:52:06 drmeister Alrighty - function-descriptions are now literals created at load-time.

23:53:49 drmeister We still have stupid broken backtraces.

23:53:50 drmeister https://www.irccloud.com/pastebin/uRXakDmk/

23:54:00 drmeister boost::system::dummy_exported_function

0:05:51 Bike this is in your branch, not master, right?

0:06:37 drmeister Yeah - it's all in the 'future' branch.

0:10:26 drmeister This is correct...

0:10:28 drmeister https://www.irccloud.com/pastebin/2NR0c3qt/

0:10:41 drmeister Except for the ObjectFile - which isn't defined yet.

0:17:20 drmeister Ok - I see why backtraces are broken. It's pretty easy to fix.

0:18:25 drmeister FunctionDescriptions used to be raw blocks of data with a name appended with "^DESC". That's gone now and they are first class objects.

3:02:50 krkini ** NICK kini

4:09:13 beach Good morning everyone!