freenode/#clasp - IRC Chatlog
20:51:01
Bike
drmeister: i've been seeing transient errors about seemingly random dynamic variables being unbound in the asdf/serve-event build step. shiho just hit it (in work branch) so it may not relate to my issues
21:02:07
kpoeck
drmeister: https://github.com/clasp-developers/clasp/wiki/How-to-load-cl-python-in-clasp
21:13:47
drmeister
Methods have va-list on the stack - so that may not be an indicator of a problem.
21:14:48
drmeister
I've switched to the 'work' branch - this provides a working version of clasp that I'm adding changes to.
21:30:51
drmeister
Here's the poop - there is a flaw in the compiler code that we merged into dev that currently shows up when we compile babel. Just prior to merging the new compiler code into dev we pushed the working dev code to testing->preview->master. We are working on finding and fixing the compiler code - but that takes time.
21:31:30
drmeister
In the meantime - to keep working I cherry-picked everything after the merge into a branch built on the good 'testing' branch, and I call it 'work'.
21:47:53
drmeister
kpoeck: We had a discussion on this end - it is difficult to unwind that one merge commit - we tried and got something that didn't build - although it should have because later commits didn't refer to the merge commit.
21:59:40
drmeister
(sigh) I had turned on cmp::*compile-file-parallel* in the wscript file - so even when I turned it off in compile-file-parallel the build system turned it back on.
22:21:42
drmeister
I put (compile-file-parallel <asdf>) in a loop - to try and reproduce a crash we are seeing.
23:51:25
drmeister
Bike: The changes to the compiler that were merged into dev - did they change the LLVM IR that was generated?
23:54:13
drmeister
Ok - I thought maybe dynamic environment changes might not affect the code generation
2:01:47
drmeister
I'll put it back and build it over and over again tonight - in another shell I'll do it with the (handler-case ...) wrapping it.
2:02:49
drmeister
I did notice that the _Binding slot of Symbol_O was not atomic - that could create a race condition if multiple threads try to write to it at the same time.
2:05:21
drmeister
It's ..........possible.......... that this could be happening here because I start up a bunch of threads to service the compile-file-parallel.
2:31:01
drmeister
gensyms are not in packages - so they get deleted once nothing refers to them. But I guess they remain as roots because the code will reference them.
2:32:06
drmeister
I'm wondering if i have some fault in my thinking about how symbols turn over - in Clasp the multithreading code for symbols gets sketchy if symbols are bound and then collected.
2:34:13
Bike
practically speaking, nearly all dynamic variable names will be referenced from packages, so those are fine
2:37:43
Bike
anyway that doesn't seem to be what's happening here given that the symbols are things like *BASIC-BLOCKS* that aren't being collected
2:39:56
drmeister
I'm thinking more that the machinery for making symbol bindings thread safe isn't itself thread-safe.
2:40:53
drmeister
Since _Binding wasn't atomic it was possible that it could be set by two threads in a race condition - that would lead to weird errors not unlike what we are seeing.
2:41:49
drmeister
it's by no means clear that this is the problem but compile-file-parallel could expose a problem because it's a bunch of threads that startup and end at about the same time.
2:45:04
drmeister
The _Binding slot holds a size_t index into a thread-local vector that stores bindings for symbols
2:46:01
drmeister
IIRC the first time a thread binds a dynamic variable the _Binding slot is set and that is the symbol's index for life.
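The scheme described in the last two messages can be sketched roughly as follows. This is a minimal illustration, not Clasp's actual code: `Symbol`, `binding_index`, `NO_INDEX`, `next_index`, `thread_bindings`, `ensure_index`, `bind` and `lookup` are all hypothetical names, and values are plain `int`s for brevity. Note that the check-then-set in `ensure_index` is unsynchronized, so the sketch is only safe when a single thread assigns indices.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

// Hypothetical stand-ins; none of these names are taken from Clasp.
constexpr std::size_t NO_INDEX = static_cast<std::size_t>(-1);

struct Symbol {
    const char* name;
    std::size_t binding_index = NO_INDEX;  // plays the role of the _Binding slot
};

// One table per thread: slot i holds this thread's value for the symbol
// whose binding_index is i. An empty optional means "unbound in this thread".
thread_local std::vector<std::optional<int>> thread_bindings;

// Next unused index, shared by all threads (not synchronized in this sketch).
std::size_t next_index = 0;

// Assign an index the first time the symbol is bound; afterwards the
// symbol keeps that index for life.
std::size_t ensure_index(Symbol& sym) {
    if (sym.binding_index == NO_INDEX) {
        sym.binding_index = next_index++;  // unsynchronized check-then-set
    }
    return sym.binding_index;
}

// Bind the symbol to a value in the calling thread only.
void bind(Symbol& sym, int value) {
    std::size_t i = ensure_index(sym);
    assert(i != NO_INDEX);
    if (thread_bindings.size() <= i) {
        thread_bindings.resize(i + 1);
    }
    thread_bindings[i] = value;
}

// Look up the calling thread's binding; empty if the symbol is unbound here.
std::optional<int> lookup(const Symbol& sym) {
    std::size_t i = sym.binding_index;
    if (i == NO_INDEX || i >= thread_bindings.size()) {
        return std::nullopt;
    }
    return thread_bindings[i];
}
```

If two threads ran `ensure_index` on the same symbol at the same moment, each could store a different index, and a thread could then look in the wrong slot of its table and find nothing there, which would surface as an "unbound variable" error.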
2:47:34
drmeister
The errors that we have been seeing when compile-file-parallel fails are "this variable is unbound" or "that variable is unbound", IIRC
2:49:24
drmeister
I've changed the _Binding to an atomic variable and I'm going to put in a print statement that will print if a race condition is ever detected.
2:50:02
drmeister
I'll have it do the right thing - but print a message - then we can watch for that message.
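One way to "do the right thing but print a message" is a compare-and-swap on an atomic slot. The sketch below is an assumption about the general approach, not the change that was actually pushed: `Symbol`, `binding_index`, `NO_INDEX`, `next_index` and `ensure_index` are hypothetical names, and the warning text is invented for illustration. Only standard `std::atomic` operations are used.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdio>

// Hypothetical stand-ins; none of these names are taken from Clasp.
constexpr std::size_t NO_INDEX = static_cast<std::size_t>(-1);

struct Symbol {
    // Plays the role of the _Binding slot, now atomic.
    std::atomic<std::size_t> binding_index{NO_INDEX};
};

// Next unused index, shared by all threads.
std::atomic<std::size_t> next_index{0};

// Return the symbol's index, assigning one if it has none yet.
// Exactly one thread wins the assignment; losers adopt the winner's index.
std::size_t ensure_index(Symbol& sym) {
    std::size_t current = sym.binding_index.load();
    if (current != NO_INDEX) {
        return current;  // already assigned, nothing to do
    }
    std::size_t mine = next_index.fetch_add(1);  // reserve a fresh index
    std::size_t expected = NO_INDEX;
    if (sym.binding_index.compare_exchange_strong(expected, mine)) {
        return mine;  // we installed our index
    }
    // Lost the race: `expected` now holds the index the other thread stored.
    // Our reserved index is simply left unused in this sketch.
    std::fprintf(stderr,
                 "WARN: another thread set the binding index to %zu "
                 "before we could set ours to %zu\n",
                 expected, mine);
    assert(expected != NO_INDEX);
    return expected;
}
```

With this shape, every thread agrees on one index per symbol, and the warning only fires when two threads really did try to assign an index to the same symbol at the same time.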
3:24:27
drmeister
I printed the _Binding of symbols as they are assigned - and it's very possible that there is a race condition...
3:29:19
drmeister
Once the compiler starts somewhere around line 40 there are many more symbols bound.
3:31:50
drmeister
Here I printed the thread id - there are threads racing to bind symbols at the same time.
3:43:47
drmeister
! WARN WARN WARN !!!! ANOTHER THREAD SET A SYMBOL _Binding slot to 79 before we could set ours to 80!!!!
3:45:03
drmeister
! WARN WARN WARN !!!! ANOTHER THREAD SET A SYMBOL _Binding slot to 122 before we could set ours to 123!!!!
6:42:41
drmeister
Everyone - I pushed to the 'work' branch a version of clasp that I think fixes this very random problem with compile-file-parallel.