freenode/#clasp - IRC Chatlog
Search
15:40:44
drmeister
After converting about a dozen invocations of TYPEP to xxxx-p predicates in the AST->HIR->native code, compile-file-parallel gets up to using 450% CPU.
15:45:27
beach
Also, like I said, TYPEP with a constant type descriptor can be implemented efficiently without calling FIND-CLASS.
15:47:25
beach
Secretly creating a generic function for some types, much like the ones you created manually.
15:48:13
beach
Bike: It appears to be the lock itself used in the traditional solution to multiple-reader/single-writer problems.
15:49:25
drmeister
Yeah. I watched a lecture on it last night. Acquiring a read lock involves writing to shared memory. That forces exclusive access on that memory for one CPU and invalidates caches in all the other CPU's.
15:49:32
beach
Right, but there must be a short lock to make sure there are no writers. That one appears to be the problem.
15:49:45
drmeister
So the exclusive access to the memory for the read lock ping-pongs around the different CPU's.
15:51:26
beach
Bike: You are right that there may be no control path from the first instruction of a TAGBODY to the last one, but if you don't include some special kind of edge, aren't we going to have problems with ownership calculations again?
15:52:32
drmeister
There something called Read-Copy-Update that they start to describe here: https://www.youtube.com/watch?v=BcAED2f3z0I
15:53:08
drmeister
It's used in the Linux kernel for this purpose - lots of readers, occasional updating and you want reading to be very fast.
15:53:09
Bike
beach: I mean say we have (progn (tagbody loop ... (go loop)) (foo ...)) - the foo call is never actually reached so we don't have to worry about who owns it cos it's deleted. but in (progn (tagbody ... end) (foo ...)) the end tag will just have the foo call as a normal successor. so everything should be fine.
15:53:28
drmeister
That lecture I just posted illustrates the problem with multiple-reader/writer locks.
15:54:49
Bike
i'm just a little worried there are things treating them as equivalent ot the arguments. i guess i'll see.
15:55:04
beach
drmeister: Oh, is this another case of a context switch being prohibitively expensive in "modern" operating systems?
15:56:10
scymtym
drmeister: do you already know whether readers are slowed down by the overhead of the read/write lock mechanism or whether writers actually hold the lock for extended periods of time?
15:57:45
drmeister
It's from the macOS tool "Activity Monitor" - it's like a backtrace and trace together.
15:59:41
drmeister
Ah - well, it looks like i'm not going anywhere for a while - the trains (that come every 30 min) are 36 min late.
16:02:01
Bike_
by the way, the generic function typep thing beach mentioned is what we already do for defstruct, so you could look at defstruct.lsp to see how it works (it's pretty simple though)
16:03:15
Bike
maybe i should try my make-instance biznis again, that would mostly eliminate find-class in make-instance
16:05:00
drmeister
What's happening is the exclusive access to the read-lock for find-class is ping-ponging all over the place.
16:06:42
drmeister
The child threads in compile-file-parallel are doing (TYPEP x 'cleavir-class) all over the place - I figured it was the same problem.
16:07:37
drmeister
That lecture I posted - the second half explains where the problem comes from - it's the CPU's cache algorithm that makes the locks possible that causes the problem.
16:08:25
drmeister
So I started replacing (TYPEP x '<cleavir-class>) with (<cleavir-class>-p x) and adding <cleavir-class>-p predicates.
16:12:15
Bike_
well, that's what i mean, i thought the locks would actually allow parallel use, but they aren't
16:12:32
Bike_
and i suppose there's nothing we can do about it really since they're just system locks
16:12:48
scymtym
do you store classes in cells? if so, couldn't FIND-CLASS be transformed into (load-time-value (find-class-cell …))?
16:17:35
Bike
and we have (typep x 'foo) => (classp x (car (load-time-value (gethash 'foo *class-name-hash*)))) or so
16:18:18
scymtym
the association name -> cell would never change so it could be looked up at compile-time (is the idea at least)
16:19:25
Bike
isn't it also what you were doing in sicl for classes? i thought sicl did that for practically everything
16:23:08
beach
Bike: I don't think I have gotten that far in my thinking. I hadn't imagined that FIND-CLASS would be a performance bottleneck.
16:25:30
Bike
well, it would come up with anything doing find-class in parallel- compilation is just what we're doing heavily
16:26:06
scymtym
as an aside: http://www.radgametools.com/telemetry.htm may be interesting for diagnosing these kinds of performance issues (until our clim-based tools are ready). caveats: it's probably not cheap and i don't know the platform constraints
16:28:27
Bike
honestly i'm kind of glad the problem is fairly easy to understand - i was worried it was going to be some undocumented llvm internal
16:43:58
kpoeck_
There is nearly no clos in them, perhaps thats a pattern that plays nice with the current implementation
16:50:07
drmeister
The problem with compile-file-parallel is that the algorithms for optimizing HIR call find-class like crazy.
16:55:39
drmeister
The Activity Monitor on macOS has this "sample process" function that is really nice - but it has a terrible flaw.
16:56:15
drmeister
It took me until last night to finally figured out how to read it. It's a backtrace and a trace rolled into one.
16:57:32
drmeister
They take a bunch of backtraces and merge together the common parts to get that. The top part is the backtrace and the bottom part that zig-zags partial backtraces that who each time sample.
16:57:59
drmeister
The horrible flaw is when running Cleavir - the backtrace part can get really, really deep.
17:08:38
drmeister
But it has a <Save> option - so I can dump it to a text file and bring it up in emacs.
17:14:51
beach
Bike and I are discussing new stuff that might be make it possible to remove CALL-WITH-VARIABLE-BOUND.
17:19:12
drmeister
Here is the next lecture on operating systems that describes Read-Copy-Update: https://www.youtube.com/watch?v=Jkoc0iWQtr8
17:41:43
drmeister
frgo: Have you used Read-Copy-Update for lock free reading of a data structure that is updated very infrequently?
17:43:50
drmeister
It requires that updates to the data structure happen when all threads that could access that data structure is blocked or at a safe point.
17:47:53
drmeister
I'm puzzling over how to implement this for the class table - currently a hash-table.
17:49:41
drmeister
I have an idea. If I put an alist in front of the class-table that contains add/remove class instructions and search that first and then search the class-table.
17:50:03
drmeister
Then - if I could block all of the threads and use the add/remove class instructions to update the class-table.
17:50:56
drmeister
AND I keep a pointer to the alist/class-table in thread-local storage for each thread. So find-class would go through this pointer in thread-local storage.
17:55:17
drmeister
Question: Can I set things up so that all threads reach a safe-point and pause so that I can update the class-table, update all of the thread local storage pointers to the class-table and then allow the threads to continue on their way?
18:12:09
drmeister
It's something to do with a global atomic pointer to a linked list of set-class instructions followed by the class table.
18:13:51
drmeister
It's a bit long. To summarize - the compile-file-parallel was experiencing a lock contention for the read-lock for the class table.
18:14:57
drmeister
Have you ever used Read-Copy-Update? https://en.wikipedia.org/wiki/Read-copy-update . It sounds like it's easier to implement in operating systems than user land.
18:18:31
drmeister
I would continue to use a hash-table for the class-table - but first find-class would search a linked list of "set-class" instructions that would shadow anything in the class hash-table.
18:19:03
drmeister
I can atomically update a global variable to the linked list of "set-class" instructions using CAS.
18:19:46
drmeister
Occasionally I would want to update the class hash-table using the set-class instructions so that the list of set-class instructions doesn't get too long.
18:20:03
frgo
Hm - IIUC then this requires some sophisticated handling of sequencing the updates. Yeah, if this could be done by appending to a list ...
18:21:13
frgo
What I don't get is how we access the linked list by multiple threads in a lock-free way ..
18:21:23
drmeister
I think the updating of the class hash-table using the set-class list is the tricky part. I need to get all threads to a safe point and then do the update, then erase the set-class list and then tell every thread to use the class hash-table.
18:23:05
frgo
You'd need to interrupt a thread, re-orient the class access to the class hash-table, and the tell the thread to continue.
18:25:12
drmeister
Can I tell every thread - the next time you reach a safe-point pause and wait until all threads have reached a safe-point and then update the class-table and then proceed?
18:25:54
frgo
So, establishing a signal handler in each thread - this handler is the "I need to wait for class access reorientation and this is signaled by a condition variable."
18:27:11
drmeister
Is this a terrible idea? If there is one thread that is off doing something compute intensive and it doesn't reach a safe-point for an hour - everything will block for that time.
18:27:12
frgo
I can't imagine how one can "insert" a safepoint into normal C++ code. Except I manipulate at a higehr level, AST, IR, ...
18:29:04
drmeister
It's a call to this: https://github.com/clasp-developers/clasp/blob/dev/src/gctools/interrupt.cc#L225
18:30:41
drmeister
https://github.com/clasp-developers/clasp/blob/dev/include/clasp/gctools/threadlocal.h#L49
18:36:22
frgo
Coming back to your point on a thread running for an hour: That's why I think you would have to interrupt the threads by a signal. You can't just wait until they reach a safepoint.
18:38:27
frgo
Still - tricky. What if you have 10000 threads ... Delivering signals to hat nr of threads is not exactly a cheap/ fast oepration.
18:39:29
frgo
Fortunately we don't support Windows. We had a case where exactly this scenario caused Win10 to crash - very reliably so. Microsoft is working on it ...
18:43:56
drmeister
I have to ensure that a safe point call is made periodically. It’s the eternal problem of c++ exceptions and Unix signal handlers not working together
18:44:50
frgo
Exactly. I would shy away for this very reason. Too many unknowns to get this stable - tells me my experience.
18:46:09
drmeister
It would only come up when redefining classes. Then all the threads would have to sync to update the class table
18:47:39
drmeister
I have an interim solution. Replace typep and find-class calls in the compiler with predicate tests.
18:49:12
drmeister
Changing code and avoiding using TYPEP and FIND-CLASS in future code until we implement Read-Copy-Update for the class table.
18:50:11
drmeister
If you want to run parallel code - make sure it doesn't contain TYPEP or FIND-CLASS calls - and that it avoid reader/writer locks - they are useless.
18:51:18
drmeister
No - it's not attractive. The only attractive solution I see requires read-update-copy.
18:52:32
drmeister
That requires synchronizing threads and that requires judicious invocation of safe-point.
20:02:50
drmeister
::notify Bike - I'm replacing the use of *invocation-context* special variable with code that passes it as a second argument to introduce-invoke-aux here...
20:07:07
drmeister
scymtym: What was the deal with transforming FIND-CLASS into (load-time-value (find-class-cell ...)) ?
20:07:55
drmeister
Instead of storing class-symbol-name/class we store class-symbol-name/(CONS class nil) in the class table?
20:09:56
drmeister
Yeah - Bike said... "and we have (typep x 'foo) => (classp x (car (load-time-value (gethash 'foo *class-name-hash*)))) or so"
20:14:37
drmeister
I realized that RCU is even more complicated with the moving garbage collection of MPS.
20:15:20
drmeister
Any GETHASH can trigger a REHASH - which would also have to be deferred to the class-table synchronization time.
20:16:00
drmeister
This is because objects move in memory and if they move and they can't be found because the hashing algorithm has them at a different place in the hash-table - then that would normally force a REHASH.
20:16:43
drmeister
I know how to handle this but it would require dropping down to a linear search of the class-table until the class-table synchronization happens.
20:39:08
drmeister
::notify Bike I've changed how classes are stored in the class table. Now (typep x 'foo) --> (classp x (car (load-time-value (ext:find-class-cell 'foo))))
21:00:18
drmeister
I wonder what the consequences would be if I made every CONS cell have an atomic CAR and CDR.
21:26:19
drmeister
It's problematic to return a cons cell containing a class in the car. Unless the car is atomic there could be race conditions.
21:50:54
Colleen
Bike: drmeister said 1 hour, 48 minutes ago: - I'm replacing the use of *invocation-context* special variable with code that passes it as a second argument to introduce-invoke-aux here...
21:50:54
Colleen
Bike: drmeister said 1 hour, 47 minutes ago: Tell me what you think when you see this.
21:50:54
Colleen
Bike: drmeister said 1 hour, 11 minutes ago: I've changed how classes are stored in the class table. Now (typep x 'foo) --> (classp x (car (load-time-value (ext:find-class-cell 'foo))))
21:52:21
drmeister
I've saved my previous work and now I'm in a separate branch exploring making the car and cdr of cons cells atomic. Also Symbol_O::_GlobalValue is now atomic.
21:53:12
Bike
introduce invoke will be going away with the changes to hir, so dont sweat it too much
21:54:22
Bike
with the cells we would juzt need atomic accessors, it's ok if the regular accessors aren't, though i dont' know how it goes on the C++ ends
21:56:38
Bike
http://www.sbcl.org/manual/#index-compare_002dand_002dswap has a list of places sbcl can access atomically or through cas
21:58:29
drmeister
That sbcl thing is all very well and good - but in C++ land I think I need to declare the types as atomic - then the compiler should make sure no funny business happens with them.
21:58:45
drmeister
I don't think I can use atomic operations on non-atomic data - I think that would be dangerous.
22:00:18
Shinmera
That reminds me of another thing that's vastly complicated by not having a GC: lockless datastructures
22:01:07
Shinmera
I mean, it can get pretty dicey even with a GC, but trying to read some papers on how to do the same in C++ without a GC made me think a noose would be a better option