freenode/#clasp - IRC Chatlog
3:59:15
beach
drmeister: So why not try something really simple, use a multiplicative growth instead of trying to guess the initial size.
5:02:19
drmeister
I've switched clasp's hash tables to open addressing - I can control the size now.
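(Not Clasp's actual implementation - a minimal sketch of what "open addressing" means: collisions are resolved by probing to the next slot in the same array, so the table is one flat allocation whose size the implementor fully controls. All names here are illustrative.)

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TABLE_SIZE 16  /* power of two, so hash & (TABLE_SIZE-1) wraps cheaply */

typedef struct {
    const char *keys[TABLE_SIZE];  /* NULL marks an empty slot */
    int vals[TABLE_SIZE];
} table;

static size_t hash_str(const char *s) {
    size_t h = 5381;               /* djb2 string hash */
    while (*s) h = h * 33 + (unsigned char)*s++;
    return h;
}

/* Insert with linear probing; returns 0 only when the table is full. */
static int table_put(table *t, const char *key, int val) {
    size_t i = hash_str(key) & (TABLE_SIZE - 1);
    for (size_t probes = 0; probes < TABLE_SIZE; probes++) {
        size_t j = (i + probes) & (TABLE_SIZE - 1);
        if (t->keys[j] == NULL || strcmp(t->keys[j], key) == 0) {
            t->keys[j] = key;
            t->vals[j] = val;
            return 1;
        }
    }
    return 0;  /* every slot occupied */
}

/* Lookup walks the same probe sequence; an empty slot means "absent". */
static int table_get(const table *t, const char *key, int *out) {
    size_t i = hash_str(key) & (TABLE_SIZE - 1);
    for (size_t probes = 0; probes < TABLE_SIZE; probes++) {
        size_t j = (i + probes) & (TABLE_SIZE - 1);
        if (t->keys[j] == NULL) return 0;
        if (strcmp(t->keys[j], key) == 0) { *out = t->vals[j]; return 1; }
    }
    return 0;
}
```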
5:09:53
drmeister
Anyway - I'm trying to track down why the child threads compiling AST->native code appear to slow down the main thread.
5:18:30
drmeister
I'm having a heck of a time trying to figure out why I can compile the ASTs in 0.3 seconds, but when I launch children that compile the ASTs to native code, the AST generation appears to take 10x longer.
6:16:25
drmeister
I'm getting some evidence that my contention problem is the class database - it has an upgradable read/write lock using two mutexes.
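(A sketch, not Clasp's code, of the classic way a readers-writer lock is built from two mutexes: one mutex guards a reader count, the other is held either by a writer or by the reader group as a whole. Note the contention hazard this thread is about: every read-side acquire and release writes to shared state.)

```c
#include <assert.h>
#include <pthread.h>

/* Classic readers-writers from two mutexes plus a reader count:
 * the first reader in locks `gate` against writers; the last reader
 * out releases it. Writers take `gate` directly. */
typedef struct {
    pthread_mutex_t count_lock;  /* protects `readers` */
    pthread_mutex_t gate;        /* held by a writer, or by the reader group */
    int readers;
} rwlock;

static void rwlock_init(rwlock *l) {
    pthread_mutex_init(&l->count_lock, NULL);
    pthread_mutex_init(&l->gate, NULL);
    l->readers = 0;
}

static void read_lock(rwlock *l) {
    pthread_mutex_lock(&l->count_lock);
    if (++l->readers == 1)             /* first reader blocks writers */
        pthread_mutex_lock(&l->gate);
    pthread_mutex_unlock(&l->count_lock);
}

static void read_unlock(rwlock *l) {
    pthread_mutex_lock(&l->count_lock);
    if (--l->readers == 0)             /* last reader readmits writers */
        pthread_mutex_unlock(&l->gate);
    pthread_mutex_unlock(&l->count_lock);
}

static void write_lock(rwlock *l)   { pthread_mutex_lock(&l->gate); }
static void write_unlock(rwlock *l) { pthread_mutex_unlock(&l->gate); }
```

Even with zero writers, every reader serializes on `count_lock` and mutates `readers` - which is exactly the kind of read-side write traffic that shows up later in this log.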
6:17:42
drmeister
In the compile-file-parallel - I put a loop around the hir transformations - so each child thread now runs my-hir-transformations (a bunch of hir transformations) 100x. The main thread slows down about 20x.
6:17:54
beach
I guess it can happen when lots of DEFCLASS forms or lots of DEFMETHOD forms are accessed.
6:18:55
drmeister
I sample the process and I see lots and lots of read locks being acquired for the class database. This is within map-instructions-xxx
6:19:28
beach
I have no answer in real time, but I will give it some thought. Maybe some insight could be had by realizing that (setf (find-class...) nil) would be rare, i.e. few things would ever be deleted from the class table.
6:20:29
beach
Monday mornings are crazy around here, and I need to leave. But I'll give it some thought during the day.
6:22:05
drmeister
This looks interesting - reading... https://en.wikipedia.org/wiki/Read-copy-update
6:24:02
drmeister
The other big red flag is that now I've got 40 threads grinding out HIR transformations and the process never uses more than 150% CPU.
6:34:57
beach
Typically, no classes are added to the database during that phase, so a single-writer-multiple-reader thing would work.
6:53:20
drmeister
I just ran an experiment - I launched 10 threads that all loop and call (find-class 'double-float).
6:58:20
drmeister
I already have a multiple-reader, single-writer lock - is that the "single-writer-multiple-reader thing" you meant?
6:59:17
drmeister
I think so - according to this lecture the multiple-reader/single-writer solution is very limited.
7:01:40
drmeister
I'm doing some research here myself - this is a known problem. The linux kernel uses something called Read-Copy-Update to speed up reads at the expense of writes.
7:05:51
drmeister
https://github.com/Bike/SICL/blob/master/Code/Cleavir/HIR-transformations/eliminate-catches.lisp#L7
7:08:42
beach
Maybe one day we will implement typep with constant type descriptors as a generic function, but we haven't done that yet.
7:17:37
drmeister
https://github.com/Bike/SICL/blob/master/Code/Cleavir/Intermediate-representation/map-instructions.lisp#L32
7:21:19
drmeister
Weren't you and Bike talking about this a while ago? Replacing typep calls with predicates?
8:09:47
drmeister
Yeah - the read-many/write-one lock is totally inadequate here. Multiple CPUs trying to grab the read lock is bad. The memory that represents the read lock can only be held by one core at a time, and so it bounces between the CPUs.
15:40:44
drmeister
After converting about a dozen invocations of TYPEP to xxxx-p predicates in the AST->HIR->native code path, compile-file-parallel gets up to using 450% CPU.
15:45:27
beach
Also, like I said, TYPEP with a constant type descriptor can be implemented efficiently without calling FIND-CLASS.
15:47:25
beach
Secretly creating a generic function for some types, much like the ones you created manually.
15:48:13
beach
Bike: It appears to be the lock itself used in the traditional solution to multiple-reader/single-writer problems.
15:49:25
drmeister
Yeah. I watched a lecture on it last night. Acquiring a read lock involves writing to shared memory. That forces exclusive access on that memory for one CPU and invalidates caches in all the other CPUs.
15:49:32
beach
Right, but there must be a short lock to make sure there are no writers. That one appears to be the problem.
15:49:45
drmeister
So the exclusive access to the memory for the read lock ping-pongs around the different CPUs.
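(One standard cure for this ping-pong, sketched here as an illustration rather than as anything in Clasp or the kernel: give each thread its own cache-line-padded reader count, so the read path never touches a line another core is writing. The writer pays instead, scanning every slot. This assumes a single writer at a time and uses hypothetical names throughout.)

```c
#include <assert.h>
#include <stdatomic.h>

#define NSLOTS 8       /* one slot per thread/CPU */
#define CACHE_LINE 64

/* Pad each reader count to a full cache line so read-side traffic on
 * different slots never invalidates another core's cache. */
struct padded_count {
    atomic_int count;
    char pad[CACHE_LINE - sizeof(atomic_int)];
};

static struct padded_count reader_slots[NSLOTS];
static atomic_int writer_present;

static void shard_read_lock(int slot) {
    for (;;) {
        atomic_fetch_add(&reader_slots[slot].count, 1);
        if (!atomic_load(&writer_present)) return;      /* fast path */
        atomic_fetch_sub(&reader_slots[slot].count, 1); /* back off */
        while (atomic_load(&writer_present)) ;          /* wait out writer */
    }
}

static void shard_read_unlock(int slot) {
    atomic_fetch_sub(&reader_slots[slot].count, 1);
}

static void shard_write_lock(void) {   /* assumes one writer at a time */
    atomic_store(&writer_present, 1);
    for (int i = 0; i < NSLOTS; i++)   /* wait for all readers to drain */
        while (atomic_load(&reader_slots[i].count) != 0) ;
}

static void shard_write_unlock(void) {
    atomic_store(&writer_present, 0);
}
```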
15:51:26
beach
Bike: You are right that there may be no control path from the first instruction of a TAGBODY to the last one, but if you don't include some special kind of edge, aren't we going to have problems with ownership calculations again?
15:52:32
drmeister
There's something called Read-Copy-Update that they start to describe here: https://www.youtube.com/watch?v=BcAED2f3z0I
15:53:08
drmeister
It's used in the Linux kernel for this purpose - lots of readers, occasional updating and you want reading to be very fast.
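(The core RCU idea, reduced to a toy C11 sketch with made-up names - this is not the kernel's implementation: readers only load a pointer, so the read side writes nothing shared; a writer copies the data, updates the copy, and atomically publishes it. Real RCU additionally defers freeing the old version until a grace period guarantees no reader can still hold it; that part is elided here.)

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

typedef struct { int class_count; } class_db;  /* stand-in for the real data */

static _Atomic(class_db *) current_db;  /* the single published pointer */

/* Read side: one acquire load, no shared writes, no lock. */
static class_db *db_read(void) {
    return atomic_load_explicit(&current_db, memory_order_acquire);
}

/* Write side: copy, update the copy, publish atomically. */
static void db_update(int new_count) {
    class_db *old = atomic_load(&current_db);
    class_db *copy = malloc(sizeof *copy);
    copy->class_count = new_count;
    atomic_store_explicit(&current_db, copy, memory_order_release);
    (void)old;  /* real RCU frees `old` only after a grace period */
}
```

The trade is exactly what the log describes: reads become a single cache-friendly load, while writes pay for a full copy plus reclamation bookkeeping.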
15:53:09
Bike
beach: I mean say we have (progn (tagbody loop ... (go loop)) (foo ...)) - the foo call is never actually reached so we don't have to worry about who owns it cos it's deleted. but in (progn (tagbody ... end) (foo ...)) the end tag will just have the foo call as a normal successor. so everything should be fine.
15:53:28
drmeister
That lecture I just posted illustrates the problem with multiple-reader/writer locks.