freenode/#clasp - IRC Chatlog

3:58:44 beach Good morning everyone!

3:59:15 beach drmeister: So why not try something really simple, use a multiplicative growth instead of trying to guess the initial size.

5:01:27 drmeister Hi - I'm trying

5:02:19 drmeister I've switched clasp's hash tables to open addressing - I can control the size now.

5:06:22 beach Why could you not control the size when it was linked?

5:09:09 drmeister Hmm, you are right - I could have - I had the count of entries.
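
A minimal sketch of the multiplicative-growth idea beach suggests, applied to an open-addressing table. This is not Clasp's hash table: the class name `OpenTable`, the linear probing, the 1/2 load-factor threshold and the growth factor of 2 are all assumptions chosen for illustration.

```cpp
#include <cstddef>
#include <functional>
#include <optional>
#include <string>
#include <vector>

// Hypothetical illustration only; not Clasp's hash table.
class OpenTable {
public:
    explicit OpenTable(std::size_t initial_capacity = 8)
        : slots_(initial_capacity < 2 ? 2 : initial_capacity) {}

    void put(const std::string& key, int value) {
        // Grow before the load factor would exceed 1/2 (assumed threshold).
        if ((count_ + 1) * 2 > slots_.size()) grow();
        insert_no_grow(slots_, key, value, count_);
    }

    std::optional<int> get(const std::string& key) const {
        std::size_t i = std::hash<std::string>{}(key) % slots_.size();
        // Linear probing; an empty slot always exists because load <= 1/2.
        while (slots_[i].used) {
            if (slots_[i].key == key) return slots_[i].value;
            i = (i + 1) % slots_.size();
        }
        return std::nullopt;
    }

    std::size_t size() const { return count_; }
    std::size_t capacity() const { return slots_.size(); }

private:
    struct Slot {
        bool used = false;
        std::string key;
        int value = 0;
    };

    static void insert_no_grow(std::vector<Slot>& slots, const std::string& key,
                               int value, std::size_t& count) {
        std::size_t i = std::hash<std::string>{}(key) % slots.size();
        while (slots[i].used) {
            if (slots[i].key == key) { slots[i].value = value; return; }
            i = (i + 1) % slots.size();
        }
        slots[i] = Slot{true, key, value};
        ++count;
    }

    // Multiplicative growth: double the capacity and rehash every entry.
    void grow() {
        std::vector<Slot> bigger(slots_.size() * 2);
        std::size_t n = 0;
        for (const Slot& s : slots_)
            if (s.used) insert_no_grow(bigger, s.key, s.value, n);
        slots_.swap(bigger);
    }

    std::vector<Slot> slots_;
    std::size_t count_ = 0;
};
```

Because the capacity doubles each time, the total rehashing work stays proportional to the number of insertions, so the initial size does not need to be guessed accurately.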

5:09:53 drmeister Anyway - I'm trying to track down why the child threads compiling AST->native code appear to slow down the main thread.

5:18:30 drmeister I'm having a heck of a time trying to figure out why I can compile the ASTs in 0.3 seconds but when I launch children that compile the ASTs to native code the AST generation appears to take 10x longer.

6:15:21 drmeister beach: How would you implement a find-class for multi-threading?

6:16:25 drmeister I'm getting some evidence that my contention problem is the class database - it has an upgradable read/write lock using two mutexes.

6:16:30 beach I guess just lock the table of classes in the environment.

6:16:40 beach Oh, I see.

6:17:12 beach Let me think. I hadn't imagined it would be so heavily accessed.

6:17:42 drmeister In the compile-file-parallel - I put a loop around the hir transformations - so each child thread now runs my-hir-transformations (a bunch of hir transformations) 100x. The main thread slows down about 20x.

6:17:54 beach I guess it can happen when lots of DEFCLASS forms or lots of DEFMETHOD forms are accessed.

6:18:55 drmeister I sample the process and I see lots and lots of read locks being acquired for the class database. This is within map-instructions-xxx

6:19:28 beach I have no answer in real time, but I will give it some thought. Maybe some insight could be had by realizing that (setf (find-class...) nil) would be rare, i.e. few things would ever be deleted from the class table.

6:20:18 drmeister https://usercontent.irccloud-cdn.com/file/PwKZSam8/image.png

6:20:29 beach Monday mornings are crazy around here, and I need to leave. But I'll give it some thought during the day.

6:21:26 drmeister No problem - I was asking because maybe you thought about this already.

6:21:45 beach No, but I will.

6:22:05 drmeister This looks interesting - reading... https://en.wikipedia.org/wiki/Read-copy-update

6:22:18 beach I never considered parallel compilations.

6:24:02 drmeister The other big red flag is now I've got 40 threads grinding out HIR transformations and the process never uses more than 150% CPU.

6:28:02 drmeister And almost every thread is calling find-class

6:34:16 beach Wait a sec...

6:34:23 beach Is this AST-to-HIR?

6:34:57 beach Typically, no classes are added to the database during that phase, so a single-writer-multiple-reader thing would work.

6:52:26 drmeister It is AST-to-HIR - and no classes are added to the database at that phase.

6:53:20 drmeister I just ran an experiment - I launched 10 threads that all loop and call (find-class 'double-float).

6:53:27 drmeister It never goes over 133% CPU.

6:53:48 drmeister And sampling the stack - it looks exactly like compile-file-parallel.
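
The experiment drmeister describes can be approximated outside Clasp. The sketch below is a stand-in, not Clasp's code: `LockedRegistry` and `hammer` are hypothetical names, and `std::shared_mutex` is assumed as a substitute for the two-mutex upgradable lock mentioned above. Every lookup takes the read lock, so all reader threads touch the same lock state on every call.

```cpp
#include <atomic>
#include <cstddef>
#include <mutex>
#include <shared_mutex>
#include <string>
#include <thread>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for a class database guarded by a reader/writer lock.
class LockedRegistry {
public:
    void define(const std::string& name, int id) {
        std::unique_lock<std::shared_mutex> guard(lock_);
        classes_[name] = id;
    }

    // Every lookup acquires the shared (read) side of the lock.
    int find(const std::string& name) const {
        std::shared_lock<std::shared_mutex> guard(lock_);
        auto it = classes_.find(name);
        return it == classes_.end() ? -1 : it->second;
    }

private:
    mutable std::shared_mutex lock_;
    std::unordered_map<std::string, int> classes_;
};

// Start `threads` readers that each perform `iterations` lookups of the same
// name; return how many lookups succeeded in total.
std::size_t hammer(const LockedRegistry& reg, int threads, int iterations) {
    std::atomic<std::size_t> hits{0};
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t) {
        pool.emplace_back([&reg, &hits, iterations] {
            std::size_t local = 0;
            for (int i = 0; i < iterations; ++i)
                if (reg.find("double-float") >= 0) ++local;
            hits.fetch_add(local);
        });
    }
    for (auto& th : pool) th.join();
    return hits.load();
}
```

Timing `hammer` with 1 thread and then with 10 threads, at a large iteration count, is one way to see whether the read lock scales on a given machine; no particular numbers are claimed here.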

6:54:31 drmeister https://usercontent.irccloud-cdn.com/file/Jx2grtA6/image.png

6:55:21 drmeister https://www.irccloud.com/pastebin/izpbgVjm/

6:57:23 beach So does my suggestion sound right then?

6:57:54 drmeister I'm watching a lecture on Read-Copy-Update.

6:58:20 drmeister I have a multiple reader, single writer lock already - is that the "so a single-writer-multiple-reader thing would work."?

6:58:35 beach I see.

6:58:50 beach So the lock itself is the problem then?

6:59:08 beach That doesn't sound right to me, unless your locks are very slow.

6:59:17 drmeister I think so - according to this lecture the multiple-reader/single-writer solution is very limited.

6:59:54 drmeister I'm using pthreads - I get what I get.

7:00:49 beach OK, I'll think about better solutions.

7:01:40 drmeister I'm doing some research here myself - this is a known problem. The Linux kernel uses something called Read-Copy-Update to speed up reads at the expense of writes.

7:02:29 beach OK, good luck.

7:03:21 beach One thing to do is to figure out why find-class is called during compilation.

7:03:38 drmeister That I can answer...

7:03:38 beach I can't figure out why that would be the case.

7:03:54 beach OK.

7:04:19 drmeister https://usercontent.irccloud-cdn.com/file/QZfqj4Ul/image.png

7:05:11 beach I don't know what I am looking at.

7:05:51 drmeister https://github.com/Bike/SICL/blob/master/Code/Cleavir/HIR-transformations/eliminate-catches.lisp#L7

7:05:58 drmeister That's essentially a backtrace.

7:06:13 drmeister eliminate-catches calls TYPEP

7:06:53 beach So change that to invoke catch-instruction-p, defined as a generic function.

7:07:11 drmeister Good idea.

7:08:42 beach Maybe one day we will implement typep with constant type descriptors as a generic function, but we haven't done that yet.

7:17:36 drmeister There are more of these - I'm changing them to predicates...

7:17:37 drmeister https://github.com/Bike/SICL/blob/master/Code/Cleavir/Intermediate-representation/map-instructions.lisp#L32

7:17:48 drmeister Now I know what I'm looking for.

7:17:55 beach Great!

7:19:37 drmeister This is what I add - right?

7:19:39 drmeister https://www.irccloud.com/pastebin/YmdFJyBP/

7:19:47 drmeister Whoa - enclose-instruction-p

7:19:58 beach yes, -p

7:20:14 beach Or you can use the :method option for defgeneric here.

7:21:19 drmeister Weren't you and Bike talking about this a while ago? Replacing typep calls with predicates?

7:21:42 beach Yes, I think so. And doing it automatically.

7:21:51 beach Same idea as with MAKE-INSTANCE.

7:42:51 drmeister There are quite a few of these.

7:43:00 drmeister But they are easy to spot now.
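
In Lisp the change discussed above is a generic function such as catch-instruction-p, with a default method returning false and a method on the catch instruction class returning true. The following is only a C++ analogy of that idea, not Cleavir code: the class names are hypothetical, and a virtual member function stands in for the generic function. The point is that the answer comes from dispatch on the object itself, with no lookup in a shared, lock-protected class table.

```cpp
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical instruction hierarchy; not Cleavir's classes.
struct Instruction {
    virtual ~Instruction() = default;
    // Default answer, like a method specialized on the base class.
    virtual bool catch_instruction_p() const { return false; }
};

struct CatchInstruction : Instruction {
    // Like a method specialized on the catch instruction class.
    bool catch_instruction_p() const override { return true; }
};

struct EncloseInstruction : Instruction {};

// Counts catch instructions using only per-object dispatch.
std::size_t count_catches(
        const std::vector<std::unique_ptr<Instruction>>& instructions) {
    std::size_t n = 0;
    for (const auto& instruction : instructions)
        if (instruction->catch_instruction_p()) ++n;
    return n;
}
```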

8:09:47 drmeister Yeah - the read-many/write-one lock is totally inadequate here. Multiple CPUs trying to grab the read lock is bad. The memory that represents the read lock can only be held by one core at a time and so it bounces between the CPUs.

8:10:06 drmeister https://www.youtube.com/watch?v=BcAED2f3z0I

8:10:22 drmeister The interesting part starts at about 20 min

8:12:42 drmeister The problem gets worse the more CPUs there are fighting for the lock.
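
A much simplified sketch in the spirit of Read-Copy-Update, under loud assumptions: it is not the Linux kernel's RCU and not what Clasp does. `SnapshotRegistry` is a hypothetical name. Readers perform a single atomic pointer load and never write to shared memory; a writer copies the table, modifies the copy and publishes it. Memory reclamation, the hard part of real RCU, is sidestepped by keeping every old version alive until the registry is destroyed, which is only tolerable if writes are rare, as beach observes for the class table.

```cpp
#include <atomic>
#include <memory>
#include <mutex>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical copy-on-write registry; reclamation of old versions is omitted.
class SnapshotRegistry {
public:
    using Table = std::unordered_map<std::string, int>;

    SnapshotRegistry() {
        versions_.push_back(std::make_unique<Table>());
        current_.store(versions_.back().get(), std::memory_order_release);
    }

    // Readers: one atomic load, no lock, no write to shared memory.
    int find(const std::string& name) const {
        const Table* table = current_.load(std::memory_order_acquire);
        auto it = table->find(name);
        return it == table->end() ? -1 : it->second;
    }

    // Writers: serialize among themselves, copy, modify the copy, publish.
    void define(const std::string& name, int id) {
        std::lock_guard<std::mutex> guard(write_lock_);
        auto next = std::make_unique<Table>(
            *current_.load(std::memory_order_relaxed));
        (*next)[name] = id;
        current_.store(next.get(), std::memory_order_release);
        // Readers may still be using older tables, so nothing is freed here.
        versions_.push_back(std::move(next));
    }

private:
    std::atomic<const Table*> current_{nullptr};
    std::mutex write_lock_;
    std::vector<std::unique_ptr<Table>> versions_;  // every version ever published
};
```

Each write costs a full copy of the table, which is the trade of slower writes for faster reads mentioned earlier in the log.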

8:14:08 beach I see.