freenode/#clasp - IRC Chatlog
3:35:38
Bike
i'm specifically thinking i know the ins and outs of booting well enough that i can try the make-instance thing again
4:05:24
drmeister
C++ifying map-instructions-xxxx shaving 25 min off the build suggests the value of tightening up the code
4:07:25
Bike
i thought the point of your C++ified code was using slots in the instructions rather than allocating hash tables.
4:08:22
drmeister
It's two things - adding a 'touched' slot in the instructions and reducing the overhead of map-instructions-xxx
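[Editor's sketch] The two approaches mentioned here (a hash table allocated per traversal versus a 'touched' slot stored in each instruction) can be contrasted roughly as follows. `Instruction`, `map_instructions_hash` and `map_instructions_touched` are hypothetical stand-ins made up for illustration; they are not the Cleavir or Clasp definitions, and the real map-instructions-xxx functions may differ in traversal order and bookkeeping.

```cpp
#include <unordered_set>
#include <vector>

// Hypothetical stand-in for a compiler instruction node (not the real class).
struct Instruction {
    std::vector<Instruction*> successors;
    bool touched = false;  // the extra 'touched' slot
};

// Variant A: remember visited nodes in a hash set allocated for each traversal.
template <typename F>
void map_instructions_hash(Instruction* start, F&& visit) {
    std::unordered_set<Instruction*> seen;
    std::vector<Instruction*> todo{start};
    seen.insert(start);
    while (!todo.empty()) {
        Instruction* inst = todo.back();
        todo.pop_back();
        visit(inst);
        for (Instruction* next : inst->successors)
            if (seen.insert(next).second) todo.push_back(next);
    }
}

// Variant B: mark visited nodes in the node itself, then clear the marks.
template <typename F>
void map_instructions_touched(Instruction* start, F&& visit) {
    std::vector<Instruction*> todo{start};
    std::vector<Instruction*> visited;
    start->touched = true;
    while (!todo.empty()) {
        Instruction* inst = todo.back();
        todo.pop_back();
        visited.push_back(inst);
        visit(inst);
        for (Instruction* next : inst->successors) {
            if (!next->touched) {
                next->touched = true;
                todo.push_back(next);
            }
        }
    }
    // Reset the marks so the next traversal starts from a clean state.
    for (Instruction* inst : visited) inst->touched = false;
}
```

Variant B avoids hashing and table allocation, at the cost of an extra slot per instruction and a reset pass; it is also not safe if two traversals of the same graph overlap.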
4:10:20
Bike
the only thing in map-instructions-arbitrary-order that seems like it could have a type check elided is the car in (pop instructions-to-process)
4:11:09
Bike
the compiler doesn't use much arithmetic or vectors, which are things where removing type checks would really help. if type inference improves compiler efficiency, it's because it would reduce the amount of code to be compiled
4:11:56
drmeister
You may be right - we can compare the llvm-ir for the C++ified versions and the Common Lisp versions.
4:14:51
Bike
whereas with the make instance thing, i know we have shitloads of make-instance calls and if i do it right they'll skip most of the work they do currently
4:18:21
drmeister
I've added karlosz changes on top of mine. Building -- I'll have new timing tomorrow. But this may bring us to parity with the AST compiler
4:22:54
Bike
i can probably also do a satiation thing where the ctors are in an initial state where they use functions compiled beforehand
4:29:20
Bike
the former of which has a variable class, so there's not a lot that can be done, which is even explained in a comment
4:29:44
drmeister
::notify kpoeck In the cl-bench tests - is there a way to get a spreadsheet view? It's difficult to scroll around. It would be a good idea to run it a few times to ensure all generic function discriminators are compiled.
4:30:50
drmeister
No - every generic function has its own counter - cleavir triggers them in batches
4:32:43
drmeister
it will pause in the middle a couple of times. If you do it again it will pause less and again - not at all.
4:33:48
drmeister
There is a lot to be said for beach's suggestion to put a somewhat random timer in there.
4:34:13
Bike
should put in a mode that's like the fastgf log except all it does is dump something when there's a compile
4:37:52
Bike
literally all you have to do to add a satiation case is add a list of the actual classes to the satiation form in satiation.lisp
4:38:40
drmeister
That would be very attractive - an automated way to set up satiation so that startup is faster.
4:39:06
Bike
https://github.com/clasp-developers/clasp/blob/master/src/lisp/kernel/cleavir/satiation.lisp not really automatable, since i do a ton of macrology in this file
4:41:47
Bike
but i mean, i guess? satiation overwrites the funcallable instance function, so the interpreter is unceremoniously dumped
4:42:33
Bike
last time i tuned up satiation the only calls that were compiling were like, some stuff in eclector
4:42:55
drmeister
Well, I love the idea of generating satiation code automatically by running clasp without it and then using the input for another build.
4:43:44
Bike
i mentioned it more as a cute idea than as something we should actually do. the main problem with it is that starting the system doesn't actually stress cleavir enough
4:44:18
Bike
you want to satiate the whole compiler, not just what we happen to use, or else the user will see random long compiles
4:44:53
drmeister
I'm building now with karlosz's latest changes. Then tomorrow I'll switch back to CL map-instructions-xxx with the 'touched' slot in instructions and see how that impacts things.
4:45:43
karlosz
if i remember correctly, beach would even be okay with adding a touched slot if it were done with stealth mix-ins
4:47:39
Bike
anyway, if we take a look at what dispatch misses are happening we can evaluate things better.
4:48:09
Bike
oh and a totally unrelated thing, i realized during brahms that if we do the multiple entry points fastcall thing we can use that to do tail calls.
8:21:14
Colleen
kpoeck: drmeister said 3 hours, 51 minutes ago: In the cl-bench tests - is there a way to get a spreadsheet view? It's difficult to scroll around. It would be a good idea to run it a few times to ensure all generic function discriminators are compiled.
8:22:08
kpoeck
karlosz: https://github.com/robert-strandh/SICL/pull/131 makes the cst build another hour faster for me
8:22:33
kpoeck
My timings are Compilation finished in 5:24:51, Compilation finished in 4:03:12, Compilation finished in 2:58:46
8:23:13
kpoeck
First number baseline, second number your first pull request, third number your second pull request
8:23:40
karlosz
just need to run another build and i'll ask you all to help me benchmark again, probably
8:45:48
kpoeck
I might not be getting this right, code for read is in https://github.com/clasp-developers/clasp/blob/dev/src/core/hashTable.cc#L625
8:46:03
beach
Since hash tables are used so much in Cleavir, it might be worthwhile giving some thought to their implementation in Clasp.
9:23:32
karlosz
but first i should run a build overnight for the build-function-dag removal and go to sleep
9:25:20
kpoeck
In my microbenchmark for hash-tables (only read), ccl 96 seconds, sbcl 116 seconds, clasp 113 seconds
10:18:05
kpoeck
Add a gethash for a non-existing key. Times between ccl, clasp and sbcl are pretty similar
10:18:27
stassats
kpoeck: what arguments are you using? because i can't repeat the same relative timings
10:18:44
kpoeck
Had to sum-up the result of the gethash, if not sbcl seems to optimize the gethash away
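[Editor's sketch] The point about summing the lookup results applies to read-only hash table benchmarks in general: if the looked-up value is never used, an optimizing compiler may drop the lookup. kpoeck's actual Lisp benchmark is not shown in the log, so the following is only an analogous C++ illustration using `std::unordered_map`; the names `ReadBenchResult` and `read_only_benchmark` are invented here.

```cpp
#include <chrono>
#include <cstdint>
#include <unordered_map>

struct ReadBenchResult {
    std::uint64_t sum;  // accumulated lookup results
    double seconds;     // wall-clock time of the read loop only
};

// Fills a table with keys 0..n-1 (value == key), then times `rounds`
// read-only passes over all keys. Every looked-up value is added to `sum`,
// which is returned, so the lookups feed an observable result.
ReadBenchResult read_only_benchmark(std::uint64_t n, std::uint64_t rounds) {
    std::unordered_map<std::uint64_t, std::uint64_t> table;
    table.reserve(n);
    for (std::uint64_t k = 0; k < n; ++k) table.emplace(k, k);

    std::uint64_t sum = 0;
    auto t0 = std::chrono::steady_clock::now();
    for (std::uint64_t r = 0; r < rounds; ++r) {
        for (std::uint64_t k = 0; k < n; ++k) {
            auto it = table.find(k);
            if (it != table.end()) sum += it->second;
        }
    }
    auto t1 = std::chrono::steady_clock::now();
    return {sum, std::chrono::duration<double>(t1 - t0).count()};
}
```

Checking the returned sum against the expected value also confirms that the benchmark really performed every lookup.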
13:09:50
kpoeck
Looking at the latest flamegraph for cl-bench (compiler) the widest blocks seem to be map-instructions-with-owner and map-instructions-arbitrary-order with 20% and 15%
13:11:48
kpoeck
I wonder whether this is faster with generic functions for the 2 instructions instead of the typep
13:23:25
kpoeck
drmeister: does generating the flamegraph slow the execution down substantially or is that just noise?
13:35:53
beach
drmeister: Can you explain in a few short phrases how Clasp hash tables are implemented?
13:36:52
beach
drmeister: I was just thinking, if that implementation can be improved, there might not be any need to avoid using hash tables in the compiler, and Clasp would be faster in general as well.
13:58:13
drmeister
kpoeck: I don't know if dtrace slows down the program that you are tracing - I think it's designed to minimize that as much as possible.
14:07:56
drmeister
beach: I have tried in the map-instructions-xxx functions to add :rehash-size 4.0 to the make-hash-table and that helps - about 10%
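[Editor's sketch] In Common Lisp a floating-point :rehash-size greater than 1 is a multiplicative growth factor, so a larger value means fewer rehashes while a table fills up. The toy model below counts the growth steps; the initial size, threshold, entry count, and the 1.5 comparison factor are arbitrary assumptions, and `count_rehashes` is a made-up helper that does not reflect Clasp's actual resizing policy.

```cpp
#include <cstddef>

// Counts how many times a table that starts with `initial_size` slots must
// grow to hold `n_entries`, assuming it multiplies its size by
// `growth_factor` whenever the entries would exceed `threshold * size`.
// Requires growth_factor > 1 and threshold > 0, otherwise the loop never ends.
std::size_t count_rehashes(std::size_t initial_size, double growth_factor,
                           double threshold, std::size_t n_entries) {
    double size = static_cast<double>(initial_size);
    std::size_t rehashes = 0;
    while (static_cast<double>(n_entries) > threshold * size) {
        size *= growth_factor;
        ++rehashes;
    }
    return rehashes;
}
```

Under these assumptions, growing from 16 slots to hold 100000 entries at a threshold of 0.75 takes 7 rehashes with a factor of 4.0 and 23 with a factor of 1.5, at the price of more unused space after the last growth step.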
14:08:53
drmeister
General: I suspect that there is a problem with generic functions and multi-threading - if anyone sees hangs when building asdf - please tell me.
14:09:51
drmeister
I'm building with the serial compile-file - and I'm looking at set-funcallable-instance-function - I think it needs a per-generic function spin-lock.
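[Editor's sketch] A per-generic-function spin-lock of the kind floated here could look roughly like this. Everything below is hypothetical: `GenericFunction`, its integer fields, `SpinGuard` and `set_dispatch_locked` are placeholders for illustration and do not match Clasp's real object layout or its set-funcallable-instance-function. Whether such a lock is needed at all is debated later in the log.

```cpp
#include <atomic>

// Hypothetical stand-in for a generic function object.
struct GenericFunction {
    std::atomic_flag lock = ATOMIC_FLAG_INIT;  // one spin-lock per generic function
    int dispatcher = 0;   // stands in for the dispatch function
    int entry_point = 0;  // stands in for the entry point
};

// RAII guard: spins until it owns the flag, releases it when it goes out of scope.
class SpinGuard {
public:
    explicit SpinGuard(std::atomic_flag& flag) : flag_(flag) {
        while (flag_.test_and_set(std::memory_order_acquire)) {
            // busy-wait until the current holder clears the flag
        }
    }
    ~SpinGuard() { flag_.clear(std::memory_order_release); }
    SpinGuard(const SpinGuard&) = delete;
    SpinGuard& operator=(const SpinGuard&) = delete;

private:
    std::atomic_flag& flag_;
};

// Updates both fields while holding the lock, so any other code that also
// takes the lock never observes one field updated without the other.
void set_dispatch_locked(GenericFunction& gf, int dispatcher, int entry_point) {
    SpinGuard guard(gf.lock);
    gf.dispatcher = dispatcher;
    gf.entry_point = entry_point;
}
```

The lock only helps if readers take it too; callers that read the fields without it can still see an intermediate state.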
14:11:41
kpoeck
Is that the thing with (find ',slot-name (clos:class-slots ,class) :key #'clos:slot-definition-name)
14:15:55
drmeister
Bike: I'll try it in about an hour. I'm building with compile-file-serial to see if that doesn't hang.
14:17:58
Bike
i know i said s-f-i-f needs a lock before, but i've been thinking about it and i can't come up with a way that it'll go wrong without one
14:18:29
Bike
it always sets the GFUN_DISPATCHER to something valid. it sets the entry point to either funcallable_entry_point (which is valid if the GFUN_DISPATCHER is valid), or to some other entry point but it won't do that if the entry point needs the GFUN_DISPATCHER.
14:18:53
drmeister
It hung twice now in DtreeInterpreter_O::entry_point and the second time it was definitely looking up the dtree of the interpreter
14:19:07
Bike
you might end up with a case where something is going through funcallable_entry_point that doesn't have to, but that won't actually be a break...
14:20:18
Bike
we should probably take out the special dtree interpreter o thing, too. since it's just a closure.
14:27:45
kpoeck
drmeister: there are about 140 CL_DEFUN T_mv in the codebase but a lot of them don't seem to actually return multiple-values, but just (Values(single-value))
14:28:26
Bike
it could end up in a situation where the function and the entry point are out of sync, but the only operational effect is get-funcallable-instance-function returning a function that's not actually being used, and who cares about that
14:40:46
beach
drmeister: How about instead of a vector of CONS cells, you just make every other element a key and every other element a value?
15:04:30
Bike
https://github.com/clasp-developers/clasp/blob/master/src/core/hashTable.cc#L736-L739 am i reading this right that if it needs to rehash it actually prints something??
15:05:40
drmeister
Bike: That only happens if the hash-table runs out of slots - that should never happen.
15:07:42
drmeister
I put that in there when I was fixing the rehash-thresholds throughout clasp so that they were all less than 1.0 - then I just left it in.
15:09:51
drmeister
There is this... https://github.com/clasp-developers/clasp/blob/dev/src/core/hashTable.cc#L338
15:09:53
Bike
i think we're sort of ignoring some of the finer aspects of rehashing control, but we're allowed to so whatever
15:10:23
Bike
strictly speaking it could be a type error but we don't have to and it's kind of unlikely
15:10:34
beach
Other than the two elements represented in the vector itself rather than as a CONS, there is no difference.
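[Editor's sketch] The layout beach describes, with keys and values alternating in a single vector instead of a vector of cons cells, can be illustrated with a small open-addressed table. This is a simplified assumption-laden sketch (integer keys, linear probing, fixed capacity, no deletion or rehash); `FlatTable` and `kEmptyKey` are invented names, and nothing here is taken from Clasp's hashTable.cc.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Reserved marker for an unused key slot; it must not be used as a real key.
constexpr std::uint64_t kEmptyKey = ~std::uint64_t{0};

// Keys live at even indices and their values at the following odd index.
class FlatTable {
public:
    explicit FlatTable(std::size_t capacity)
        : slots_(2 * capacity, kEmptyKey), capacity_(capacity) {}

    // Inserts or overwrites; returns false if no free pair is found.
    bool put(std::uint64_t key, std::uint64_t value) {
        for (std::size_t probe = 0; probe < capacity_; ++probe) {
            std::size_t i = 2 * ((key + probe) % capacity_);
            if (slots_[i] == kEmptyKey || slots_[i] == key) {
                slots_[i] = key;
                slots_[i + 1] = value;
                return true;
            }
        }
        return false;
    }

    // Writes the value to `out` and returns true if the key is present.
    bool get(std::uint64_t key, std::uint64_t& out) const {
        for (std::size_t probe = 0; probe < capacity_; ++probe) {
            std::size_t i = 2 * ((key + probe) % capacity_);
            if (slots_[i] == key) {
                out = slots_[i + 1];
                return true;
            }
            if (slots_[i] == kEmptyKey) return false;  // end of the probe chain
        }
        return false;
    }

private:
    std::vector<std::uint64_t> slots_;  // key0, value0, key1, value1, ...
    std::size_t capacity_;              // number of key/value pairs
};
```

A key and its value sit next to each other in memory, so a successful probe touches one contiguous region and needs no separate cell per entry.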
15:12:37
Bike
not hashing twice would be good too, but hashing is just arithmetic so it probably doesn't matter relatively speaking
15:15:26
drmeister
https://github.com/clasp-developers/clasp/blob/dev/include/clasp/core/hashTable.h#L87
15:16:35
drmeister
The actual table is a Vec0<Cons_O> - it's not a simple-vector of Cons_sp cells like I said earlier. I forgot I made it a vector of Cons_O cells - so there is no allocation unless there is a rehash.
15:20:57
Bike
we still have a few issues like "[x] conses" and having an easy way to check that would be good
15:22:32
drmeister
(time (loop for x below 1000 do (cons 'a 'b))) -> Time real(0.000 secs) run(0.000 secs) consed(16000 bytes)
15:22:48
drmeister
(time (loop for x below 10000 do (cons 'a 'b))) -> Time real(0.000 secs) run(0.000 secs) consed(160000 bytes)