freenode/#clasp - IRC Chatlog
19:41:17
drmeister
So when you do this loop of compilation with the old discriminate macro vs the new one - what is the difference in time?
19:42:30
drmeister
Oh - that's not as big a difference as I expected given the difference in build time.
19:43:07
Bike
and i still don't know that it's actually what's causing the build slowdown, that's just my guess.
19:45:31
drmeister
https://usercontent.irccloud-cdn.com/file/A0XLcfXc/out-new-method-33994.svg https://usercontent.irccloud-cdn.com/file/XuUKCuIT/out-old-method-33994.svg
19:50:41
drmeister
Can you push this to a branch and give me instructions on how to turn it on one way or the other? I want to do some timing.
19:51:22
drmeister
I expect that the C++ and aclasp and bclasp compilation times should be largely unchanged. Definitely C++ and aclasp.
19:52:05
drmeister
Right - so I want to find the file and then maybe the forms. Something is going crazy.
19:56:36
Bike
then you can write for instance (lambda (x) (clos::discriminate2 (x) nil (((#.(find-class 'standard-generic-function)) (#.(find-class 'cleavir-ir:enter-instruction))) t)))
19:56:54
Bike
which is a function that will return T if given a direct instance of standard-generic-function or cleavir-ir:enter-instruction, or else nil
20:00:27
Bike
i just told you. you load this file and then you have the discriminate2 macro, which is the discrimination part of discriminating functions.
20:05:44
selwyn
unfortunately, on trying to do demo-clasp-cxx-interop, i got lots of inconsistent missing override warnings and a simple-program-error JIT session error: Symbols not found: { __emutls_v.my_thread_low_level, _ZTH19my_thread_low_level } https://www.irccloud.com/pastebin/ZlAMlhxr/
20:06:02
Bike
man, the compile times are really not good though. compiling the generated discriminating function for cleavir-ir:successors takes a little under half a second either way
20:07:08
drmeister
So if I substitute discriminate2 in this function - then it will use discriminate2
20:07:09
drmeister
https://github.com/clasp-developers/clasp/blob/master/src/lisp/kernel/clos/discriminate.lsp#L435
20:10:10
Bike
you'll get slightly different code since i tried changing some things in case, but it doesn't matter for a relative comparison
20:13:17
Bike
my first hypothesis for this slowdown was that i screwed things up and there are a lot of dispatch misses, but things seem to work for most classes. cxx classes i'm not sure about.
20:16:57
scymtym
i may have code for loading the dtrace output into it, but for this specific use-case, loading an svg into a browser is probably easier since it avoids the need to get McCLIM up and running
20:18:07
scymtym
but writing a Clasp backend for deterministic and statistical profiling might be a good idea at some point
20:43:25
scymtym
kpoeck: i don't think i understand the question. there is no code for Clasp at the moment
20:45:09
Bike
having tabular output would be nice. then i could just get data from the repl. wouldn't be the same thing as a flame graph, but that's ok.
20:45:51
scymtym
drmeister: this is the backend i wrote for SBCL: https://github.com/scymtym/clim.flamegraph/blob/advice-backend/src/backend/sb-sprof/source.lisp#L42 the interrupt handler is at the end
20:52:03
drmeister
I need to write something to analyze the text output from a build and compare compilation times of every file.
21:54:32
karlosz
OK, i have a preliminary table of the behavior of how llvm optimizations affect full build order: https://paste.gnome.org/phoovrkb3
21:55:18
karlosz
it seems like the only real measurable difference is not doing optimize module at all, otherwise, as long as at least the O2 optimization pipeline is used, you get similar processing times in llvm
21:55:33
karlosz
from looking at the flamegraph, it's pretty clear that most of the time is still dominated by cleavir, particularly cst->ast
21:56:49
karlosz
i think with the new PM it should be much easier to write a custom pipeline for clasp like other compilers based on LLVM do, where you choose the optimizations that make the most sense, i.e., best compile-time for generated code speed tradeoff
21:59:19
karlosz
but not always, maybe only subject to the compile-time flag in wscript, as far as i can tell
22:00:28
karlosz
i was doing it on bidmac so if anyone else was building at the same time, it may have affected things
22:03:37
karlosz
right, but i certainly don't think it makes things much faster in the current setup
22:13:49
drmeister
Bike: I don't understand why I have this error: (GENERIC-FUNCTION-CALL-HISTORY GF) is not a supported place to CAS
22:48:31
karlosz
this graph shows the compilation of cleavir/convert-special.lisp with the new PM: ocf.io/~karlos/flame12.svg
22:48:51
karlosz
this explains why switching pass managers for optimizeModule doesn't seem to make much of a difference
22:53:08
karlosz
aha, i see a way to improve the speed from looking at the graph. IR module optimizations already include IR function optimizations, so doing both is just redundant
22:57:41
karlosz
well, i'm not sure if the current way in jit-setup is actually doing it twice (it does populate both the function pipeline and the module pipeline), but doing so in the new PM is redundant
22:58:12
drmeister
Another thing that I've thought about is cleavir's map-instructions does a lot of consing of hash-tables. Could we estimate the number of instructions and set the initial size of the hash-table used to do map-instructions?
23:00:15
karlosz
we can cache the number of instructions in the graph somewhere as an approximation of the initial size, if the hash table resizing is actually a bottleneck
23:16:35
Bike
what should be happening is that with-early-accessors macrolets generic-function-call-history, since it's a slot in standard-generic-function
23:17:05
Bike
but in that backtrace it seems like there's no generic-function-call-history macro, because if there was it wouldn't be calling default-cas-expansion
23:19:27
drmeister
Where should the generic-function-call-history macro be defined? I don't see it anywhere, in my code or yours.
23:20:51
Bike
https://github.com/clasp-developers/clasp/blob/master/src/lisp/kernel/clos/hierarchy.lsp#L146 here's the slot.
23:21:17
Bike
https://github.com/clasp-developers/clasp/blob/master/src/lisp/kernel/clos/std-slot-value.lsp#L66-L81 here's with-early-accessors, which as you can see should expand into a macrolet.
23:39:45
drmeister
I'm having trouble with ediff-directories as well - it says there are differences between directories but the files within them are all the same.
23:46:49
Bike
like does it not know about the with-early-accessors macro, so it's compiling it as a function call?
23:48:18
Bike
because that would explain the problem, though i don't know how it would miss that macro.
0:13:32
drmeister
So a test discriminator compile takes a little bit longer and the overall build time is unchanged
0:14:18
drmeister
Maybe the thing is going nuts when we turn off the interpreter and it goes more nuts if the discriminating function compiler is even a little slower.
0:14:55
Bike
but you're seeing that the build time is unchanged if you keep this in but keep the interpreter on? that's not what i just saw
0:16:38
Bike
yeah. normal build = like nineteen minutes. build with interpreter still on but this discriminator code = like 23 minutes
0:18:25
drmeister
I've been watching this guy "kitboga" on twitch. He calls internet scammers and strings them along for as long as possible.
0:43:45
drmeister
Here's the sorted list of the files that take the longest time to build across aclasp/bclasp/cclasp - explanation follows...
0:45:13
drmeister
The numbers on each line are... <pid> <seconds-to-build> <bytes-consed> <total-num-files-aclasp/bclasp/cclasp-58/141/473> <file-name>
0:49:47
drmeister
cl-wrappers.lisp takes almost 2x longer. We should be able to pick that up with a flame graph.
0:55:48
drmeister
Discriminator functions will get compiled as the interpreted discriminator functions age out.
0:56:48
Bike
https://github.com/clasp-developers/clasp/blob/master/src/core/funcallableInstance.cc#L371-L376
0:58:56
Bike
it seems like the improvement in the speed of discrimination is generally dwarfed by the compile time.
1:01:34
drmeister
If it's all interpreted generic functions and satiated ones - then the satiated functions must be slower than the old satiated functions.
1:01:36
Bike
since it's turned off in cleavir, i think this is the only place the compiler is used https://github.com/clasp-developers/clasp/blob/master/src/lisp/kernel/clos/satiation.lsp#L451-L618
1:02:03
Bike
it's hard to imagine these clos functions being so much slower that it doubles compile time
1:02:17
Bike
especially when in simple comparisons it doesn't look like the new code generated is in fact slower
1:03:10
Bike
like i said, my first suspicion is that something is dispatch missing a lot, but it's hard to tell
1:03:15
drmeister
How about we count the number of gf discriminating functions being compiled and report that with Time and consed in the build output.
1:08:40
drmeister
Here are the slowest ones - they look like the first ones I posted - with the old discriminating function compiler.
1:09:34
drmeister
cracauer told me that when he used to do timing at Google that he took a lot of trouble to get the machines stable for timing measurements.
1:37:12
Bike
anyway, if this is legit, cool i guess. i can do this for the dtree interpreter and then where tags are unneeded. and i can work on something else.
1:58:23
drmeister
The llvm-ir for Instance_O vs FuncallableInstance_O looks identical - is the rack at the same offset at the moment?
1:59:41
Bike
but that could change, or we could use the same thing for derivables where the rack is at a different offset.
2:01:01
Bike
splitting it up also means that we do one 3-switch on the header and then a 2- or 3- switch on the rack stamp, rather than a switch on the where tag followed by a 5-switch on the rack stamp.
2:07:14
Bike
should do some metrics on the average spread, at some point when i'm less sick of this code
2:09:18
Bike
i gues a histogram of call history lengths would probably be a good start and easy to do
2:22:46
Bike
bclasp generally. if you put it in a file and you're in cclasp it'll be cleavir, though.
2:24:24
Bike
actually it might be possible to do so now since dispatch misses don't invalidate the function, but i dunno, i don't like going down that rabbit hole
2:27:06
Bike
cleavir-ir:successors is a function i use for this sometimes, since it's inevitably got a call history that's like 60 long
2:28:22
Bike
i usually pick out the form and call macroexpand myself. like (macroexpand-1 (third (fourth (third (third (fourth form)))))) in this case
2:28:53
Bike
like i said. just put #.(clos::generate-discriminator whatever) in an otherwise empty file and compile that.
2:29:29
drmeister
You said "quickest way would probably be #.(clos::generate-discriminator #'generic-function)"
2:29:58
drmeister
The quickest way would probably be to put #.(clos::generate-discriminator #'generic-function) into a file and compile-file it.
2:31:32
drmeister
quickest way would probably be #.(clos::generate-discriminator #'generic-function)
2:33:50
Bike
uh, maybe do #.(clos::compile-time-discriminator #'initialize-instance (clos::generic-function-call-history #'initialize-instance))
2:39:45
Bike
your ir will have some fixnum tests it doesn't need to. i cut those out locally to see if they were the problem