freenode/#clasp - IRC Chatlog
14:25:42
Bike
so i wrote the code so that discrimination works like i've been talking about, not using where tags. it's slightly slower if the generic function call history includes both instances and funcallable instances, but otherwise is faster, and by a higher percentage
14:26:07
Bike
however, it takes like 150% the time to compile, and if i force clos to always compile that means practically speaking a 50% increase in build time
14:53:49
Bike
well, i'll try it with the dtree interpreter, which is what we're actually using anyway. hopefully shouldn't matter since interpreted etc
16:06:58
Bike
it kind of goes both ways, since it takes more time to start the image and start compiling
16:16:22
drmeister
We do the binary search - can we reduce it to a jump table or whatever llvm switch compilation is doing?
16:16:46
drmeister
llvm needs to solve the general case. We know a lot about what we need to compile.
16:19:27
Bike
the overall goal here is just to make the discriminating function code as fast as possible so we can use it for any kind of type differentiation. subgoal is using llvm switches and, necessarily, making stamps contiguous so llvm can merge them.
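A minimal sketch of why contiguous stamps help, in Python rather than Clasp's Lisp or LLVM-IR. It assumes stamps are plain integers; the names `merge_stamps` and `matches` are hypothetical and are not Clasp or LLVM APIs. Adjacent stamps collapse into inclusive ranges, so a dispatcher needs one range check per run instead of one comparison per stamp, and the remaining ranges can be binary searched.

```python
from bisect import bisect_right

def merge_stamps(stamps):
    """Collapse integer stamps into sorted, inclusive (lo, hi) ranges."""
    ranges = []
    for s in sorted(set(stamps)):
        if ranges and s == ranges[-1][1] + 1:
            # s extends the current run of contiguous stamps
            ranges[-1] = (ranges[-1][0], s)
        else:
            # gap before s: start a new range
            ranges.append((s, s))
    return ranges

def matches(ranges, stamp):
    """Binary-search the merged ranges for one that contains stamp."""
    los = [lo for lo, _ in ranges]
    i = bisect_right(los, stamp) - 1
    return i >= 0 and stamp <= ranges[i][1]

# Hypothetical stamps: six cases become three range checks.
ranges = merge_stamps([7, 3, 4, 5, 10, 11])
```

The fewer gaps there are between the stamps of related classes, the fewer ranges survive the merge, which is the motivation given above for making stamps contiguous.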
16:19:35
drmeister
What does "generic function call history includes both instances and funcallable instances" mean? When do we include instances and funcallable instances?
16:20:25
Bike
like if you have a function that's called with both a generic function and a HIR instruction as an argument
16:22:34
drmeister
I'm with you on making GF dispatch as fast as possible so we can use it for type differentiation
16:30:23
Bike
i'm also not totally sure this compile time thing is actually why build time is slower. i'm guessing based on smaller metrics
16:30:30
drmeister
Can you dump what a GF dispatcher looks like - the llvm-IR for a general example?
16:32:18
Bike
i moved to using llvm switches instead of building our own binary search since hey, llvm should be specialized for that, but it's super slow i guess? i don't understand what it could possibly be doing that's slower than our own building up a search tree, our process conses a ton
16:36:45
Bike
http://ix.io/2nju here's the disassembly for a function that returns T if it's given an instance of one of a couple disparate types, and otherwise nil
16:37:34
Bike
something like (lambda (x) (typep x '(or cons core:hash-table-eq standard-generic-function static-gfs::constructor-cell standard-method-standard-class clos:standard-effective-slot-definition)))
16:39:44
Bike
first it checks for the cons tag, then the general tag, then it checks the header against Instance_O, FuncallableInstance_O, and HashTableEq_O, and then it checks against the class stamps in the first two cases.
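The order of checks described above can be modelled roughly as follows. This is a Python sketch, not the generated code: the `Obj` record, the string tags, and the stamp values are all made up for illustration, and only the header names Instance_O, FuncallableInstance_O, and HashTableEq_O come from the discussion.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Obj:
    tag: str                           # hypothetical pointer tag: "cons", "general", ...
    header: Optional[str] = None       # header kind, only meaningful for "general"
    class_stamp: Optional[int] = None  # class stamp, only for (funcallable) instances

# Hypothetical class stamps accepted for each header kind.
INSTANCE_STAMPS = {101, 102}
FUNCALLABLE_STAMPS = {201, 202}

def discriminate(x):
    """Return True if x is one of the accepted types, else False."""
    if x.tag == "cons":                # 1. cons tag
        return True
    if x.tag != "general":             # 2. general tag
        return False
    if x.header == "HashTableEq_O":    # 3. header decides the answer directly
        return True
    if x.header == "Instance_O":       # 4. class stamp, first case
        return x.class_stamp in INSTANCE_STAMPS
    if x.header == "FuncallableInstance_O":  # 4. class stamp, second case
        return x.class_stamp in FUNCALLABLE_STAMPS
    return False
```

In this model the class stamp is only consulted after the header has been identified as an instance or funcallable instance, which matches the two-level check described in the message.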
16:55:18
Bike
actually, let me double check, i might still have that off since your closurette thing
16:56:46
drmeister
I've added a lot of complicated stupid code over the years to try and dump the JITted modules.
16:57:05
drmeister
https://github.com/clasp-developers/clasp/blob/master/src/lisp/kernel/cmp/jit-setup.lsp#L731
16:58:44
Bike
also if we sorted the cases that would just mean we spend time on sorting instead of llvm, so i doubt that would make an impact
16:58:44
drmeister
I was about to say that we should figure out an API to capture JITted modules so we can examine them.
16:59:09
drmeister
Ideally DISASSEMBLE would give us a module that is identical to what gets compiled.
16:59:55
Bike
really, that's why i don't get the extreme impact on build time, the effect on compile time doesn't even seem that bad. maybe it gets exponentially worse with more cases, i didn't look too closely
17:00:05
drmeister
I have this stupid (quick-module-dump <whatever>) and I've forgotten how to use it.
17:00:15
Bike
i could just compile-file this instead if you want. i don't want to go down some jit based rabbit hole.
17:02:35
Bike
i've broken things up so that we can just use a "discriminate" macro. since that's what we want for type tests. so there's no methods involved, it just returns t or nil.
17:02:39
selwyn
what's the status of demo-clasp-cxx-interop on linux? has quickclasp been working ok?
17:02:41
drmeister
We should also profile the compilation of discriminating functions and see where the time is being spent.
17:04:32
kpoeck
I can confirm that demo-clasp-cxx on macosx even runs on machines not installed by drmeister :-)
17:04:44
drmeister
I did notice that usocket stopped working because it couldn't find the 'sb-bsd-sockets' system. Clasp provides it internally - so it's something that I "solved" by adding...
17:04:46
drmeister
(asdf:register-immutable-system :sb-bsd-sockets) ; already provided by the system
17:05:48
drmeister
Right now demo-clasp-cxx is kind of a hack. With the coming JITLink it will be better.
17:06:27
drmeister
Currently the problem is you can only load the binding code once. You have to shutdown clasp and start it again if you want to reload.
17:17:08
drmeister
Yeah - it's because static initializers don't work yet on Linux. I'm told they will soon.
17:17:57
drmeister
Without static initializers I am forced to expose an external linkage symbol to initialize the bindings and the external symbol needs to be hard coded into the Lisp code that loads the bindings.
17:18:58
drmeister
With static initializers everything can be done with internal binding symbols and there is no external symbols and no danger of symbol collisions.
17:20:27
drmeister
Hmm, so what's wrong with the DISASSEMBLE output then? When I delete the offending line from the IR (it's not referenced anywhere else) the error moves to the next line.
17:20:46
drmeister
That makes me suspect the problem isn't the line that it says it is. I don't see anything wrong with the line either.
17:21:13
drmeister
I've deleted four globals now (they are not referenced from the code - so it's ok to do that). Now I get...
17:22:37
drmeister
Side note/question: I think someone told me what these were before but I forgot - what are these?
17:24:44
drmeister
Let's ponder this for a moment... https://usercontent.irccloud-cdn.com/file/yKr1SfA4/TEST%5ECOMMON-LISP-USER%5EFN%5E%5E.pdf
17:25:31
drmeister
1. Ok, dumping register arguments - we should do something about that under control of DECLARE.
17:28:16
drmeister
kpoeck: Did you do anything to tell ASDF about sb-bsd-sockets? Clasp puts it in *modules* but I still had problems with it being missing until I registered it with asdf as an immutable system.
17:32:02
Bike
this is post optimization, i think, since we don't actually generate a switch for the tag
17:37:24
Bike
that code is the same used by the existing discriminating function thing, so it's not the problem
17:38:34
drmeister
I'm thinking it's more llvm-IR instructions that need to be compiled. The generated code hopefully ends up as a single displaced read but we could eliminate two llvm-IR instructions if you use a GEP-9
17:39:33
Bike
that's what the core::header-stamp special operator is. irc-read-tagged-general-header
17:40:00
Bike
if i force fastgf to always use the compiler with the discriminating function code in master, build takes about 23 minutes
17:40:26
Bike
the part you're talking about is the same before and after, so i'm not really concerned with it right now
17:41:38
drmeister
I understand that it's the same before and after - I mean for better performance all around - let's change irc-header-stamp to use a GEP-9.
17:42:18
drmeister
Can you compile discriminating functions the old way and the new way in a loop and time it?
17:43:14
drmeister
Because if it is then it's an even bigger difference because half the time is spent compiling C++.
17:44:08
drmeister
Get a branch built that uses the old approach. Then compile some discriminators in a loop and let's profile it.
17:44:15
Bike
compiling something that checks (or standard-generic-function cleavir-ir:enter-instruction) with the master code 100 times takes, say, 1.8s
17:47:46
drmeister
I can do it from the console if you like. Just put it in two separate directories and I'll profile and show the results.
17:48:30
drmeister
The only challenge for you is getting the extra terminals open and then displaying the svg files.
17:58:57
drmeister
selwyn: What do I do to remove a system from quickclasp. edit something something run something something?
18:01:10
selwyn
and then cd /home/selwyn && ./rebuild-dist to make a new version of the distribution
18:04:26
selwyn
executing it as sudo may not load the correct .sbclrc file which is where i usually load quicklisp
18:06:21
selwyn
http://thirdlaw.tech/quickclasp/quickclasp.txt will always contain information about the most recent dist
18:08:29
selwyn
and then, (ql:update-all-dists) on the client to pull the latest distribution, it won't update automatically
18:19:56
Bike
alright compile times are different now but the one is still slower, so i guess it's fine
18:29:04
drmeister
Can you dump the long backtrace? It's a brittle tool with multiple failure modes.
18:31:47
Bike
you wanna do it? i can just set it running and you can do it. i can't view svgs anyway.
18:32:43
drmeister
Do you have this in your wscript.conf? DEBUG_OPTIONS = [... "DEBUG_JIT_LOG_SYMBOLS", ...]
18:43:41
drmeister
You said "alright compile times are different now but the one is still slower, so i guess it's fine" - I interpreted that as the difference in speed was less noticeable.
19:06:42
kpoeck
described in https://github.com/clasp-developers/clasp/wiki/Using-the-profiling-tools
19:07:33
Bike
still don't really have any way to view it. i could scp it back but that's annoying to do. hm
19:12:37
Bike
cannot see shit. maybe someday i'll be able to use a computer effectively, but not today
19:17:58
drmeister
If it didn't cut off the tips it would take forever to render. It's been a problem profiling the compiler.
19:22:58
drmeister
Bike: Switch to https://github.com/clasp-developers/FlameGraph.git for FlameGraph - I tweaked it for clasp.
19:31:25
drmeister
I think the black comes because I tell it to use the 'clasp' color scheme and Brendan's doesn't recognize it.
19:32:46
drmeister
Huh - I hard coded it - it's in do-flame. - can you change this line and increase 400 to like 1000
19:33:16
drmeister
It should be less of a problem now that we got rid of call-with-variable-bound and funwind-protect
19:34:36
Bike
i kinda don't want to scp it back to my machine though, could you throw it up somewhere? it's the same filename
19:36:23
drmeister
I mean the new discriminating function code or the old discriminating function code
19:37:40
drmeister
Then we can compare how much time is spent doing different things in the old and new code.
19:41:17
drmeister
So when you do this loop of compilation with the old discriminate macro vs the new one - what is the difference in time?
19:42:30
drmeister
Oh - that's not as big a difference as I expected given the difference in build time.
19:43:07
Bike
and i still don't know that it's actually what's causing the build slowdown, that's just my guess.
19:45:31
drmeister
https://usercontent.irccloud-cdn.com/file/A0XLcfXc/out-new-method-33994.svg https://usercontent.irccloud-cdn.com/file/XuUKCuIT/out-old-method-33994.svg
19:50:41
drmeister
Can you push this to a branch and give me instructions on how to turn it on one way or the other? I want to do some timing.
19:51:22
drmeister
I expect that the C++ and aclasp and bclasp compilation times should be largely unchanged. Definitely C++ and aclasp.
19:52:05
drmeister
Right - so I want to find the file and then maybe the forms. Something is going crazy.
19:56:36
Bike
then you can write for instance (lambda (x) (clos::discriminate2 (x) nil (((#.(find-class 'standard-generic-function)) (#.(find-class 'cleavir-ir:enter-instruction))) t)))))
19:56:54
Bike
which is a function that will return T if given a direct instance of standard-generic-function or cleavir-ir:enter-instruction, or else nil
20:00:27
Bike
i just told you. you load this file and then you have the discriminate2 macro, which is the discrimination part of discriminating functions.
20:05:44
selwyn
unfortunately, on trying to do demo-clasp-cxx-interop, i got lots of inconsistent missing override warnings and a simple-program-error JIT session error: Symbols not found: { __emutls_v.my_thread_low_level, _ZTH19my_thread_low_level } https://www.irccloud.com/pastebin/ZlAMlhxr/
20:06:02
Bike
man, the compile times are really not good though. compiling the generated discriminating function for cleavir-ir:successors takes a little under half a second either way
20:07:08
drmeister
So if I substitute discriminate2 in this function - then it will use discriminate2
20:07:09
drmeister
https://github.com/clasp-developers/clasp/blob/master/src/lisp/kernel/clos/discriminate.lsp#L435
20:10:10
Bike
you'll get slightly different code since i tried changing some things in case, but it doesn't matter for a relative comparison
20:13:17
Bike
my first hypothesis for this slowdown was that i screwed things up and there are a lot of dispatch misses, but things seem to work for most classes. cxx classes i'm not sure about.
20:16:57
scymtym
i may have code for loading the dtrace output into it, but for this specific use-case, loading an svg into a browser is probably easier since it avoids the need to get McCLIM up and running
20:18:07
scymtym
but writing a Clasp backend for deterministic and statistical profiling might be a good idea at some point
20:43:25
scymtym
kpoeck: i don't think i understand the question. there is no code for Clasp at the moment
20:45:09
Bike
having tabular output would be nice. then i could just get data from the repl. wouldn't be the same thing as a flame graph, but that's ok.
20:45:51
scymtym
drmeister: this is the backend i wrote for SBCL: https://github.com/scymtym/clim.flamegraph/blob/advice-backend/src/backend/sb-sprof/source.lisp#L42 the interrupt handler is at the end
20:52:03
drmeister
I need to write something to analyze the text output from a build and compare compilation times of every file.
21:54:32
karlosz
OK, i have a preliminary table of the behavior of how llvm optimizations affect full build order: https://paste.gnome.org/phoovrkb3
21:55:18
karlosz
it seems like the only real measurable difference is not doing optimize module at all, otherwise, as long as at least the O2 optimization pipeline is used, you get similar processing times in llvm
21:55:33
karlosz
from looking at the flamegraph, it's pretty clear that most of the time is still dominated by cleavir, particularly cst->ast
21:56:49
karlosz
i think with the new PM it should be much easier to write a custom pipeline for clasp like other compilers based on LLVM do, where you choose the optimizations that make the most sense, i.e., best compile-time for generated code speed tradeoff
21:59:19
karlosz
but not always, maybe only subject to the compile-time flag in wscript, as far as i can tell
22:00:28
karlosz
i was doing it on bidmac so if anyone else was building at the same time, it may have affected things
22:03:37
karlosz
right, but i certainly don't think it makes things much faster in the current setup
22:13:49
drmeister
Bike: I don't understand why I have this error: (GENERIC-FUNCTION-CALL-HISTORY GF) is not a supported place to CAS
22:48:31
karlosz
this graph shows the compilation of cleavir/convert-special.lisp with the new PM: ocf.io/~karlos/flame12.svg
22:48:51
karlosz
this explains why switching pass managers for optimizeModule doesn't seem to make much of a difference
22:53:08
karlosz
aha, i see a way to improve the speed from looking at the graph. IR module optimizations already include IR function optimizations, so doing both is just redundant
22:57:41
karlosz
well, i'm not sure if the current way in jit-setup is actually doing it twice (it does populate both the function pipeline and the module pipeline), but doing so in the new PM is redundant
22:58:12
drmeister
Another thing that I've thought about is cleavir's map-instructions does a lot of consing of hash-tables. Could we estimate the number of instructions and set the initial size of the hash-table used to do map-instructions?
23:00:15
karlosz
we can cache the number of instructions in the graph somewhere as an approximation of the initial size, if the hash table resizing is actually a bottleneck
23:16:35
Bike
what should be happening is that with-early-accessors macrolets generic-function-call-history, since it's a slot in standard-generic-function
23:17:05
Bike
but in that backtrace it seems like there's no generic-function-call-history macro, because if there was it wouldn't be calling default-cas-expansion
23:19:27
drmeister
Where should the generic-function-call-history macro be defined? I don't see it anywhere. In my code or yours.
23:20:51
Bike
https://github.com/clasp-developers/clasp/blob/master/src/lisp/kernel/clos/hierarchy.lsp#L146 here's the slot.
23:21:17
Bike
https://github.com/clasp-developers/clasp/blob/master/src/lisp/kernel/clos/std-slot-value.lsp#L66-L81 here's with-early-accessors, which as you can see should expand into a macrolet.
23:39:45
drmeister
I'm having trouble with ediff-directories as well - it says there are differences between directories but the files within them are all the same.
23:46:49
Bike
like does it not know about the with-early-accessors macro, so it's compiling it as a function call?
23:48:18
Bike
because that would explain the problem, though i don't know how it would miss that macro.
0:13:32
drmeister
So a test discriminator compile takes a little bit longer and the overall build time is unchanged
0:14:18
drmeister
Maybe the thing is going nuts when we turn off the interpreter and it goes more nuts if the discriminating function compiler is even a little slower.
0:14:55
Bike
but you're seeing that the build time is unchanged if you keep this in but keep the interpreter on? that's not what i just saw
0:16:38
Bike
yeah. normal build = like nineteen minutes. build with interpreter still on but this discriminator code = like 23 minutes
0:18:25
drmeister
I've been watching this guy "kitboga" on twitch. He calls internet scammers and strings them along for as long as possible.
0:43:45
drmeister
Here's the sorted list of the files that take the longest time to build across aclasp/bclasp/cclasp - explanation follows...
0:45:13
drmeister
The numbers on each line are... <pid> <seconds-to-build> <bytes-consed> <total-num-files-aclasp/bclasp/cclasp-58/141/473> <file-name>
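A rough Python sketch of comparing per-file build times between two builds, assuming each timing line is whitespace-separated in the field order given above (pid, seconds, bytes consed, a file-count field, file name). The exact line format is a guess, and `parse_build_times` and `compare_builds` are hypothetical helpers, not existing Clasp tools.

```python
def parse_build_times(text):
    """Map file name -> total seconds, summed over every line naming that file."""
    times = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 5:
            continue  # not a timing line under the assumed format
        try:
            seconds = float(fields[1])
        except ValueError:
            continue  # second field is not a number; skip the line
        name = fields[4]
        times[name] = times.get(name, 0.0) + seconds
    return times

def compare_builds(old_text, new_text):
    """Rows of (file, old seconds, new seconds, new/old), worst slowdown first."""
    old = parse_build_times(old_text)
    new = parse_build_times(new_text)
    rows = [(name, old[name], new[name], new[name] / old[name])
            for name in old.keys() & new.keys() if old[name] > 0]
    return sorted(rows, key=lambda r: r[3], reverse=True)

# Made-up sample data in the assumed format.
old_log = "100 10.0 5000 58 a.lisp\n101 4.0 100 58 b.lisp\nnoise line"
new_log = "200 19.0 5000 58 a.lisp\n201 4.0 100 58 b.lisp"
report = compare_builds(old_log, new_log)
```

A file that is built in more than one stage (aclasp, bclasp, cclasp) has its times summed here; keeping the stages separate would need the stage to be recoverable from the line, which this sketch does not assume.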
0:49:47
drmeister
cl-wrappers.lisp takes almost 2x longer. We should be able to pick that up with a flame graph.
0:55:48
drmeister
Discriminator functions will get compiled as the interpreted discriminator functions age out.
0:56:48
Bike
https://github.com/clasp-developers/clasp/blob/master/src/core/funcallableInstance.cc#L371-L376
0:58:56
Bike
it seems like the improvement in the speed of discrimination is generally dwarfed by the compile time.
1:01:34
drmeister
If it's all interpreted generic functions and satiated ones - then the satiated functions must be slower than the old satiated functions.
1:01:36
Bike
since it's turned off in cleavir, i think this is the only place the compiler is used https://github.com/clasp-developers/clasp/blob/master/src/lisp/kernel/clos/satiation.lsp#L451-L618
1:02:03
Bike
it's hard to imagine these clos functions being so much slower that it doubles compile time
1:02:17
Bike
especially when in simple comparisons it doesn't look like the new code generated is in fact slower
1:03:10
Bike
like i said, my first suspicion is that something is dispatch missing a lot, but it's hard to tell
1:03:15
drmeister
How about we count the number of gf discriminating functions being compiled and report that with Time and consed in the build output.