freenode/#clasp - IRC Chatlog
Search
11:51:36
Colleen
Bike: drmeister said 4 hours, 38 minutes ago: - One ironclad function (http://paste.lisp.org/display/350808) - takes a huge amount (2439 seconds) - the pass is mark-dynamic-extent (1818 seconds). The next function after that is only 259 seconds.
13:33:00
Bike
liveness spends 20% in member, 20% in union, 10% in set-difference. having an actual set implementation would fix that up, or as a more stopgap kind of thing i can speed those particular functions up a bit
13:53:33
Bike
so probably no magic bullet here. we could disable trying dynamic extent if it's useless, like here, but otherwise it basically amounts to "make clasp go faster"
13:59:34
Shinmera
Yeah but that's just around the threshold probably, so I don't know if it's a guaranteed win
14:01:07
Bike
let's see, assoc is a C++ function for whatever reason, that just takes test and test-not directly, and does some kind of "Tester" thing
14:20:02
Shinmera
Given that it's the most frequent comparison operator, inlining it seems fairly important
14:28:20
Bike
(loop for pair in list when (eql item (car pair)) return pair) is like 50% slower than a full call to assoc. ugh
14:44:25
Bike
i guess it's not that strange. all the little inefficiencies like boolean tests and such are going to be the main factor when i do something a million times
14:50:13
beach
The modular way of doing that is to use a hash table, but that's probably not going to be fast enough.
14:50:37
Bike
but performance is reasonable on sbcl. doing the same mark-dynamic-extent, it takes half a second for what takes clasp, uh, half an hour.
14:51:47
beach
Seriously, yes, it might be worthwhile trying to figure out why Clasp is this much slower in this case.
14:52:34
beach
Failing that, stealth mixins will give you both modularity and performance in case you want to number the data.
14:52:46
Bike
numbering data is probably good too, but after my misadventure with bit vectors i'm inclined to just stick with the apparently reasonable algorithms that i know work
14:53:27
beach
Like I said, it might be worthwhile trying to figure out why Clasp is this much slower.
14:54:57
Bike
because the loop was supposed to be what i'd inline assoc with no keyword arguments as.
14:55:57
Bike
i did have some success with other inlines. like, last few days i set it up so that map, coerce, and make-sequence with constant type argument deal with it at compile time, and hey, now they take half as long
14:57:45
beach
Sure. I am tired after a long day, so I am just generally meddling with your business. I am not restricting my discussion to Cleavir.
14:58:28
Bike
maybe i should try boolean elimination... that'll make a lot of inlines actually helpful
15:05:36
beach
Would it be a way to improve the dynamic extent calculation, or is this a different optimization to make things faster?
15:07:52
Bike
different. i'm thinking of how to make the lisp assoc faster than the C++ one. a faster assoc would help dx, but also, like, everything else.
15:08:37
beach
In general, it is possible to do a better job in Common Lisp than in C++ because you can special-case things with compiler macros etc.
15:15:47
beach
So ENDP needs to be something like (cond ((consp bla) nil) ((null bla) t) (t (error....))) no?
15:16:10
Bike
i added a definition to inline it as i said. for nonzero safety the THE will be turned into TYPEQ.
15:17:38
beach
Well, first you test the CONS tag, which is going to be true, but then it can be either a CONS or NIL, so you have to make a second test.
15:18:04
Bike
in this case i put on safety zero for testing's sake and it was still slower, even though that reduces (endp foo) to (eq nil foo). i guess because of booleans and such
15:18:07
beach
Or, you can test for NULL first. But that test is going to be false, so you have to test the CONS tag after that.
15:18:57
Shinmera
Testing for cons first seems better for branching anyway since it would succeed in most cases, right?
15:21:00
beach
Shinmera: But, yeah, branch prediction should make the test come out the same way most of the time.
15:21:57
Bike
the HIR for the unsafe "fast" version seems to only call eq and eql. not that it should actually call either, but still
15:22:31
Bike
as many as four pointless assignments in a row. i hope meister was right when he said llvm eliminates those
15:23:47
beach
EQL is tricky to get rid of. Maybe (OR (EQ .. ..) (FULL-EQL .. ..)) is a good compromise.
15:24:35
Bike
but yes, i could compile-file it and disassemble the bitcode, i guess. but i think there's something built into clasp
15:26:07
Bike
beach: (defun eql (o1 o2) (cond ((typeq o1 number) (if (typeq o2 number) (number-eql o1 o2) nil)) ((typeq o2 character) (if (typeq o2 character) (char= o1 o2) nil)) (t (eq o1 o2)))) is what i had in mind
15:30:39
Bike
actually for full inference you'd probably want to check both without an order. it's not really well expressed as an inline definition
15:32:56
Shinmera
I'm not 100%, but if I remember correctly characters are tagged words in clasp, so you'll get lucky with those too.
15:34:26
Bike
i guess i can just do (define-compiler-macro eql (o1 o2) `(or (eq ,o1 ,o2) (locally (notinline eql) (eql ,o1 ,o2)))
15:36:00
beach
There are a lot of cases that might be worthwhile, for example heap objects that are not EQ are not EQL. Same thing for fixnums and characters.
15:36:47
Bike
the way compiler macros work is, if the compiler macro returns the whole, exactly, it's not expanded again
15:37:22
Bike
mhm. requiring the notinline to be there makes sense to me though, it would be weird to require a, like, database of previously expanded forms
15:38:19
beach
(cond ((eq x y) t) ((or (heap-objectp x) (characterp x) (fixnump x)) nil) (t (full-eql x y))) is what I suggest.
15:40:49
Bike
how about, (cond ((eq x y) t) ((typeq x (not (or character number))) nil) (t (full-eql x y)))
15:42:14
Bike
hm, but then if x is known to be a symbol, say, you'd end up with (if (eq x y) t nil)... well i guess that's another case for boolean elimination
15:53:09
Bike
and it looks like llvm does eliminate redundant assignments... all this landing pad crap still bothers me, though
16:26:08
Bike
when any function returns there's this snippet that runs to change what function is on top of the stack for backtrace purposes
16:39:00
Bike
okay, so i wrote an assoc in terms of primops, so i don't need to worry about boolean elimination or type inference or anything
16:39:34
Bike
for a test, regular assoc took 1.218s. this assoc took 1.544s. if i changed the primop assoc to use primop:eq instead of calling eql, 0.617s
16:41:26
drmeister
Build a list of things that need inlining. I need to get through Monday and then I can assist.
16:43:40
Bike
i've tried to understand how to make those primitive things usable from clasp, for inline definitions, and failed
16:45:46
drmeister
(1) From either a cleavir xxx-instruction or a bclasp codegen-xxx function (or both) call a function that generates llvm-ir directly - this works in COMPILE and COMPILE-FILE code
16:46:18
drmeister
(2) Write a C++ intrinsic and put it in intrinsics.cc. That gets inlined using llvm inlining and only works with COMPILE-FILE code.
16:48:38
Bike
in other news, i tried a few things replacing (eq x y) with (if (primop:eq x y) t nil), and it saves over half the time even without boolean elimination
16:51:16
drmeister
I thought I generated more code for the eql test in fastgf - but I adopted a hybrid approach to get things up and running.
16:51:47
drmeister
That inlines an EQ test and if that fails punts and calls cc_eql - that will be inlined in COMPILE-FILE'd code.
17:03:08
drmeister
cc_eql doesn't currently do the EQ test because that is in the generated code - but it could be made to do that.
17:03:49
drmeister
Also, it returns 1 or 0 based on the result - it needs to return <something> or NIL correct?
17:14:36
drmeister
I deferred to tx.unsafe_single_float()==ty.unsafe_single_float() because I wasn't sure.
17:22:07
Bike
this actually seems fine to me. i'll do like (declaim (inline eql)) (defun eql (x y) (or (eq x y) (whatever "cc_eql" x y)))
21:18:19
drmeister
at child.on_msg (http://0.0.0.0:8888/nbextensions/nglview-js-widgets/index.js?v=20170714202205:893:42)
22:45:45
Bike
so: i think there's a lot of room for microoptimization of basic library functions, and that i can do that most effectively if i have a typeq that isn't extremely slow
22:49:22
Bike
...and those optimizations will probably make everything a bit faster, though it's harder to gauge that impact
22:51:59
drmeister
(TYPEQ x y) where y is 'fixnum 'single-float 'character - those are just (fixnump x), (single-float-p x), (characterp x) respectively
22:53:03
Bike
if possible i'd rather not call a function at all. (or call an intrinsic that will get inlined later, i guess.)