libera/#sbcl - IRC Chatlog
9:02:34
hayley
So I think I've found that most optimizations in the SBCL compiler aren't applicable for the DFAs I've been generating. Optimizing (speed 3) or (speed 0) doesn't seem to change anything, and even dropping SB-C::*REOPTIMIZE-LIMIT* from 10 to 3 doesn't change performance either.
9:03:27
jackdaniel
technically speaking there is a compilation-speed optimization, but I'm not sure whether sbcl does anything with it
9:07:43
hayley
Currently I am using (declare (optimize (speed 0) (safety 0) (compilation-speed 3) (debug 0))).
9:27:17
flip214
ouch.... speed 0, safety 0? I've only seen these changed in different directions -- (speed 3) (safety 0) or (safety 3) (speed 1) (debug 3) or so
9:29:35
hayley
In my opinion as someone who doesn't know much about the SBCL compiler, there's not much to optimize. There's basically a fixnum position index, an array, and a loop which reads the array at the index, bumps the index, then jumps to another state.
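The loop hayley describes can be sketched as follows. This is a minimal illustration (not hayley's actual generated code) of a DFA compiled to a TAGBODY: a fixnum position, an input array, and one label per state that reads the array at the index, bumps the index, and jumps to the next state.

```lisp
;; A hand-written sketch of the described shape, for the toy pattern (ab)*.
(defun match-ab* (string)
  "Return T if STRING consists of zero or more repetitions of \"ab\"."
  (declare (type simple-string string))
  (let ((pos 0)
        (end (length string)))
    (declare (type fixnum pos end))
    (tagbody
     state-a                              ; expecting #\a (also accepting)
       (when (>= pos end) (return-from match-ab* t))
       (if (eql (aref string pos) #\a)
           (progn (incf pos) (go state-b))
           (return-from match-ab* nil))
     state-b                              ; expecting #\b
       (when (>= pos end) (return-from match-ab* nil))
       (if (eql (aref string pos) #\b)
           (progn (incf pos) (go state-a))
           (return-from match-ab* nil)))))
```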
9:32:08
flip214
the disassembly shows calls to the right functions, or to generic ones? is indexing inlined, for example?
9:32:15
hayley
(The only optimization I really rely on is copy propagation, which is necessary with many submatches and alternations, cause I can't implement it myself.)
9:33:33
hayley
To be more specific: the compiled code runs pretty damn fast. 700M characters/second is brilliant. But I can get the same performance regardless of optimization settings, and I like the idea of having a fast-ish compiler at runtime.
9:36:47
hayley
I've experimented and it really seems like the runtime speed is not affected by optimizations.
10:02:01
hayley
The latter, unfortunately. I could probably use a better naming scheme in the code generator.
10:08:14
hayley
Otherwise, my mental model was that (in a perfect world) everything matched to one or two machine instructions, like "load a word", "compare to constant", "jump".
10:23:47
flip214
hayley: I believe there's lots of checks in case the index overflows from fixnum to a bignum, perhaps you can eliminate these checks.
10:23:57
flip214
try (type (integer 0 20000) start END ) (type (integer 0 20000) START POSITION #1# #3# #4# #5#)
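flip214's suggestion, written out as a self-contained example (the `20000` bound and the function are illustrative, not from hayley's code): declaring a bounded integer type for the index lets SBCL prove that INCF cannot overflow a fixnum, so no bignum-overflow checks are emitted.

```lisp
;; Bounding START, END, and the loop index eliminates fixnum-overflow checks.
(defun count-char (char string start end)
  (declare (type simple-string string)
           (type (integer 0 20000) start end)   ; illustrative bound
           (optimize (speed 3) (safety 0)))
  (let ((n 0))
    (declare (type fixnum n))
    (loop for i of-type (integer 0 20000) from start below end
          when (eql (aref string i) char)
            do (incf n))
    n))
```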
10:25:34
scymtym
hayley: you could compile in a loop and sb-sprof. sometimes individual compiler phases stand out
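scymtym's suggestion might look like this: call COMPILE repeatedly under SB-SPROF so that hot compiler phases show up in the report. `dfa-form` is assumed to be the generated lambda expression.

```lisp
;; Statistically profile repeated compilation of a generated DFA form.
(require :sb-sprof)

(defvar *dfa-form* '(lambda (s) (declare (simple-string s)) (length s)))

(sb-sprof:with-profiling (:max-samples 10000 :report :flat)
  (loop repeat 100
        do (compile nil *dfa-form*)))
```

The :flat report lists the functions (including internal SB-C phases such as IR1-OPTIMIZE-UNTIL-DONE) by sample count.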
10:26:02
flip214
hayley: all these 483B0599FEFFFF CMP RAX, [RIP-359] ; [#x5398ED48] = #x3FFFFFFFFFFFFFFB ??
10:28:38
scymtym
would be interesting to see how compile-time and runtime numbers compare for a solution with mutually recursive tail-calling functions and without assignment
10:31:29
hayley
Looking at the flamegraph...IR1-OPTIMIZE-UNTIL-DONE is pretty big, then CONSTRAINT-PROPAGATE and IR1-CONVERT-LAMBDA are the next largest.
10:33:00
scymtym
declaring (simple-array character) instead of simple-string might gain you something (also at runtime) since the latter type includes (simple-array base-char)
10:33:00
hayley
I think I tested folding out the opposite test to no avail. i.e. (cond ((eql A B) ...) ((not (eql A B)) ...)) and (cond ((eql A B) ...) (t ...)) had similar runtime performance.
10:34:59
hayley
Ah yeah, (SIMPLE-ARRAY CHARACTER) seems to have cut down compilation time to about 75% from SIMPLE-STRING.
10:35:01
scymtym
for example, (lambda (a) (declare (type T a)) (aref a 0)) with T = (simple-array character) and T = simple-string is 37 bytes of straight-line code vs. 49 bytes with a branch (admittedly, compiled with speed 3, debug 0, safety 0)
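scymtym's comparison can be reproduced at the REPL. SIMPLE-STRING is a union type covering both (SIMPLE-ARRAY CHARACTER (*)) and (SIMPLE-ARRAY BASE-CHAR (*)), so AREF on a SIMPLE-STRING must branch on the actual element type, while the narrower declaration compiles to straight-line code.

```lisp
;; Straight-line access: element type is known exactly.
(disassemble
 (compile nil '(lambda (a)
                 (declare (type (simple-array character (*)) a)
                          (optimize (speed 3) (safety 0) (debug 0)))
                 (aref a 0))))

;; Branching access: SIMPLE-STRING is a union of two array types.
(disassemble
 (compile nil '(lambda (a)
                 (declare (type simple-string a)
                          (optimize (speed 3) (safety 0) (debug 0)))
                 (aref a 0))))
```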
10:38:11
mfiano
Try with speed 3 now that you made that change and see how it affects run/compile speed
11:50:41
stassats
hayley: if your code doesn't need a lot of optimizations then it will be compiled fast too
11:52:35
stassats
but one trick to produce worse code faster, set sb-c::*constraint-propagate* to nil
11:54:05
hayley
Setting sb-c::*reoptimize-limit* to 3 seemed to be the minimum where runtime performance didn't change.
12:26:42
hayley
The runtime performance of one regular expression isn't affected at all, but the other is. So I guess I'll stomach the compile time.
12:28:48
hayley
I might or might not try compiling DFAs at runtime, though so far I'm convinced it won't be too useful.
12:33:17
scymtym
couldn't you use the compiler macro + LOAD-TIME-VALUE idiom if the regular expression is constant?
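The idiom scymtym mentions can be sketched like this. REGEX-MATCH, COMPILE-REGEX, and RUN-DFA are hypothetical names: when the pattern argument is a literal string, the compiler macro rewrites the call so the DFA is built once at load time rather than on every call.

```lisp
;; Ordinary entry point: compiles the pattern on every call.
(defun regex-match (pattern string)
  (run-dfa (compile-regex pattern) string))

;; Compiler macro: for a constant pattern, hoist compilation to load time.
(define-compiler-macro regex-match (&whole form pattern string)
  (if (stringp pattern)
      `(run-dfa (load-time-value (compile-regex ,pattern) t) ,string)
      form))                ; non-constant pattern: keep the full call
```

With this in place, (regex-match "a*b" line) in compiled code pays the regex-compilation cost once per loaded fasl, not per match.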
12:33:42
hayley
Though, even if all regular expressions are known at compile time, it'd be nice to have them compile relatively quickly.
12:35:26
hayley
One use case would be writing a grep clone. But you'd need a pretty big data set for this compiler to pay off.
12:38:05
hayley
(Notably, that use case only has the regular expression known at runtime. Maybe getting fancy with tiered compilation, or planning by an estimate of the work size, and a faster but worse compiler would help.)
12:43:45
hayley
Maybe the worse compiler should use a chain of closures like cl-ppcre did. But the DFA conversion algorithm I use creates a lot of temporary variables with submatches.
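The closure-chain style hayley alludes to (as used by cl-ppcre) can be sketched as follows: each matcher is a closure over the next one, so "compiling" a pattern is just consing closures, with no call to COMPILE at all. The helper names here are illustrative, not cl-ppcre's API.

```lisp
;; Each matcher takes (string position) and calls NEXT on success.
(defun char-matcher (char next)
  (lambda (string pos)
    (and (< pos (length string))
         (eql (char string pos) char)
         (funcall next string (1+ pos)))))

(defun end-matcher ()
  (lambda (string pos)
    (= pos (length string))))

;; Chain for the literal pattern "ab":
(defparameter *ab-matcher*
  (char-matcher #\a (char-matcher #\b (end-matcher))))

;; (funcall *ab-matcher* "ab" 0) => T
;; (funcall *ab-matcher* "ax" 0) => NIL
```

Construction is nearly free, at the cost of a FUNCALL per input character at match time.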