libera/#sbcl - IRC Chatlog
6:40:10
lukego_
Hey, anyone have hot tips for "zero-cost abstraction" in numerical code with SBCL? I'd like to basically write Lisp code and get "Fortran" object code. I'm wondering if, to a first approximation, it's enough to stick with abstractions the compiler can see through (e.g. functions/closures), give everything concrete double-float types at the top level, and inline all the subroutines to propagate types and burn away closures. Yeah-ish?
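A minimal sketch of that recipe, for reference (DOT and the OPTIMIZE settings are made-up examples, not from the discussion): with a concrete double-float declaration and the subroutine declaimed INLINE, callers that know their argument types get specialized, unboxed code.

    ;; Sketch only: a typed, inlinable kernel.
    (declaim (optimize (speed 3) (safety 1))
             (inline dot))

    (defun dot (x y)
      (declare (type (simple-array double-float (*)) x y))
      (let ((acc 0d0))
        (declare (type double-float acc))
        (dotimes (i (length x) acc)
          (incf acc (* (aref x i) (aref y i))))))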
6:41:26
hayley
But indeed lots of inlining tends to work; you can get away with higher-order functions to an extent I haven't fully probed.
6:44:27
lukego_
I don't necessarily need/want real inlining in all cases. More that I want a version of the subroutine that's specialized for the known types in its caller. Sort of like the Julia compilation model.
6:46:48
hayley
Could well be done in userland; I recall reading of some kind of template-esque library.
6:49:14
lukego_
That certainly helps me to stop worrying about inlining being "overkill" quite so much :D
7:10:16
lukego_
Generally working with JIT compilers like LuaJIT and Julia has rewired my brain quite a bit. I'm kind of in awe of how much mileage one can get from specializing subroutines using type information that's available in their callers. I guess I'm just slow to appreciate this because inlining is as old as time itself.
8:06:28
splittist
lukego_: The Serapeum library has a bunch of stuff to help write that sort of thing. Possibly too general, but may be worth a look.
8:36:00
lukego_
splittist: Good point. I'll need to flesh out a mental model of when it makes sense to use those versus inlining. My immediate thought is that those might be best for library code that you expect to be called with many different runtime types and always want to select sensible code.
8:39:59
lukego_
Example: I have a subroutine that reorders the values in a sequence. If I call it with an argument that is (SIMPLE-ARRAY DOUBLE-FLOAT (*)) then I really don't want it to do something crazy like allocate a box for each float that it copies. I'm thinking that if this subroutine is only used locally in one file then it's enough to declare it INLINE and make sure the calling function knows the type of the argument.
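A sketch of that local-inlining pattern (REORDER! and CALLER are made-up names; in-place reversal stands in for "reorders the values"): the subroutine is written generically, and the caller's declaration supplies the concrete type once the call is inlined.

    (declaim (inline reorder!))
    (defun reorder! (v)
      ;; Reverse V in place.
      (loop for i from 0
            and j downfrom (1- (length v))
            while (< i j)
            do (rotatef (aref v i) (aref v j)))
      v)

    (defun caller (v)
      (declare (type (simple-array double-float (*)) v))
      ;; Inlined here with the type known, AREF becomes an unboxed
      ;; double-float access instead of consing a box per element.
      (reorder! v))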
8:43:02
lukego_
Trickier with a function that will be compiled once and then called from multiple different places/modules. That's probably where the Serapeum macros kick in, anticipating the potential special cases and making sure that specialized code for each one is pre-compiled.
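For that multi-module case, something like Serapeum's WITH-TYPE-DISPATCH fits the description. A sketch, assuming its interface is as I remember it (the body is compiled once per listed type, with a generic fallback for anything else; SCALE! is a made-up example):

    (defun scale! (v c)
      (declare (type double-float c))
      (serapeum:with-type-dispatch ((simple-array double-float (*))
                                    (array double-float (*)))
          v
        (dotimes (i (length v) v)
          (setf (aref v i) (* c (aref v i))))))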
8:43:42
lukego_
This is the point where I would potentially feel really sad and deprived, knowing that Julia does all of this stuff automagically, but thankfully I have spent enough time with JIT compilers in recent years to be more than happy to see the back of them for a while.
8:53:07
lukego_
... I could also just manually add type declarations to all my subroutines, since I happen to exactly know all of the concrete types, so I'm just looking at inlining as a convenient way to be able to elide those by depending on type propagation. I don't actually understand how the SBCL compiler works yet so we will see how that goes :D
8:56:58
lukego_
Aside: LuaJIT has a nice trick for inlining without breaking code update. It does a runtime lookup of the function definition and decides whether to call that (if it's changed) or run the inlined code (if it's the same). It generates an inlined call something like:
8:57:14
lukego_
(IF (EQ (FDEFINITION 'SUBROUTINE) #<FUNCTION @ 0x1234>) (PROGN ...inlined-code...) (SUBROUTINE ...))
8:58:15
lukego_
So effectively doing C-c C-c on SUBROUTINE will un-inline the calls, and if you want them re-inlined then you need to recompile the callers.
8:59:31
lukego_
Come to think of it, maybe that kind of inlining could just be a macro or compiler-macro written in Lisp without compiler support.
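A rough sketch of what that could look like as a plain macro, with no compiler support (DEFINE-SOFT-INLINE is a made-up name; it handles required arguments only, and the callee must be defined before its callers are compiled):

    (eval-when (:compile-toplevel :load-toplevel :execute)
      (defun soft-inline-expansion (name lambda-list body args)
        (let ((fn (gensym "FN")))
          `(let ((,fn (load-time-value (fdefinition ',name))))
             (if (eq (fdefinition ',name) ,fn)
                 ;; Unchanged since this caller was loaded: run the
                 ;; captured copy of the body.
                 (let ,(mapcar #'list lambda-list args)
                   ,@body)
                 ;; Redefined (e.g. via C-c C-c): fall back to a
                 ;; normal call through the current definition.
                 (funcall (fdefinition ',name) ,@args))))))

    (defmacro define-soft-inline (name lambda-list &body body)
      `(progn
         (defun ,name ,lambda-list ,@body)
         (define-compiler-macro ,name (&rest args)
           (soft-inline-expansion ',name ',lambda-list ',body args))))

    ;; Usage: (define-soft-inline square (x) (* x x))
    ;; (square 3) runs the inlined body until SQUARE is redefined;
    ;; recompiling the caller re-inlines against the new definition.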
9:00:35
moon-child
you might perhaps be able to effect something like it with compiler macros, but ...
9:05:57
moon-child
but--it would be a klunky kludge, not integrating with the rest of the compiler infrastructure. What if the compiler decides not to inline? Best case scenario, you have two copies of the function; still pointless I$ thrashing. What's your recompilation policy for corecursive functions?
9:30:24
lukego_
I'd like to make those decisions myself, personally. I know much better than the compiler whether my software will be under instruction cache pressure or not.
9:32:23
lukego_
Addendum: The reason that soft-inlining trick works is the heavy speculative execution in mainstream CPUs. The CPU will easily predict whether the inline code should be run or not -- it'll almost always be the same as last time -- and so it will run the check in the background.
9:33:07
Krystof
I think most of the specialized inline / generic templating / call site optimization *can* be done "by hand" with compiler macros
9:34:36
Krystof
I think it would be nice if we had an easier way of doing that (e.g. automatically generating specialized versions of generic code, something like a (with-local-specialization ...) or a (declare (specialize foo)) to hint to the compiler that this is an area where calls to FOO should be considered for call-site optimization)
9:34:43
lukego_
Krystof: I dunno what life choices lead me here exactly but for the past decade or so all of my software has executed as a dozen-or-so kernels of a hundred-or-so instructions each. I have joked so often that I'm writing Fortran code in $LANG that I actually bought a Fortran book last week...
9:37:22
lukego_
I'll have to be careful what I complain about. I remember at an ECLM complaining about some SBCL behaviour within jsnell's earshot. "Luke, you bastard, you were complaining about that last year too and I fixed it all months ago, and you didn't even bloody notice!" :D
9:39:48
lukego_
I am dead keen to understand the SBCL compiler better. LuaJIT hacking was a good exercise in just making regular trips into the source code, regularly staring at bits of IR code, and gradually having stuff click into place. I was hoping that e.g. `C-u C-c C-c' in SLIME would show me IR code for a defun, but I haven't stumbled on such yet.
9:40:43
lukego_
Though another takeaway from my LuaJIT phase is that compilers generate an awful lot of code, and it can be much more productive to work backwards from what the profiler flags as relevant than to read IR/mcode listings en masse.
9:42:17
lukego_
I did COMPILE-FILE with :TRACE-FILE T and quickly acknowledged that I won't be reading whole 50 KLOC diagnostic dumps on a routine basis :D
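For smaller doses, two interfaces that do exist (the file name here is made up): standard DISASSEMBLE for a single function's machine code, and SBCL's :TRACE-FILE keyword for the per-file dump mentioned above.

    (disassemble 'subroutine)                   ; machine code, one function at a time
    (compile-file "kernels.lisp" :trace-file t) ; writes kernels.trace (SBCL extension)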
9:50:55
moon-child
lukego_: if you would like to make such optimisation decisions yourself, then your time is probably best spent with a macro preprocessor for assembly
10:03:42
lukego_
I do love that kind of programming. Did a lot of that with DynASM in LuaJIT-land. But for the moment I'm way more high-level than that. Just want to avoid e.g. the 100x overhead of boxing double-floats inside tight loops _and_ also avoid writing manual type declarations on every utility subroutine.
10:08:38
lukego_
well, also, it's just a really nice ergonomic luxury of both luajit and Julia to be able to write little subroutines and know that they'll be specialized automatically and won't screw up all the optimizations in their callers. Just want to understand to what extent I can indulge in the same thing easily with SBCL. Just need to better understand the nuances of the way SBCL does inlining I think.
10:12:26
lukego_
I've also considered the possibility of specializing arithmetic operators in OCaml style e.g. #'+. and #'*. that are defined to operate on DOUBLE-FLOAT. This would flip the problem upside down and inject strong type information down at the bottom that might propagate upwards via inlining. I guess I'll gradually see which ideas are more/less misguided.
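A sketch of that flipped approach (the names follow OCaml; nothing here is an existing library, and only two-argument versions are shown):

    ;; Declaimed inline with concrete FTYPEs so the double-float
    ;; types flow into surrounding code.
    (declaim (inline +. *.)
             (ftype (function (double-float double-float)
                              (values double-float &optional))
                    +. *.))

    (defun +. (a b) (+ a b))
    (defun *. (a b) (* a b))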
10:14:44
lukego_
and once I start reading IR code I'll probably find that there is lots of type uncertainty that I hadn't properly taken into account, e.g. operations that I think of as being double-float valued that can really return a complex number, etc. Sorry this has become a disordered stream-of-consciousness monologue :)
11:45:51
luis
I wonder why (type-of -2) => fixnum but (type-of 2) => (integer 0 <most-positive-fixnum>). Krystof implemented this 20 years ago, I'm sure it's fresh on his mind why. :)
11:47:38
|3b|
has to be a recognizable subtype of any built-in type containing the value, in this case unsigned-byte and fixnum?
12:00:38
luis
|3b|: would you say (integer 42 42) is a built-in type? I guess it's fair that type-of avoids consing though.
12:04:12
|3b|
also isn't very useful for trying to decide if two things have "the same type" in some sense
12:43:42
pdietz
I still would like to see Strandh's call-site optimization in SBCL. A small matter of programming, I'm sure.
13:28:24
_death
I'm feeling the only way such a query could "work" in general is by having type-of return something like (eql 42)
13:30:55
pdietz
Maybe (type-of x) should be equivalent to something like (canonical-type `(eql ,x)), where canonical-type has the property that (subtypep y (canonical-type y)) is true, and that if (subtypep x y), then (subtypep (canonical-type x) (canonical-type y)).
13:35:33
_death
pdietz: doesn't class-of serve as canonical-type? clhs says (subtypep (type-of x) (class-of x)) => t, t
13:43:04
_death
a canonical-type (that takes a typespec) would have the usual limitations (think SATISFIES) that make subtypep return an uncertain secondary value...
14:08:30
pdietz
I am imagining canonical-type working on types, not objects. So, one might do (canonical-type '(member 0 15)), or more complex types.
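A toy model, just to make the contract concrete (CANONICAL-TYPE here is illustrative only; a real one would need a proper type lattice rather than a fixed chain):

    ;; Pick the first containing type from a totally ordered chain.
    ;; Because the chain is ordered by inclusion, both properties
    ;; stated above hold: Y is a subtype of (CANONICAL-TYPE Y), and
    ;; canonicalization is monotone under SUBTYPEP.
    (defun canonical-type (typespec)
      (dolist (candidate '(fixnum integer rational real number t) t)
        (when (subtypep typespec candidate)
          (return candidate))))

    ;; (canonical-type '(member 0 15)) => FIXNUM
    ;; (canonical-type '(eql 1/2))     => RATIONAL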
14:27:33
lukego_
Hey tangentially: what's the SBCL hacker view of Coalton? Specifically is the type inference in Coalton likely to translate into heavily specialized ("Julia") compiled code or is the "intermediate" SBCL code more generic?
15:28:09
karlosz
lukego_: iirc coalton does its own type inference and monomorphization, which means all call sites are annotated with specialized compiled code (assuming that the underlying lisp implementation is able to compile the specialized code)
15:29:08
karlosz
there is also block compilation, which allows the local call convention to be used - importantly, it allows unboxed floats to get passed around without any inlining needed
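In recentish SBCLs that's driven from COMPILE-FILE (a sketch; the file name is made up):

    ;; Compile the whole file as one unit: calls between functions in
    ;; the file use the local call convention, so double-floats can be
    ;; passed unboxed between them without inlining.
    (compile-file "kernels.lisp" :block-compile t)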
15:29:54
karlosz
as you mentioned, the user is always going to do a better job for this type of thing
15:31:11
karlosz
pdietz: main issue with something like call-site optimization is that it requires a fair bit of GC coordination and rewriting the call convention (+ associated machine code for 6 architectures)
15:34:11
karlosz
the path that was explored in the 90s seemed to be more "dynamically recompile callers based on a dependency tree" or something
15:59:30
pdietz
A problem with all this is what happens when something is redefined while a function that depends on those definitions has calls on the stack.