freenode/#sicl - IRC Chatlog
19:50:00
karlosz
Bike: you also have examples of one module (code compiled together) having multiple components, like in (flet ((f (x) x) (g (y) y)) (setf (fdefinition 'foo) #'g) (setf (fdefinition 'bar) #'f)) where you can split up the code into 3 components (top level, component containing #'f, and component containing #'g)
19:52:05
karlosz
so the sbcl source has this as the toplevel comment for components: https://paste.gnome.org/pqhz31znh
19:53:44
karlosz
the memory usage and being able to reclaim individual components made a lot of sense in '85, but you can still get some memory issues even now
19:54:54
karlosz
yeah, having the flow graph be small for flow analysis purposes is pretty important
19:56:30
karlosz
currently though it doesn't really seem like the Cleavir front end is built for block compiling multiple top level forms
19:57:02
karlosz
it seems like the front end pipeline really just operates on one top level form at a time
19:58:02
karlosz
but being able to separate the actions of top-level form processing/minimal compilation from actual compilation will enable block compilation stuff
19:58:07
Bike
I think you'd pretty much have to have an alternate CST-to-AST going. you'd have to do stuff like recognize defuns rather than just blindly macroexpand them, right?
19:59:01
karlosz
well, it just means that you need defun to side effect the compilation environment during compiler processing
20:00:06
Bike
well i mean, for example, with block compilation we'd want (defun foo ...) (defun bar ... (foo ...) ...) to be a local call, right? it's a different thing from just inlining it. the compilation environment doesn't have that kind of info now.
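[A minimal sketch of the case Bike describes; the names FOO and BAR are illustrative, not from the log:]

```lisp
;; With block compilation, the compiler may turn the call to FOO inside
;; BAR into a direct (local) call to FOO's code, bypassing the
;; FDEFINITION indirection. This is different from inlining: FOO's body
;; is not copied into BAR, the call just goes straight to it.
(defun foo (x)
  (* x x))

(defun bar (y)
  (+ (foo y) 1))  ; candidate for a local call under block compilation
```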
20:00:10
karlosz
how it works in Python (the CMUCL/SBCL compiler) is that all the top level forms are read in and ir1-converted into initial ir1 data structures with the names resolved properly
20:01:00
karlosz
and then that set of initial ir1 functions gets compiled together, split up into components, and local call analyzed
20:02:54
karlosz
the idea is that all the toplevel forms being block compiled get processed in the same namespace associating names to ir1 leaves (which we half have a concept of in bir)
20:03:23
karlosz
'leaf' structures are the things that can get referenced in ir1, like how we have constants and variables and load-time-values represented right now
20:06:33
Bike
certainly a better way of dealing with function calls would be nice. i think beach already hit some messiness there relating to the sicl loading procedure.
20:07:01
Bike
also the way we do them now means (let () #'asdkfj) doesn't signal an error, which is pretty bad...
20:07:07
karlosz
since we don't have forms get translated directly into bir (which is the closest thing to ir1, with its supra-function level container) like sbcl does, but rather have 2 intermediate "irs" (cst and ast), it will be more involved to do the same thing
20:07:48
karlosz
yeah, a lot of the messiness in the irs right now (including load time hoisting and stuff like that) comes from not having a supra-function level container structure
20:08:07
karlosz
things like modules and components obviate the need for complicated hoisting procedures for load time value code
20:10:14
karlosz
otherwise you'd do something wasteful like putting stuff that logically belongs into the module/component into the static environment because you don't have anywhere else to put it
20:12:59
Bike
i suppose a good step would be to separate the regular-code part of cst-to-ast from the toplevel form processing stuff
20:13:16
Bike
i've kind of wanted to do that for a while but haven't thought hard enough about how to do it
21:09:15
karlosz
i found it became so much easier to do optimizations once the data structures became a bit more organized and less ad hoc
4:24:13
beach
So, call-site optimization will very likely be a good substitute for block compilation too.
4:51:30
Bike
i haven't really used it. my understanding is that it's roughly equivalent to treating all the defuns in a file as being in a big LABELS form, so the compiled functions can refer to each other directly and such.
4:51:33
no-defun-allowed
https://cmucl.org/docs/cmu-user/html/Block-Compilation.html suggests it reduces calling overhead, allowing all the tricks that, say, LABELS could use for local functions with calling conventions.
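[A sketch of the rough equivalence Bike and no-defun-allowed describe; the function bodies are made up for illustration:]

```lisp
;; A file compiled in block mode...
(defun foo (x) (* x x))
(defun bar (y) (+ (foo y) 1))

;; ...behaves roughly as if it had been written as one big LABELS form,
;; so FOO and BAR can refer to each other directly, using whatever
;; local calling convention the compiler finds convenient:
(labels ((foo (x) (* x x))
         (bar (y) (+ (foo y) 1)))
  (setf (fdefinition 'foo) #'foo
        (fdefinition 'bar) #'bar))
```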
4:56:57
beach
And changing the way those are handled significantly simplified the loading procedure.
5:03:03
beach
So now, the static environments of globally defined functions are usually empty. And call-site optimization can take advantage of that fact by not loading the static environment in those cases. That means fewer memory accesses.
5:08:58
beach
I wonder why things get so quiet when I mention call-site optimization. Is it that people don't believe it will work? If so, I need to know. Or is it that I haven't explained it very well? Or something else?
5:18:17
beach
Yes, I think only heisig has given some feedback on the technique, and that kind of worries me.
5:19:00
no-defun-allowed
I understood the explanation fine, and I believe that it will work well, but I don't have anything to say.
5:20:17
Bike
well, mostly i'm winding down for the night, and second i would like to see it actually functioning
5:21:03
Bike
and sometimes you say things that confuse me. the other day you mentioned that it would mean you don't need to worry about putting values in particular registers for calls, but i don't understand how that's possible, given that the actual function code is going to be using whatever fixed registers
5:23:06
beach
If the call site indicates where arguments can be found, and the callee separates argument parsing from the main body, then argument parsing can be skipped, and the snippet can load arguments directly from the places indicated by the caller to the variables of the body.
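[A hypothetical sketch, in comments, of the snippet mechanism beach describes; the register names and entry-point split are illustrative assumptions, not SICL specifics:]

```lisp
;; The callee is split into an argument-parsing prologue and a body
;; entry point. A snippet generated for a known call site can skip the
;; prologue and move arguments straight from wherever the caller
;; happens to have them to where the body expects them.
;;
;; caller:  leaves args in caller-chosen places, e.g. r3 and r4
;; snippet: mov body-arg-1 <- r3
;;          mov body-arg-2 <- r4
;;          jmp callee-body-entry   ; argument parsing skipped
```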
5:25:15
Bike
but i mean, say the function body has its first two arguments in r1 and r2, and the caller has those values in r3 and r4 before the call. to actually do the call, the caller (or snippet, or whatever) is going to need to put r3's value into r1 and r4's into r2, and so if the caller was doing something else with r1 and r2, those probably need to be preserved.
5:26:30
beach
But, like if the caller would otherwise load from the stack and put it on the stack for the callee to find, and the callee would then pull it from the stack to a register, then the stack operation could be skipped.
5:28:07
beach
I need to vanish. The store opens in 30 minutes and I need to be there soon after that.
5:28:31
Bike
i'm referring to when you said "And, by the way, the call-site optimization technique means that 1. Most of the time, no particular register needs to be reserved for argument passing and 2. That any register can be used to pass arguments, so we can have many more arguments passed in registers than with a small and fixed set of registers reserved for this purpose."
5:29:22
Bike
the other thing is that i figure the call site optimization and stuff like block compilation do the same kinds of interprocedural stuff anyway, so working on one will probably help the other
5:29:37
Bike
you're just moving the interprocedural part into the snippet; the same kinds of analyses still have to happen
5:35:04
Bike
like - say boxing and unboxing. say you have F that calls G with one argument that's a double. with block compilation, F is compiled to call G's code directly, and puts the double, unboxed, in some xmm register that can be chosen arbitrarily by the compiler - it ensures both F and G use this same register for it.
5:36:21
Bike
with call site optimization, F is compiled to jump to the snippet with the unboxed double in whatever arbitrary register. then the snippet is generated that calls G, which possibly wants the argument in some different arbitrary register. the snippet does a register mov which is basically free.
5:37:46
Bike
the call site optimization version can deal with redefinition, which is a definite advantage over block compilation. but from the compiler's perspective the analysis is basically the same, it's just that it's the snippet compiler doing it rather than whatever compiles F
6:01:07
Bike
currently in s-expressionists cleavir we have "local call" instructions, which are used when the callee is known by the compiler; for example calling a labels/flet/lambda function, thus the name, but they'd also be how block compilation would work
6:02:07
Bike
since the compiler knows about both the caller and the callee, a local call can be compiled however. for example in clasp, local calls end up ignoring the C calling conventions and using whatever registers are most convenient
6:02:33
Bike
they could more exotically involve things like jumping into the middle of a function - though that isn't done yet other than some argument parsing skipping if you count that
6:03:03
Bike
now, with the call site optimization, in my view what you're basically doing is just building a snippet that does a local call
6:03:34
Bike
just as with the current local calls, the snippet generator would know the callee, and could do pretty arbitrary things
6:04:19
Bike
so from my perspective what the call site optimization gives you is the ability to make calls local that couldn't normally be local
6:05:20
Bike
so i haven't commented on call site optimization much because i'm like, well, that sounds good to me, and requires the same compiler stuff i'm already working on for "block-compiled" local calls, so i'll just keep doing that
6:05:56
Bike
even with call site optimization you could still use these "block-compiled" local calls for things like flet/labels where you know no redefinition is happening
7:08:14
karlosz
the idea of block compilation is that you can use statically known information across functions like type information
7:08:54
karlosz
also it allows let-conversion and contification and deletion of entry points as well as avoiding heap allocated closures
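[A sketch of let-conversion, one of the local-call optimizations karlosz mentions; the example is made up for illustration:]

```lisp
;; Let-conversion: a local function with a single call site can be
;; substituted into its caller, deleting the call (and the function's
;; separate entry point) entirely.
(flet ((double (x) (+ x x)))
  (double 5))

;; effectively becomes
(let ((x 5))
  (+ x x))
```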
7:09:09
karlosz
the only thing call site optimization does that is similar to local calls is the removal of an indirection
7:10:43
karlosz
i recommend reading the cmucl manual section that no-defun-allowed linked about block compilation and what optimizations local calls enable; it gives a bunch of specific optimizations and examples
7:11:29
karlosz
the thing is call-site optimization can only do optimizations related to the call site - it cannot in general reach into the guts of the caller and make the caller somehow more optimized by providing extra type information
7:12:15
karlosz
so it's really just a limited form of local calls that relaxes the redefinition constraint by moving that into the snippet compiler as Bike alluded to; none of the other advantages of block compilation are possible