freenode/#clasp - IRC Chatlog
3:41:06
drmeister
You will have to ask Bike - I hoped it would be clear from the context - I don't follow the details.
3:42:18
beach
I thought we first wanted to work out whether inlining is a problem and, if so, how to improve it.
3:43:34
beach
And his proposal seems to be similar to that of karlosz in that it makes a special case for LET that makes inlining unnecessary for most LETs.
3:44:18
beach
1. Make special code for LET so that we can leave the problematic inlining code in there while still compiling Clasp fast enough.
3:51:43
drmeister
I'm not sure what to say - I don't understand well enough what he is planning to discuss it - I'm trying to not have my finger in everything.
12:27:40
Bike
well, first off inlining is certainly a bottleneck. there's no question. cst-to-ast results in major, major slowdown.
12:28:28
Bike
but i believe the slowdown is pretty much entirely due to LET. LET functions have unique characteristics, like having deep nesting (causing the multiple copies)
12:29:28
Bike
if we're inlining global functions, we can skip a lot of the analysis partial inlining does now (that's a major inefficiency), since we can just determine things ahead of time
12:30:24
Bike
we only need the full analysis for the comparatively rare cases of local and anonymous functions, and i think the code we have now is acceptably performant for that anyway
12:33:06
Bike
i also don't think characterizing "pre-inlining" LET as special is fair. we'd be marking where variables are created, which we probably want to do for various reasons anyway, and then letting segregate-lexicals use that information.
12:33:23
Bike
it's letting segregate-lexicals consider multiple kinds of instructions as creating bindings, instead of just ENTER.
13:10:47
Bike
I guess it's like, we COULD compile (foo ...) for function foo as a call to FUNCALL, and then rely on general mechanisms to reduce it, but why bother? is that really a "special case" or just basic semantics?
13:27:33
beach
So here is what I think. Inlining has a performance problem. It might be that we are using a quadratic algorithm where a linear one is available.
13:28:21
beach
It is entirely possible that, when this problem is fixed, things will be fast enough.
13:28:55
beach
If that is the case, working on what I still consider a special case for LET will be wasted work, and will leave us with more code to maintain.
13:29:31
beach
In the meantime, inlining for other situations than LET still has a performance problem.
13:30:30
Bike
Okay, I don't mind looking at that first. Though I don't think I understand which quadratic behavior you mean. From what I could tell, nested LETs resulted in a linear amount of copying.
13:31:40
beach
Well, quadratic is not quite true; M*N rather, where M is the nesting depth and N the average number of instructions at each level.
13:34:38
Bike
I'm not sure I understand. One thing is that, as far as I can tell, if we inline a function that ENCLOSEs other functions, those other functions have to be copied as well, so that they close over the correct variables.
13:36:20
Bike
For example if we have (let ((x ...)) (let ((y ...)) (+ x y))), inlining outer first results in like (progn (setq xprime ...) (let ((y ...)) (+ xprime y)))
13:38:15
Bike
So going outermost-in doesn't reduce the amount of copying, since we still have to copy the inner parts.
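[To make the copy-count argument concrete: a toy Python model, not Cleavir code, under the assumption Bike describes, namely that inlining a function clones its own instructions plus everything it still encloses. With M nested functions of N instructions each, the total then grows like N*M(M+1)/2, i.e. quadratically in the nesting depth M.]

```python
# Toy model (not Cleavir code) of copy counts when inlining M nested
# functions, each with N instructions of its own, outermost-in.
# Assumption: inlining a function clones its whole remaining body,
# including the bodies of the functions it still encloses.

def clones_outermost_in(M, N):
    total = 0
    for level in range(M, 0, -1):
        # Inlining the current outermost function clones its own N
        # instructions plus everything still nested inside it.
        total += level * N
    return total

M, N = 10, 5
total = clones_outermost_in(M, N)
print(total)                              # 275, i.e. N * M * (M + 1) / 2
print(total == N * M * (M + 1) // 2)      # True
```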
13:39:59
Bike
oh, and the other thing i thought of was that right now segregate-lexicals examines all variables for being shared, but in practice only the minority of variables actually corresponding to lisp bindings need to be checked. i don't know if fixing that would help performance much, but it's a thought
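[A sketch of the filtering Bike suggests here, with hypothetical names (`Variable`, `from_lisp_binding`) since the real segregate-lexicals data structures differ: instead of running the sharing analysis over every variable, restrict it up front to the ones that correspond to actual Lisp bindings.]

```python
# Hypothetical sketch: restrict the shared-variable analysis in a
# segregate-lexicals-style pass to variables that come from actual
# Lisp bindings. Names here are illustrative, not Cleavir's.
from dataclasses import dataclass, field

@dataclass
class Variable:
    name: str
    from_lisp_binding: bool      # marked where the binding is created
    owners: set = field(default_factory=set)  # functions referencing it

def shared_variables(variables):
    # Only binding-created variables matter for the closed-over check;
    # compiler temporaries are skipped before the expensive analysis.
    candidates = [v for v in variables if v.from_lisp_binding]
    return [v for v in candidates if len(v.owners) > 1]

vs = [
    Variable("x", True, {"f", "g"}),   # Lisp binding, closed over
    Variable("tmp1", False, {"f"}),    # compiler temporary
    Variable("y", True, {"f"}),        # Lisp binding, local only
]
print([v.name for v in shared_variables(vs)])  # ['x']
```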
14:07:49
beach
I can't see anything wrong with obtaining this one by inlining only the middle instructions: http://metamodular.com/example2.pdf
14:10:15
beach
I should probably also handle the case where the middle one creates a variable that is used by the rightmost one.
14:13:25
Bike
the problem is the local variables. like, consider if the output of enclose2 was used by the enter3 function. that output is copied into enter1, but enter3 doesn't use the copy.
14:23:11
beach
However, we must improve inlining anyway, so I think we must use a quadratic algorithm only when it is necessary.
14:24:27
beach
In the meantime, I would really like these statistics about how many times each instruction gets inlined for real big examples.
14:24:36
Bike
We could probably skip copying some inner functions if we tie the inlining in more with the determination of what variables are closed over (which we do anyway)
14:25:08
Bike
But for LET kind of functions where an inner body is going to use a lot of bindings, it might not help
14:28:59
beach
Bike: In this particular example, the function defined by enter2 should disappear and the one defined by enter3 should use the variable introduced by the function defined by enter1 instead.
14:33:31
beach
New versions at the same links: http://metamodular.com/example.pdf http://metamodular.com/example2.pdf
14:34:35
beach
So we should detect that case (which will then work for situations other than the ones that happen by LET) and handle it.
14:35:05
beach
Doing it that way will likely solve the LET problem AND make inlining faster in general.
14:36:35
Bike
i mean, and that would probably be easier to write than modifying any enclosed functions recursively
14:39:52
Bike
anyway, would it be a problem? we'd just have any return in the function being inlined turn into a local control transfer to the former return site
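[A minimal sketch of the rewrite Bike describes, with a made-up tuple instruction format: each RETURN in the inlined body becomes a value move plus a local jump to the former return site.]

```python
# Illustrative-only rewrite: turn RETURN instructions of an inlined
# function into local control transfers to the call's return site.
# The ("OPCODE", ...) tuple format is invented for this sketch.

def localize_returns(body, return_site_label):
    out = []
    for instr in body:
        if instr[0] == "RETURN":
            # ("RETURN", value) becomes a value move plus a local jump.
            out.append(("ASSIGN", "retval", instr[1]))
            out.append(("JUMP", return_site_label))
        else:
            out.append(instr)
    return out

body = [("COMPUTE", "x"), ("RETURN", "x")]
print(localize_returns(body, "after-call"))
# [('COMPUTE', 'x'), ('ASSIGN', 'retval', 'x'), ('JUMP', 'after-call')]
```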
14:40:56
beach
Either way, I think the big gain is going to be from avoiding the M^2 behavior in most cases.
14:43:40
beach
Anyway, do you see my point that, if we treat this general case, the LET situation will likely take care of itself, and we need to treat this general case anyway at some point. Whereas if we handle the LET case specially, we will have more code, because we still need to handle the general case (if it is produced by something other than LET).
14:48:09
beach
Oh, and there is another interesting situation. If enter2 does not create a variable that is used by any of its descendants, then it doesn't matter whether it is called several times. It can still be inlined.
14:49:08
beach
So that's another reason to avoid two ways of doing it, one way by copying instructions and another way by not copying them. Two ways would give more code and more maintenance.
15:00:28
Bike
Something like that. Not all instructions are copied in an obvious way so I'll have to work something out.
15:01:55
Bike
oh, and i did another really basic count, and found that compiling a 100-line function with various LABELS, LET*, DOLIST and such involved over fifty-five thousand calls to clone-instruction
15:13:58
drmeister
bike: When you first got the inlining to work - it was really, really slow. We ran it with dtrace profiling and it showed a problem that you were almost immediately able to fix. Do you recall what that was?
15:25:00
drmeister
Counting the number of times each progenitor HIR instruction gets cloned (with the two hash tables) will show the pattern of how inlining is applied (inside-out vs outside-in) but we can still learn a lot from the total number of clones made/number of progenitor instructions - right?
15:26:09
drmeister
Because the best approach would have "total # clones"/"# progenitor" somewhere near 1.0 - right?
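[One way to read drmeister's metric, sketched in Python with invented instruction names: map each progenitor HIR instruction to its clone count, then compare total clones against the number of progenitors. A ratio near 1.0 would mean each instruction gets cloned about once.]

```python
# Sketch of the clones-per-progenitor metric drmeister describes.
# clone_counts maps each progenitor instruction (by a made-up id) to
# how many times clone-instruction copied it.
from collections import Counter

clone_counts = Counter()

def record_clone(progenitor_id):
    clone_counts[progenitor_id] += 1

# Pretend three progenitor instructions were cloned during inlining.
for pid in ["assign-1", "assign-1", "call-2", "enter-3", "enter-3", "enter-3"]:
    record_clone(pid)

total_clones = sum(clone_counts.values())   # 6 clones in total
progenitors = len(clone_counts)             # 3 distinct progenitors
ratio = total_clones / progenitors
print(ratio)   # 2.0; the ideal drmeister mentions would be near 1.0
```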