freenode/#clasp - IRC Chatlog

3:35:26 drmeister Hi beach

3:35:51 drmeister beach: Would you describe cleavir as a "sea of nodes" compiler?

3:36:09 beach I don't know the definition of that.

3:37:02 beach I read the plan that Bike has.

3:37:14 drmeister https://darksi.de/d.sea-of-nodes/

3:37:29 beach I don't know whether his plan was meant for Cleavir in general or for Clasp only.

3:38:51 drmeister I think it's for Cleavir.

3:39:21 drmeister Does it sound appropriate for Cleavir?

3:40:10 beach I am not sure what to say.

3:40:29 beach What would the goal be?

3:41:06 drmeister You will have to ask Bike - I hoped it would be clear from the context - I don't follow the details.

3:41:33 beach OK. I'll be somewhat busy for the next few days because we have house guests.

3:41:52 beach I am a bit surprised by his proposal.

3:42:02 drmeister What is surprising about it?

3:42:18 beach I thought we first wanted to work out whether inlining is a problem and, if so, how to improve it.

3:42:41 beach Because if inlining is a bottleneck, then it has to be improved for everyone.

3:42:46 drmeister Ah - from what he told me inlining is a problem according to profiling.

3:43:34 beach And his proposal seems to be similar to that of karlosz in that it makes a special case for LET that makes inlining unnecessary for most LETs.

3:43:45 beach So to me it looks like we have a choice:

3:44:16 drmeister I think you and he need to talk - I'm not following the details here.

3:44:18 beach 1. Make special code for LET so that we can leave the problematic inlining code in there while still compiling Clasp fast enough.

3:44:23 beach 2. Improve the inlining code.

3:49:50 beach I am very likely going to be busy while Bike is away for the next few days.

3:51:43 drmeister I'm not sure what to say - I don't understand well enough what he is planning to discuss it - I'm trying to not have my finger in everything.

3:52:06 beach I understand.

7:16:13 heisig Good morning!

7:16:21 beach Hello heisig.

7:59:48 alter-schjetne ** NICK schjetne

12:27:40 Bike well, first off inlining is certainly a bottleneck. there's no question. cst-to-ast results in major, major slowdown.

12:28:28 Bike but i believe the slowdown is pretty much entirely due to LET. LET functions have unique characteristics, like having deep nesting (causing the multiple copies)

12:29:28 Bike if we're inlining global functions, we can skip a lot of the analysis partial inlining does now- that's a major inefficiency- since we can just determine things ahead of time

12:30:24 Bike we only need the full analysis for the comparatively rarer cases of local and anonymous functions, and i think the code we have now is acceptably performant for that anyway

12:33:06 Bike i also don't think characterizing "pre-inlining" LET as special is fair. we'd be marking where variables are created, which we probably want to do for various reasons anyway, and then letting segregate-lexicals use that information.

12:33:23 Bike it's letting segregate-lexicals consider multiple kinds of instructions as creating bindings, instead of just ENTER.

12:49:33 Bike we can talk about this when you're around, of course

13:10:47 Bike I guess it's like, we COULD compile (foo ...) for function foo as a call to FUNCALL, and then rely on general mechanisms to reduce it, but why bother? is that really a "special case" or just basic semantics?

13:27:33 beach So here is what I think. Inlining has a performance problem. It might be that we are using a quadratic algorithm where a linear one is available.

13:27:57 beach If so, we need to make sure it is true and we need to fix it.

13:28:21 beach It is entirely possible that, when this problem is fixed, things will be fast enough.

13:28:55 beach If that is the case, working on what I still consider a special case for LET will be wasted work, and will leave us with more code to maintain.

13:29:31 beach In the meantime, inlining for other situations than LET still has a performance problem.

13:30:30 Bike Okay, I don't mind looking at that first. Though I don't think I understand what quadratic performance you mean. From what I could tell, nested lets resulted in a linear amount of copying.

13:31:35 Bike and i don't really see how to avoid that.

13:31:40 beach Well, quadratic is not quite true, M*N rather, where M is the nesting depth and N the number of instructions on the average at each level.

13:31:51 Bike Oh. That. Okay.

13:31:55 Bike How would that be avoided?

13:32:07 beach By inlining outermost-in rather than innermost-out.

13:32:42 beach Well, wait.

13:32:49 beach Not M*N.

13:33:09 beach N + 2N + 3N + ... + MN.

13:33:13 beach How much is that?

13:33:36 beach M^2N, right?

13:33:44 Bike yeah.

13:33:54 beach But outermost-in will give M*N.

13:34:38 Bike I'm not sure I understand. One thing is that, as far as I can tell, if we inline a function that ENCLOSEs other functions, those other functions have to be copied as well, so that they close over the correct variables.

13:34:39 beach It is an approximation of course, because not ever level has N instructions.

13:35:11 beach I would have to think about that.

13:36:20 Bike For example if we have (let ((x ...)) (let ((y ...)) (+ x y))), inlining outer first results in like (progn (setq xprime ...) (let ((y ...)) (+ xprime y)))

13:37:12 beach Yes, maybe so.

13:37:23 beach I am having a hard time seeing it at the source level.

13:37:31 beach So what do you conclude from that?

13:38:15 Bike That going outermost in doesn't reduce the amount of copying since we still have to copy the inner parts.

13:38:23 Bike Repeatedly.

13:38:30 beach I don't see that.

13:38:42 beach Let me work it out at the HIR level and get back to you.

13:38:47 Bike Sure.

13:39:59 Bike oh, and the other thing i thought of was that right now segregate-lexicals examines all variables for being shared, but in practice only the minority of variables actually corresponding to lisp bindings need to be checked. i don't know if fixing that would help performance much, but it's a thought

13:40:46 beach I see what you mean.

13:56:45 Bike lisp bindings and also catch-instruction output, really

14:07:10 beach http://metamodular.com/example.pdf

14:07:17 beach Suppose we have this situtation.

14:07:49 beach I can't see anything wrong with obtaining this one by inlining only the middle instructions: http://metamodular.com/example2.pdf

14:08:50 beach I took some shortcuts, but I think it illustrates my thinking.

14:10:15 beach I should probably also handle the case where the middle one creates a variable that is used by the rightmost one.

14:10:39 beach I mean, make sure it is handled right.

14:11:08 beach the call2/enter2 pair should be replaced by assignments.

14:11:43 beach I guess I should work on it some more so that it is more strict.

14:13:25 Bike the problem is the local variables. like, consider if the output of enclose2 was used by the enter3 function. that output is copied into enter1, but enter3 doesn't use the copy.

14:19:29 beach Yes, I see it.

14:20:00 beach Thanks.

14:21:48 Bike i only know this through unfortunate experience

14:21:58 beach I understand.

14:23:11 beach However, we must improve inlining anyway, so I think we must use a quadratic algorithm only when it is necessary.

14:23:53 beach I will go away and think about it.

14:24:27 beach In the meantime, I would really like these statistics about how many times each instruction gets inlined for real big examples.

14:24:36 Bike We could probably skip copying some inner functions if we tie the inlining in more with the determination of what variables are closed over (which we do anyway)

14:24:52 beach That's what I am thinking.

14:25:08 Bike But for LET kind of functions where an inner body is going to use a lot of bindings, it might not help

14:25:16 Bike anyway, yeah, i'll whip up the stats business

14:26:28 drmeister Hello everyone

14:27:40 beach Hello drmeister.

14:28:59 beach Bike: In this particular example, the function defined by enter2 should disappear and the one defined by enter3 should use the variable introduced by the function defined by enter1 instead.

14:29:32 beach Let me work out the example...

14:29:56 Bike you mean, because enter3 is only used once?

14:31:04 Bike used only in enter2, and enter2 is only called once, that is

14:32:31 beach Yes.

14:32:43 beach And enter2 is unused after the inlining.

14:33:31 beach New versions at the same links: http://metamodular.com/example.pdf http://metamodular.com/example2.pdf

14:33:55 beach I think the LET case falls exactly into that category.

14:34:35 beach So we should detect that case (which will then work for situations other than the ones that happen by LET) and handle it.

14:35:05 beach Doing it that way will likely solve the LET problem AND make inlining faster in general.

14:35:26 Bike if a function is called only once we don't even need to copy it to inline

14:35:38 Bike i think.

14:35:56 beach Maybe.

14:36:32 beach I am not convinced yet. I would have to think harder about it.

14:36:35 Bike i mean, and that would probably be easier to write than modifying any enclosed functions recursively

14:37:03 beach I totally understand. But we need to be convinced that it's correct to do so.

14:37:35 beach For instance, the inlined function may have several RETURN instructions.

14:38:11 Bike at the moment that's not possible.

14:38:20 beach But it could be.

14:39:00 beach Wait, why is it not possible?

14:39:10 beach What if there are several RETURN-FROMs in the source?

14:39:13 Bike ast-to-hir only generates one return per function

14:39:19 beach I see.

14:39:25 beach Well that might change in the future.

14:39:52 Bike anyway, would it be a problem? we'd just have any return in the function being inlined turn into a local control transfer to the former return site

14:40:25 beach I am not totally sure, but it might work.

14:40:56 beach Either way, I think the big gain is going to be from avoiding the M^2 behavior in most cases.

14:43:40 beach Anyway, do you see my point that, if we treat this general case, the LET situation will likely take care of itself, and we need to treat this general case anyway at some point. Whereas if we handle the LET case specially, we will have more code, because we still need to handle the general case (if it is produced by something other than LET).

14:44:51 beach So my suggestion is: 1. Make sure quadratic behavior is a problem.

14:45:04 beach 2. Try to eliminate it where it is correct to do so.

14:45:14 beach ... but still copy the instructions.

14:45:31 beach 3. Measure again to see how much time is spent copying instructions.

14:46:04 beach If it is still a problem consider whether copying can be avoided in some cases.

14:48:09 beach Oh, and there is another interesting situation. If enter2 does not create a variable that is used by any of its descendants, then it doesn't matter whether it is called several times. It can still be inlined.

14:48:20 beach And in that situation, you do need to copy the instructions.

14:49:08 beach So that's another reason to avoid two ways of doing it, one way by copying instructions and another way by not copying them. Two ways would give more code and more maintenance.

14:50:36 beach Bike: Am I making sense here?

14:50:53 Bike Sure.

14:51:56 beach So do we have a plan?

14:52:18 beach If so, I look forward to hearing intermediate goals from you.

14:52:26 Bike Yeah I'll do the stats first.

14:52:32 beach Great!

14:57:13 beach You will need two hash tables.

14:57:36 beach One mapping initial instructions to a counter of how many times it was inlined.

14:57:51 beach Another to map copies of instructions to initial instructions.

15:00:28 Bike Something like that. Not all instructions are copied in an obvious way so I'll have to work something out.

15:01:07 beach OK.

15:01:55 Bike oh, and i did another really basic count, and found that compiling a 100-line function with various LABELS, LET*, DOLIST and such involved over fifty five thousand calls to clone-instruction

15:02:20 beach That sounds like a clue.

15:13:58 drmeister bike: When you first got the inlining to work - it was really, really slow. We ran it with dtrace profiling and it showed a problem that you were almost immediately able to fix. Do you recall what that was?

15:22:52 drmeister Bike: How many HIR nodes does that 100-line function start with before inlining?

15:25:00 drmeister Counting the number of times each progenitor HIR instruction gets cloned (with the two hash tables) will show the pattern of how inlining is applied (inside-out vs outside-in) but we can still learn a lot from the total number of clones made/number of progenitor instructions - right?

15:26:09 drmeister Because the best approach would have "total # clones"/"# progenitor" somewhere near 1.0 - right?