freenode/#clasp - IRC Chatlog
13:36:29
beach
But I think there is a need for using a domain that takes into account only assignments.
13:37:28
beach
Normally, value numbering considers the type of each instruction, so that, say, the results of two CAR-INSTRUCTIONs applied to the same "value number" are recognized as having the same "value number".
13:38:38
beach
Such a domain could be used to improve type inference, and to remove redundant assignments by also removing temporaries that are not needed.
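The value numbering idea described above can be sketched as follows (illustrative Python over a toy three-address IR; the instruction shapes and names are assumptions, not Cleavir's actual representation). Two CARs applied to the same value number receive the same value number, and a plain assignment gives the destination the source's number, which is what makes redundant assignments and temporaries visible:

```python
# Toy local value numbering (illustrative; not Cleavir code).
# Instructions are (op, args, dest). Two applications of the same op
# to the same argument value numbers get the same result value number.

def value_number(instructions):
    numbers = {}   # variable -> value number
    table = {}     # (op, arg value numbers...) -> value number
    counter = 0

    def fresh():
        nonlocal counter
        counter += 1
        return counter

    def vn_of(var):
        if var not in numbers:
            numbers[var] = fresh()   # unknown input: give it a fresh number
        return numbers[var]

    for op, args, dest in instructions:
        if op == "assign":
            numbers[dest] = vn_of(args[0])   # a copy shares the source's number
        else:
            key = (op,) + tuple(vn_of(a) for a in args)
            if key not in table:
                table[key] = fresh()
            numbers[dest] = table[key]
    return numbers
```

For example, on `[("car", ("x",), "t1"), ("assign", ("t1",), "y"), ("car", ("x",), "t2")]` the variables `t1`, `y`, and `t2` all end up with the same value number, exposing the second CAR and the copy as redundant.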
13:40:23
drmeister
Right - but then we use that information to transform the HIR graph. We can use it to carry out type inference, remove dead code and remove redundant assignments.
13:41:35
beach
As the literature explains, it used to be the case that (say) an addition was more expensive than a memory access. That is no longer the case.
13:42:17
beach
So if you remove redundant additions by keeping one more variable (the result) alive, then you will most certainly have a pessimization rather than an optimization.
13:44:13
beach
If that is what you want to do, then you should go full value numbering and save the result of every operation; not just of assignments. And you may have slower code because of more memory accesses at the end.
13:45:32
beach
Instead, you should break up your functions when possible, like when an inner function does not use a variable of an enclosing function, you should submit it separately to LLVM.
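The criterion above (an inner function that uses no variable of an enclosing function) amounts to a free-variable check. A minimal sketch over an assumed toy term representation, not Cleavir's AST:

```python
# Free variables of a toy lambda term (illustrative; assumed shapes):
# ("var", name), ("lam", param, body), ("app", fn, arg).
# A "lam" with no free variables captures nothing from its enclosing
# function and could be submitted to the backend separately.

def free_vars(term, bound=frozenset()):
    kind = term[0]
    if kind == "var":
        return set() if term[1] in bound else {term[1]}
    if kind == "lam":
        # The parameter is bound inside the body.
        return free_vars(term[2], bound | {term[1]})
    if kind == "app":
        return free_vars(term[1], bound) | free_vars(term[2], bound)
    raise ValueError(kind)
```

Here `free_vars(("lam", "y", ("var", "x")))` is `{"x"}` (the inner function captures the outer `x`, so it must stay with its enclosing function), while `free_vars(("lam", "y", ("var", "y")))` is empty.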
13:47:20
beach
And you can safely ignore our "path replication" paper, because that makes the code bigger as well.
13:52:11
beach
My vision of Cleavir is that it should offer all major optimizations that exist in the literature, and it should let the client choose which ones to use, when to use them, in which order, how often, etc.
13:52:47
beach
It should let the client put together one or more compilers to use in different situations by combining these optimizations.
13:54:08
drmeister
Yes - so we have a need for an early pass that aggressively reduces the amount of code. We can always switch off that pass in the future.
13:54:10
beach
If the client thinks some additional transformations are required in their particular case, those transformations should preferably fit in seamlessly so that the other tools in the toolbox can still be used.
13:54:59
drmeister
Absolutely. So an early HIR->HIR transformation that reduces the amount of code is one way to do this.
13:55:08
Bike
i don't think there is much point in discussing this endlessly. let me try an ast-to-hir rewrite or doing SSA or SOMETHING
13:55:58
drmeister
Well, I was trying to push the conversation a bit more forward to help decide how to implement this pass in a way that it would produce the most shareable code.
13:56:43
Bike
SSA already exists in cleavir. value numbering is a useful thing to do in any circumstance
13:57:27
beach
I am totally in favor of adding pretty much any such analysis and transformation to the toolbox.
13:57:57
drmeister
There are different value numbering algorithms - right - so we could agree on a way of representing the value numbering info so it would be more generally useful.
13:58:08
beach
I am even in favor of including several algorithms for something like value numbering. They have different characteristics, and different clients may need different characteristics.
13:58:36
Bike
the representation of value numbering is to assign a value number to each lexical location. it's pretty neutral that way.
14:02:06
beach
Where I get more squeamish is with suggestions of changing the main structure of the core of Cleavir, like how ASTs are created and how HIR is created from ASTs.
14:03:57
drmeister
Good - so we won't get too anxious about which value numbering algorithms we implement.
14:06:13
Bike
we're moving from generate-ast to cst-to-ast because that's how we want to go. that's a whole rewrite of a major part of the core. because things are modular, later phases have been largely unaffected.
14:06:39
drmeister
But it sounds like beach (and we in the long term) want all these temporaries in there.
14:07:25
Bike
some of them are certainly useless, like the lexical variable temporaries. if we can find a way to not generate those that doesn't reduce maintainability, there is no reason to keep them.
14:07:38
drmeister
But if anyone is steeped in the cleavir ethos - it's Bike - he isn't going to reduce maintainability at the expense of speed.
14:08:20
beach
And I also know that the most common register allocation algorithm might generate faster code when there are more temporaries.
14:08:35
drmeister
We know that register algorithms get rid of them - but that's too late for type inference.
14:09:38
beach
Type inference just needs the result of some very simplified value numbering algorithm.
14:10:11
drmeister
Honestly, I don't know what the impact of all these extra assignment instructions are on inlining and everything.
14:10:38
drmeister
We could also try leaving everything as it is and have a pass to remove useless assignments just before we lower to llvm-ir.
14:11:50
Bike
value numbering has been the plan for months. i haven't worked on it for various reasons, like the inlining stuff i'm still working on.
14:13:30
drmeister
Bike: If inlining is working well, so that we aren't repeatedly copying instructions - then a factor of 2 in the number of instructions should only impact inlining time by a factor of 2 - correct?
14:14:12
Bike
in fact according to the profiling data, the actual copying is almost negligible in my latest code.
14:14:13
drmeister
Bike: I'm not criticizing your work - the inlining code is difficult to debug - I see that.
14:14:54
Bike
currently the long steps are in build-function-dag and discern-trappers, which are both basically analysis of the graph.
14:15:35
beach
The other thing that I have been hinting at is that we need a policy for WHEN to inline.
14:16:25
Bike
they're both certainly affected by the existence of redundant assignments, but i don't know how much
14:17:22
drmeister
So - say we implemented a pass to eliminate redundant assignments to some degree that is informed by value numbering - then we could see the impact of the number of instructions on inlining.
14:17:32
beach
When the code of the body of the callee is large, then the overhead of the call is very likely negligible compared to the time to execute the body. In those cases, inlining will make the code bigger without any noticeable performance improvement.
14:18:31
Bike
i don't think policy is a factor at the moment since pretty much the only things being inlined are the definitions i have for cleavir that are for the sake of analysis.
14:20:29
beach
If I am right, functions marked as inline get copied into the AST of the caller, as if they were defined by FLET.
14:21:05
beach
How is CAR one of the "definitions i have for cleavir that are for the sake of analysis"?
14:22:38
Bike
if we leave an flet in it's also going to make the code bigger, by the way. it might be good to have a policy even at ast level.
14:23:21
beach
Bike: That is what I meant when I said that an inner function that does not refer to a variable of an enclosing function could be submitted independently to LLVM.
14:24:29
beach
OK, so you are convinced that such FLETs do not contribute to the size of the code submitted to LLVM?
14:24:52
Bike
all i meant was that if we keep flets, we're going to have hundreds of copies of CAR or whatnot.
14:25:45
beach
Obviously, if they get copied in, then that means that they are likely candidates for inlining.
14:27:55
beach
So you should then turn it into a DEFUN, which might be an adaptation to this particular enclosing function.
14:32:48
beach
As I have said already, one part of such a policy might be "inline only if the code in the body is small."
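A policy of that shape might look like the following sketch. The threshold and the cost model are invented for illustration; nothing here is Cleavir's actual policy:

```python
# Hypothetical size-based inlining policy sketch. The limit and the
# growth bound are assumed values, not measurements.

INLINE_SIZE_LIMIT = 25   # assumed threshold, in HIR instructions

def should_inline(callee_instruction_count, call_count):
    # Inline only when the body is small. Total code growth is roughly
    # (copies - 1) * body size, so many call sites of a large body are
    # rejected even if each individual copy would be acceptable.
    growth = (call_count - 1) * callee_instruction_count
    return (callee_instruction_count <= INLINE_SIZE_LIMIT
            and growth <= 4 * INLINE_SIZE_LIMIT)
```

Under these assumed numbers, a 10-instruction body called from 3 sites is inlined, while a 100-instruction body is never inlined.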
14:36:24
drmeister
So - let's add a policy. The way inlining is currently done though does not seem to impact llvm time as much as it does cleavir time.
14:40:02
drmeister
This is why I suspect that inlining isn't causing the number of instructions to explode so much as we have a lot of instructions to begin with.
14:41:06
drmeister
Let's keep a running count of the number of cleavir instructions that are being lowered to llvm-ir and turn on and turn off inlining as we currently do it.
14:44:20
drmeister
Every named-enter instruction that I look at is attached to a single enclose instruction.
14:44:46
drmeister
So if you inline everything - it won't change the total number of instructions - will it?
14:46:51
drmeister
What am I looking for? Each named-enter will be connected to one enclose-instruction - then any subsequent funcall-instructions are where the inlining happens - correct?
14:48:29
drmeister
So - if CAR declared inline and there are 20 calls to CAR - there will be 20 copies of the CAR code in the HIR?
14:50:10
drmeister
(defparameter *h* (clasp-cleavir::draw-form-cst-hir '(lambda (x) (car x) (car x) (car x) (car x))))
14:52:36
drmeister
Isn't this a big problem? We spend all this time generating those four HIR subgraphs, then you have to analyze them all and then you have to copy them all.
14:56:38
drmeister
Bike: Right - you said "currently the long steps are in build-function-dag and discern-trappers, which are both basically analysis of the graph."
15:00:31
Bike
it determines whether the environments of functions escape, and yes, it's done very redundantly.
15:03:51
beach
If you go only by the profile, you would come to the conclusion that the analysis code should be improved.
15:07:36
drmeister
So if CAR is inlined - (defun foo (x) (car x)) becomes (defun foo (x) (flet ((inlined-car (z) ...)) (inlined-car x)))
15:08:16
drmeister
And then we do an analysis to determine if inlined-car captures the environment and escapes?
15:10:36
drmeister
The AST incorporation has led to an enclose-instruction and a funcall-instruction.
15:10:40
beach
There are two steps. Step one is to "incorporate" the AST of the callee into the caller.
15:11:11
beach
The second step may or may not happen. It eliminates the call in favor of copies of the instructions in the body of the callee.
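That second step can be sketched as copying the callee body with the argument substituted for the parameter (illustrative Python over an assumed toy term representation; real inlining at the HIR level copies instructions instead, and a full substitution would also have to rename to avoid variable capture):

```python
# Sketch of step two: replace a call to a known one-parameter function
# by a copy of its body, with the argument substituted for the
# parameter. Term shapes are assumed: ("var", n), ("lam", p, b),
# ("app", f, a).

def inline_call(param, body, arg):
    """Copy `body`, replacing ("var", param) with `arg`."""
    kind = body[0]
    if kind == "var":
        return arg if body[1] == param else body
    if kind == "lam":
        inner_param, inner_body = body[1], body[2]
        if inner_param == param:          # shadowed: stop substituting
            return ("lam", inner_param, inner_body)
        return ("lam", inner_param, inline_call(param, inner_body, arg))
    if kind == "app":
        return ("app", inline_call(param, body[1], arg),
                       inline_call(param, body[2], arg))
    raise ValueError(kind)
```

For example, inlining a callee with parameter `z` and body `(car z)` at a call site with argument `x` yields `(car x)` in place of the call.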
15:11:59
drmeister
I'm saying that it used to be that the ast of the callee was incorporated into the caller and no enclose-instruction/funcall-instruction was generated - or am I wrong about that?
15:12:46
beach
No, you said "So if CAR is inlined - (defun foo (x) (car x)) becomes (defun foo (x) (flet ((inlined-car (z) ...)) (inlined-car x)))".
15:16:02
drmeister
I'll try to be more specific - with the cst compiler: (declaim (inline car)) (defun car (x) ...) (defun foo (x) (car x)) effectively becomes (defun foo (x) (flet ((inlined-car (z) ...)) (inlined-car x)))
15:16:43
drmeister
In the old ast compiler there is no enclose-instruction and funcall-instruction generated in the HIR.
15:17:20
drmeister
So we are doing an analysis in the cst compiler that really doesn't need to be done aren't we?
15:20:55
drmeister
You are now talking about the inlining policy? To determine whether it can be inlined we check an inlining policy?
15:22:50
beach
I would be perfectly willing to stick that kind of information in the ENTER instruction.
15:23:57
drmeister
I'm curious to hear what Bike has to say about it - what information he needs and where it needs to be to limit the amount of analysis.
15:28:41
drmeister
One more question - you use "localized" or "localization" - can we just say "incorporate the ast"? I understand that.
15:30:00
drmeister
Why does the cst compiler now generate an enclose-instruction/funcall-instruction for a function declared inline with (declaim (inline ...))?
15:30:29
drmeister
Was it so that we can defer the decision to inline to later on when we could have a better idea of the number of instructions that will be inlined?
15:31:41
drmeister
Or was what we were doing before (incorporating the ast and generating more HIR within the inlinee) wrong in some way?
15:34:47
beach
Correct, it was so that at the HIR level, we can have a better idea of the context, so that we can decide to inline, or not.
15:35:12
drmeister
And the cst compiler generates this https://usercontent.irccloud-cdn.com/file/5lcygyY8/foo-cst.pdf
15:37:17
beach
I avoided "incorporation" because you just used it in the sense that there was no ENCLOSE.
15:37:48
drmeister
Do you know what the analysis is that Bike is doing as part of the inlining? Was it part of your paper on partial inlining?
15:38:32
beach
I don't know the details. But one thing that he needs is to know whether there is any capture of environment inside the callee.
15:38:33
drmeister
I'll be more specific going forward - I'm not sure how - but I'll think harder on it.
15:38:58
beach
The other thing he needs to know is whether there is a chance that the callee may be invoked more than once for a single invocation of the caller.
15:39:21
drmeister
And at this point that means if there are lexical locations shared between the caller and the callee?
15:42:01
beach
No, the latter means that if the callee is recursive, there will be more than one simultaneous invocation of the callee for a single invocation of the caller.
15:44:50
beach
Sorry, correction, the basic information is whether there can be more than one simultaneously active copy of the callee environment for a single active copy of the caller environment.
16:13:10
beach
And then, even though it is "incorporated" or "localized" into the caller, it won't be possible to inline.
16:13:47
beach
Currently, that situation can only happen with LABELS, or some Y combinator trickery.