freenode/#clasp - IRC Chatlog
13:36:29
beach
But I think there is a need for using a domain that takes into account only assignments.
13:37:28
beach
Normally, value numbering considers the type of each instruction, so that, say, the results of two CAR-INSTRUCTIONs applied to the same "value number" are recognized as having the same "value number".
13:38:38
beach
Such a domain could be used to improve type inference, and to remove redundant assignments by also removing temporaries that are not needed.
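The value numbering idea described above can be sketched as follows (illustrative Python over a toy three-address IR; the instruction shapes and names are assumptions, not Cleavir's actual representation). Two CARs applied to the same value number receive the same value number, and a plain assignment gives the destination the source's number, which is what makes redundant assignments and temporaries visible:

```python
# Toy local value numbering (illustrative; not Cleavir code).
# Instructions are (op, args, dest). Two applications of the same op
# to the same argument value numbers get the same result value number.

def value_number(instructions):
    numbers = {}   # variable -> value number
    table = {}     # (op, arg value numbers...) -> value number
    counter = 0

    def fresh():
        nonlocal counter
        counter += 1
        return counter

    def vn_of(var):
        if var not in numbers:
            numbers[var] = fresh()   # unknown input: give it a fresh number
        return numbers[var]

    for op, args, dest in instructions:
        if op == "assign":
            numbers[dest] = vn_of(args[0])   # a copy shares the source's number
        else:
            key = (op,) + tuple(vn_of(a) for a in args)
            if key not in table:
                table[key] = fresh()
            numbers[dest] = table[key]
    return numbers
```

For example, on `[("car", ("x",), "t1"), ("assign", ("t1",), "y"), ("car", ("x",), "t2")]` the variables `t1`, `y`, and `t2` all end up with the same value number, exposing the second CAR and the copy as redundant.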
13:40:23
drmeister
Right - but then we use that information to transform the HIR graph. We can use it to carry out type inference, remove dead code and remove redundant assignments.
13:41:35
beach
As the literature explains, it used to be the case that (say) an addition was more expensive than a memory access. That is no longer the case.
13:42:17
beach
So if you remove redundant additions by keeping one more variable (the result) alive, then you will most certainly have a pessimization rather than an optimization.
13:44:13
beach
If that is what you want to do, then you should go full value numbering and save the result of every operation; not just of assignments. And you may have slower code because of more memory accesses at the end.
13:45:32
beach
Instead, you should break up your functions when possible, like when an inner function does not use a variable of an enclosing function, you should submit it separately to LLVM.
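The criterion above (an inner function that uses no variable of an enclosing function) amounts to a free-variable check. A minimal sketch over an assumed toy term representation, not Cleavir's AST:

```python
# Free variables of a toy lambda term (illustrative; assumed shapes):
# ("var", name), ("lam", param, body), ("app", fn, arg).
# A "lam" with no free variables captures nothing from its enclosing
# function and could be submitted to the backend separately.

def free_vars(term, bound=frozenset()):
    kind = term[0]
    if kind == "var":
        return set() if term[1] in bound else {term[1]}
    if kind == "lam":
        # The parameter is bound inside the body.
        return free_vars(term[2], bound | {term[1]})
    if kind == "app":
        return free_vars(term[1], bound) | free_vars(term[2], bound)
    raise ValueError(kind)
```

Here `free_vars(("lam", "y", ("var", "x")))` is `{"x"}` (the inner function captures the outer `x`, so it must stay with its enclosing function), while `free_vars(("lam", "y", ("var", "y")))` is empty.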
13:47:20
beach
And you can safely ignore our "path replication" paper, because that makes the code bigger as well.
13:52:11
beach
My vision of Cleavir is that it should offer all major optimizations that exist in the literature, and it should let the client choose which ones to use, when to use them, in which order, how often, etc.
13:52:47
beach
It should let the client put together one or more compilers to use in different situations by combining these optimizations.
13:54:08
drmeister
Yes - so we have a need for an early pass that aggressively reduces the amount of code. We can always switch off that pass in the future.
13:54:10
beach
If the client thinks some additional transformations are required in their particular case, those transformations should preferably fit in seamlessly so that the other tools in the toolbox can still be used.
13:54:59
drmeister
Absolutely. So an early HIR->HIR transformation that reduces the amount of code is one way to do this.
13:55:08
Bike
i don't think there is much point in discussing this endlessly. let me try an ast-to-hir rewrite or doing SSA or SOMETHING
13:55:58
drmeister
Well, I was trying to push the conversation a bit more forward to help decide how to implement this pass in a way that it would produce the most shareable code.
13:56:43
Bike
SSA already exists in cleavir. value numbering is a useful thing to do in any circumstance
13:57:27
beach
I am totally in favor of adding pretty much any such analysis and transformation to the toolbox.
13:57:57
drmeister
There are different value numbering algorithms - right - so we could agree on a way of representing the value numbering info so it would be more generally useful.
13:58:08
beach
I am even in favor of including several algorithms for something like value numbering. They have different characteristics, and different clients may need different characteristics.
13:58:36
Bike
the representation of value numbering is to assign a value number to each lexical location. it's pretty neutral that way.
14:02:06
beach
Where I get more squeamish is with suggestions of changing the main structure of the core of Cleavir, like how ASTs are created and how HIR is created from ASTs.
14:03:57
drmeister
Good - so we won't get too anxious about which value numbering algorithms we implement.
14:06:13
Bike
we're moving from generate-ast to cst-to-ast because that's how we want to go. that's a whole rewrite of a major part of the core. because things are modular, later phases have been largely unaffected.
14:06:39
drmeister
But it sounds like beach (and we in the long term) want all these temporaries in there.
14:07:25
Bike
some of them are certainly useless, like the lexical variable temporaries. if we can find a way to not generate those that doesn't reduce maintainability, there is no reason to keep them.
14:07:38
drmeister
But if anyone is steeped in the cleavir ethos - it's Bike - he isn't going to reduce maintainability at the expense of speed.
14:08:20
beach
And I also know that the most common register allocation algorithm might generate faster code when there are more temporaries.
14:08:35
drmeister
We know that register algorithms get rid of them - but that's too late for type inference.
14:09:38
beach
Type inference just needs the result of some very simplified value numbering algorithm.
14:10:11
drmeister
Honestly, I don't know what the impact of all these extra assignment instructions are on inlining and everything.
14:10:38
drmeister
We could also try leaving everything as it is and have a pass to remove useless assignments just before we lower to llvm-ir.
14:11:50
Bike
value numbering has been the plan for months. i haven't worked on it for various reasons, like the inlining stuff i'm still working on.
14:13:30
drmeister
Bike: If inlining is working well, so that we aren't repeatedly copying instructions - then a factor of 2 in the number of instructions should only impact inlining time by a factor of 2 - correct?
14:14:12
Bike
in fact according to the profiling data, the actual copying is almost negligible in my latest code.
14:14:13
drmeister
Bike: I'm not criticizing your work - the inlining code is difficult to debug - I see that.
14:14:54
Bike
currently the long steps are in build-function-dag and discern-trappers, which are both basically analysis of the graph.
14:15:35
beach
The other thing that I have been hinting at is that we need a policy for WHEN to inline.
14:16:25
Bike
they're both certainly affected by the existence of redundant assignments, but i don't know how much
14:17:22
drmeister
So - say we implemented a pass to eliminate redundant assignments to some degree that is informed by value numbering - then we could see the impact of the number of instructions on inlining.
14:17:32
beach
When the code of the body of the callee is large, then the overhead of the call is very likely negligible compared to the time to execute the body. In those cases, inlining will make the code bigger without any noticeable performance improvement.
14:18:31
Bike
i don't think policy is a factor at the moment since pretty much the only things being inlined are the definitions i have for cleavir that are for the sake of analysis.
14:20:29
beach
If I am right, functions marked as inline get copied into the AST of the caller, as if they were defined by FLET.
14:21:05
beach
How is CAR one of the "definitions i have for cleavir that are for the sake of analysis"?
14:22:38
Bike
if we leave an flet in it's also going to make the code bigger, by the way. it might be good to have a policy even at ast level.
14:23:21
beach
Bike: That is what I meant when I said that an inner function that does not refer to a variable of an enclosing function could be submitted independently to LLVM.
14:24:29
beach
OK, so you are convinced that such FLETs do not contribute to the size of the code submitted to LLVM?
14:24:52
Bike
all i meant was that if we keep flets, we're going to have hundreds of copies of CAR or whatnot.
14:25:45
beach
Obviously, if they get copied in, then that means that they are likely candidates for inlining.
14:27:55
beach
So you should then turn it into a DEFUN, which might be an adaptation to this particular enclosing function.
14:32:48
beach
As I have said already, one part of such a policy might be "inline only if the code in the body is small."
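A policy of that shape might look like the following sketch. The threshold and the cost model are invented for illustration; nothing here is Cleavir's actual policy:

```python
# Hypothetical size-based inlining policy sketch. The limit and the
# growth bound are assumed values, not measurements.

INLINE_SIZE_LIMIT = 25   # assumed threshold, in HIR instructions

def should_inline(callee_instruction_count, call_count):
    # Inline only when the body is small. Total code growth is roughly
    # (copies - 1) * body size, so many call sites of a large body are
    # rejected even if each individual copy would be acceptable.
    growth = (call_count - 1) * callee_instruction_count
    return (callee_instruction_count <= INLINE_SIZE_LIMIT
            and growth <= 4 * INLINE_SIZE_LIMIT)
```

Under these assumed numbers, a 10-instruction body called from 3 sites is inlined, while a 100-instruction body is never inlined.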
14:36:24
drmeister
So - let's add a policy. The way inlining is currently done though does not seem to impact llvm time as much as it does cleavir time.
14:40:02
drmeister
This is why I suspect that inlining isn't causing the number of instructions to explode so much as we have a lot of instructions to begin with.
14:41:06
drmeister
Let's keep a running count of the number of cleavir instructions that are being lowered to llvm-ir and turn on and turn off inlining as we currently do it.
14:44:20
drmeister
Every named-enter instruction that I look at is attached to a single enclose instruction.
14:44:46
drmeister
So if you inline everything - it won't change the total number of instructions - will it?
14:46:51
drmeister
What am I looking for? Each named-enter will be connected to one enclose-instruction - then any subsequent funcall-instructions are where the inlining happens - correct?
14:48:29
drmeister
So - if CAR declared inline and there are 20 calls to CAR - there will be 20 copies of the CAR code in the HIR?
14:50:10
drmeister
(defparameter *h* (clasp-cleavir::draw-form-cst-hir '(lambda (x) (car x) (car x) (car x) (car x))))
14:52:36
drmeister
Isn't this a big problem? We spend all this time generating those four HIR subgraphs, then you have to analyze them all and then you have to copy them all.
14:56:38
drmeister
Bike: Right - you said "currently the long steps are in build-function-dag and discern-trappers, which are both basically analysis of the graph."
15:00:31
Bike
it determines whether the environments of functions escape, and yes, it's done very redundantly.
15:03:51
beach
If you go only by the profile, you would come to the conclusion that the analysis code should be improved.
15:07:36
drmeister
So if CAR is inlined - (defun foo (x) (car x)) becomes (defun foo (x) (flet ((inlined-car (z) ...)) (inlined-car x)))
15:08:16
drmeister
And then we do an analysis to determine if inlined-car captures the environment and escapes?
15:10:36
drmeister
The AST incorporation has led to an enclose-instruction and a funcall-instruction.
15:10:40
beach
There are two steps. Step one is to "incorporate" the AST of the callee into the caller.
15:11:11
beach
The second step may or may not happen. It eliminates the call in favor of copies of the instructions in the body of the callee.
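That second step can be sketched as copying the callee body with the argument substituted for the parameter (illustrative Python over an assumed toy term representation; real inlining at the HIR level copies instructions instead, and a full substitution would also have to rename to avoid variable capture):

```python
# Sketch of step two: replace a call to a known one-parameter function
# by a copy of its body, with the argument substituted for the
# parameter. Term shapes are assumed: ("var", n), ("lam", p, b),
# ("app", f, a).

def inline_call(param, body, arg):
    """Copy `body`, replacing ("var", param) with `arg`."""
    kind = body[0]
    if kind == "var":
        return arg if body[1] == param else body
    if kind == "lam":
        inner_param, inner_body = body[1], body[2]
        if inner_param == param:          # shadowed: stop substituting
            return ("lam", inner_param, inner_body)
        return ("lam", inner_param, inline_call(param, inner_body, arg))
    if kind == "app":
        return ("app", inline_call(param, body[1], arg),
                       inline_call(param, body[2], arg))
    raise ValueError(kind)
```

For example, inlining a callee with parameter `z` and body `(car z)` at a call site with argument `x` yields `(car x)` in place of the call.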
15:11:59
drmeister
I'm saying that it used to be that the ast of the callee was incorporated into the caller and no enclose-instruction/funcall-instruction was generated - or am I wrong about that?
15:12:46
beach
No, you said "So if CAR is inlined - (defun foo (x) (car x)) becomes (defun foo (x) (flet ((inlined-car (z) ...)) (inlined-car x)))".
15:16:02
drmeister
I'll try to be more specific - with the cst compiler: (declaim (inline car)) (defun car (x) ...) (defun foo (x) (car x)) effectively becomes (defun foo (x) (flet ((inlined-car (z) ...)) (inlined-car x)))
15:16:43
drmeister
In the old ast compiler there is no enclose-instruction and funcall-instruction generated in the HIR.
15:17:20
drmeister
So we are doing an analysis in the cst compiler that really doesn't need to be done aren't we?
15:20:55
drmeister
You are now talking about the inlining policy? To determine whether it can be inlined we check an inlining policy?
15:22:50
beach
I would be perfectly willing to stick that kind of information in the ENTER instruction.
15:23:57
drmeister
I'm curious to hear what Bike has to say about it - what information he needs and where it needs to be to limit the amount of analysis.
15:28:41
drmeister
One more question - you use "localized" or "localization" - can we just say "incorporate the ast"? I understand that.
15:30:00
drmeister
Why does the cst compiler now generate an enclose-instruction/funcall-instruction for a function declared inline with (declaim (inline ...))?
15:30:29
drmeister
Was it so that we can defer the decision to inline to later on when we could have a better idea of the number of instructions that will be inlined?
15:31:41
drmeister
Or was what we were doing before (incorporating the ast and generating more HIR within the inlinee) wrong in some way?
15:34:47
beach
Correct, it was so that at the HIR level, we can have a better idea of the context, so that we can decide to inline, or not.
15:35:12
drmeister
And the cst compiler generates this https://usercontent.irccloud-cdn.com/file/5lcygyY8/foo-cst.pdf
15:37:17
beach
I avoided "incorporation" because you just used it in the sense that there was no ENCLOSE.
15:37:48
drmeister
Do you know what the analysis is that Bike is doing as part of the inlining? Was it part of your paper on partial inlining?
15:38:32
beach
I don't know the details. But one thing that he needs is to know whether there is any capture of environment inside the callee.
15:38:33
drmeister
I'll be more specific going forward - I'm not sure how - but I'll think harder on it.
15:38:58
beach
The other thing he needs to know is whether there is a chance that the callee may be invoked more than once for a single invocation of the caller.
15:39:21
drmeister
And at this point that means if there are lexical locations shared between the caller and the callee?
15:42:01
beach
No, the latter means that if the callee is recursive, there will be more than one simultaneous invocation of the callee for a single invocation of the caller.
15:44:50
beach
Sorry, correction, the basic information is whether there can be more than one simultaneously active copy of the callee environment for a single active copy of the caller environment.
16:13:10
beach
And then, even though it is "incorporated" or "localized" into the caller, it won't be possible to inline.
16:13:47
beach
Currently, that situation can only happen with LABELS, or some Y combinator trickery.