freenode/#clasp - IRC Chatlog
12:27:40
Bike
well, first off inlining is certainly a bottleneck. there's no question. cst-to-ast results in major, major slowdown.
12:28:28
Bike
but i believe the slowdown is pretty much entirely due to LET. LET functions have unique characteristics, like having deep nesting (causing the multiple copies)
12:29:28
Bike
if we're inlining global functions, we can skip a lot of the analysis partial inlining does now- that's a major inefficiency- since we can just determine things ahead of time
12:30:24
Bike
we only need the full analysis for the comparatively rarer cases of local and anonymous functions, and i think the code we have now is acceptably performant for that anyway
12:33:06
Bike
i also don't think characterizing "pre-inlining" LET as special is fair. we'd be marking where variables are created, which we probably want to do for various reasons anyway, and then letting segregate-lexicals use that information.
12:33:23
Bike
it's letting segregate-lexicals consider multiple kinds of instructions as creating bindings, instead of just ENTER.
13:10:47
Bike
I guess it's like, we COULD compile (foo ...) for function foo as a call to FUNCALL, and then rely on general mechanisms to reduce it, but why bother? is that really a "special case" or just basic semantics?
13:27:33
beach
So here is what I think. Inlining has a performance problem. It might be that we are using a quadratic algorithm where a linear one is available.
13:28:21
beach
It is entirely possible that, when this problem is fixed, things will be fast enough.
13:28:55
beach
If that is the case, working on what I still consider a special case for LET will be wasted work, and will leave us with more code to maintain.
13:29:31
beach
In the meantime, inlining for other situations than LET still has a performance problem.
13:30:30
Bike
Okay, I don't mind looking at that first. Though I don't think I understand what quadratic performance you mean. From what I could tell, nested lets resulted in a linear amount of copying.
13:31:40
beach
Well, quadratic is not quite true; M*N rather, where M is the nesting depth and N is the average number of instructions at each level.
13:34:38
Bike
I'm not sure I understand. One thing is that, as far as I can tell, if we inline a function that ENCLOSEs other functions, those other functions have to be copied as well, so that they close over the correct variables.
13:36:20
Bike
For example if we have (let ((x ...)) (let ((y ...)) (+ x y))), inlining outer first results in like (progn (setq xprime ...) (let ((y ...)) (+ xprime y)))
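[Editor's note: a sketch of the source-level transformation Bike describes above; `(f)` and `(g)` are hypothetical initializer forms, and `xprime` is the fresh temporary from the chat:]

```lisp
;; Before inlining the outer LET:
(let ((x (f)))
  (let ((y (g)))
    (+ x y)))

;; After inlining the outer LET first:
(progn
  (setq xprime (f))     ; outer binding becomes an assignment
  (let ((y (g)))        ; the inner LET still has to be copied,
    (+ xprime y)))      ;   with x renamed to xprime throughout
```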
13:38:15
Bike
So going outermost-in doesn't reduce the amount of copying, since we still have to copy the inner parts.
13:39:59
Bike
oh, and the other thing i thought of was that right now segregate-lexicals examines all variables for being shared, but in practice only the minority of variables actually corresponding to lisp bindings need to be checked. i don't know if fixing that would help performance much, but it's a thought
14:07:49
beach
I can't see anything wrong with obtaining this one by inlining only the middle instructions: http://metamodular.com/example2.pdf
14:10:15
beach
I should probably also handle the case where the middle one creates a variable that is used by the rightmost one.
14:13:25
Bike
the problem is the local variables. like, consider if the output of enclose2 was used by the enter3 function. that output is copied into enter1, but enter3 doesn't use the copy.
14:23:11
beach
However, we must improve inlining anyway, so I think we must use a quadratic algorithm only when it is necessary.
14:24:27
beach
In the meantime, I would really like these statistics about how many times each instruction gets inlined for real big examples.
14:24:36
Bike
We could probably skip copying some inner functions if we tie the inlining in more with the determination of what variables are closed over (which we do anyway)
14:25:08
Bike
But for LET kind of functions where an inner body is going to use a lot of bindings, it might not help
14:28:59
beach
Bike: In this particular example, the function defined by enter2 should disappear and the one defined by enter3 should use the variable introduced by the function defined by enter1 instead.
14:33:31
beach
New versions at the same links: http://metamodular.com/example.pdf http://metamodular.com/example2.pdf
14:34:35
beach
So we should detect that case (which will then work for situations other than the ones that happen by LET) and handle it.
14:35:05
beach
Doing it that way will likely solve the LET problem AND make inlining faster in general.
14:36:35
Bike
i mean, and that would probably be easier to write than modifying any enclosed functions recursively
14:39:52
Bike
anyway, would it be a problem? we'd just have any return in the function being inlined turn into a local control transfer to the former return site
14:40:56
beach
Either way, I think the big gain is going to be from avoiding the M^2 behavior in most cases.
14:43:40
beach
Anyway, do you see my point that, if we treat this general case, the LET situation will likely take care of itself, and we need to treat this general case anyway at some point. Whereas if we handle the LET case specially, we will have more code, because we still need to handle the general case (if it is produced by something other than LET).
14:48:09
beach
Oh, and there is another interesting situation. If enter2 does not create a variable that is used by any of its descendants, then it doesn't matter whether it is called several times. It can still be inlined.
14:49:08
beach
So that's another reason to avoid two ways of doing it, one way by copying instructions and another way by not copying them. Two ways would give more code and more maintenance.
15:00:28
Bike
Something like that. Not all instructions are copied in an obvious way so I'll have to work something out.
15:01:55
Bike
oh, and i did another really basic count, and found that compiling a 100-line function with various LABELS, LET*, DOLIST and such involved over fifty five thousand calls to clone-instruction
15:13:58
drmeister
bike: When you first got the inlining to work - it was really, really slow. We ran it with dtrace profiling and it showed a problem that you were almost immediately able to fix. Do you recall what that was?
15:25:00
drmeister
Counting the number of times each progenitor HIR instruction gets cloned (with the two hash tables) will show the pattern of how inlining is applied (inside-out vs outside-in) but we can still learn a lot from the total number of clones made/number of progenitor instructions - right?
15:26:09
drmeister
Because the best approach would have "total # clones"/"# progenitor" somewhere near 1.0 - right?
15:29:03
drmeister
I woke up this morning thinking it would be neat to generate a HIR graph of the progenitor instructions where, next to each progenitor instruction, the number of times it was cloned in the final result was displayed.
15:30:20
drmeister
beach: I was further thinking that your code for rendering HIR graphs might benefit from accepting a hash-table of instructions mapped to additional info to add to each instruction label. Then we could annotate the HIR graphs with all sorts of useful information.
15:32:14
beach
Uh oh. House guests and (admittedly small) family are back from tourism. I need to go. I might be back soon.
15:34:13
drmeister
I'm really interested in getting it working - or getting something like it working in the jupyterlab interface.
15:36:23
Bike
"No applicable method for CONCRETE-SYNTAX-TREE:CONSP with arguments of types STANDARD-GENERIC-FUNCTION." that's a new one
16:03:50
shiho
drmeister: I got the error "Condition of type: FILE-ERROR Filesystem error with pathname "quicklisp:setup.lisp"." Do I need to pull new quicklisp?
16:20:05
drmeister
Ah - I made some changes to where cando looks for the quicklisp directory - your old machine doesn't have /opt/clasp -- either you should move completely to the laptop or we have to set up the iMac to work with the new development environment
16:23:12
Bike
should be pretty easy to set up the logical pathname host to point at ~/quicklisp, right?
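[Editor's note: a minimal sketch of what Bike suggests, using the standard `logical-pathname-translations` mechanism; the exact translation pattern and whether Clasp/cando relaxes the standard's uppercase restriction on logical pathname components are assumptions:]

```lisp
;; Point the "quicklisp" logical host at ~/quicklisp, so that
;; "quicklisp:setup.lisp" resolves to ~/quicklisp/setup.lisp.
(setf (logical-pathname-translations "quicklisp")
      `(("**;*.*.*"
         ,(merge-pathnames "quicklisp/**/*.*"
                           (user-homedir-pathname)))))

;; Afterwards this should succeed:
(load "quicklisp:setup.lisp")
```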
16:25:03
drmeister
shiho: Can you evaluate (probe-file "quicklisp:") on the iMac and tell me what it says?
16:32:00
drmeister
Ok - next I'll tell you that the development on the new machine is broken. Yay! Welcome to my shitty Python world.
17:09:32
drmeister
https://s3.us-east-2.amazonaws.com/clasp-cando/deploy/Darwin-base-opt-clasp.tar.gz
19:01:48
Shinmera
Clasp is going to be a bit more complicated to implement since MPS needs to be supported
19:15:41
drmeister
stassats: We are a clasp shop here - if you can't do it in clasp - it's not worth doing :-)
19:16:32
drmeister
I'm probably going to have to do the static-vector myself so that it works properly in boehm and mps.
19:18:14
drmeister
Yeah - we can't use the ECL code - because it relies on the Boehm GC not moving things around in memory.
19:19:14
drmeister
Well - not so differently. Clasp's arrays are implemented the way that sbcl implements them.
19:23:53
Shinmera
Pretty much. https://github.com/sionescu/static-vectors/blob/master/src/pkgdcl.lisp#L12-L26
19:26:53
Shinmera
The answer is as I said -- anything for which upgraded-array-element-type does not return T.
19:30:03
drmeister
It makes a huge difference in how the implementation manages it - that's why I'm asking.
19:30:06
Bike
So you have to make a static vector with type single-float and hope that's the cffi :float type? i mean, it probably is
19:30:15
Shinmera
The point is just that if the element-type is T, then you can't really portably know what's in there, so there's no point trying to share it with C
19:31:52
frgo
Looking at impl-allegro.lisp we should be able to more or less translate literally to corresponding calls as in fli.lisp
19:32:07
drmeister
Sometimes these lisp definitions are hard for me to parse into terms I understand as an implementor. It's about getting from "anything for which upgraded-array-element-type does not return T" to "you can use malloc to allocate the memory".
19:32:59
Shinmera
drmeister: I don't see how malloc comes into it? You create a lisp array like normal, provide a way to get a pointer to its data part, and then pin it in the GC.
19:34:16
drmeister
"then pin it in the GC" is easy with Boehm - a lot of work with MPS. Since I can't put Common Lisp pointers into the vector anyway, I'd rather use an allocator that doesn't move memory.
19:35:10
Shinmera
Should be easy with MPS too since you can just forbid arrays with pointers (since they'd be element-type T anyway)
19:36:51
kpoeck
Wouldn't it be simpler to say it's an array with a chunk of bytes that is not moved by the lisp gc
19:37:14
Shinmera
Has nothing to do with it being a simple-vector, but the library does not promise GC of static-arrays.
19:37:19
Bike
kpoeck: well the elements are still supposed to be accessible from lisp, if i understand correctly
19:39:28
Shinmera
static-vectors allows me to, for instance, fill an array with vertex data from lisp, then just pass that on to OpenGL to upload to the GPU without having to first copy it to C memory.
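[Editor's note: a sketch of the usage pattern Shinmera describes, using the static-vectors library's documented API (`make-static-vector`, `static-vector-pointer`, `free-static-vector`); the buffer size, element type, and the commented-out GL call are illustrative assumptions:]

```lisp
(ql:quickload :static-vectors)

(let ((buf (static-vectors:make-static-vector
            512 :element-type 'single-float)))
  (unwind-protect
       (progn
         ;; Fill from Lisp like any ordinary array.
         (dotimes (i 512)
           (setf (aref buf i) (float i 1.0f0)))
         ;; Hand the raw pointer to foreign code without copying,
         ;; e.g. (hypothetically):
         ;; (%gl:buffer-data :array-buffer size
         ;;                  (static-vectors:static-vector-pointer buf)
         ;;                  :static-draw)
         (static-vectors:static-vector-pointer buf))
    ;; Static vectors are not garbage-collected; free explicitly.
    (static-vectors:free-static-vector buf)))
```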
19:45:18
Shinmera
while we're on the talk of compatibility libraries, aside from dissect I have another for additional float features. Does clasp offer constants for the various float infinities, testing for float-nan, and masking the float traps/signals?
19:47:32
kpoeck
Drmeister: the franz doc for static arrays in gc.htm seems to be written more from an implementer's perspective, perhaps worth a read
19:56:15
Shinmera
Here's an issue to track it more easily. https://github.com/clasp-developers/clasp/issues/583