freenode/#clasp - IRC Chatlog
12:27:40
Bike
well, first off inlining is certainly a bottleneck. there's no question. cst-to-ast results in major, major slowdown.
12:28:28
Bike
but i believe the slowdown is pretty much entirely due to LET. LET functions have unique characteristics, like having deep nesting (causing the multiple copies)
12:29:28
Bike
if we're inlining global functions, we can skip a lot of the analysis partial inlining does now- that's a major inefficiency- since we can just determine things ahead of time
12:30:24
Bike
we only need the full analysis for the comparatively rarer cases of local and anonymous functions, and i think the code we have now is acceptably performant for that anyway
12:33:06
Bike
i also don't think characterizing "pre-inlining" LET as special is fair. we'd be marking where variables are created, which we probably want to do for various reasons anyway, and then letting segregate-lexicals use that information.
12:33:23
Bike
it's letting segregate-lexicals consider multiple kinds of instructions as creating bindings, instead of just ENTER.
13:10:47
Bike
I guess it's like, we COULD compile (foo ...) for function foo as a call to FUNCALL, and then rely on general mechanisms to reduce it, but why bother? is that really a "special case" or just basic semantics?
13:27:33
beach
So here is what I think. Inlining has a performance problem. It might be that we are using a quadratic algorithm where a linear one is available.
13:28:21
beach
It is entirely possible that, when this problem is fixed, things will be fast enough.
13:28:55
beach
If that is the case, working on what I still consider a special case for LET will be wasted work, and will leave us with more code to maintain.
13:29:31
beach
In the meantime, inlining for other situations than LET still has a performance problem.
13:30:30
Bike
Okay, I don't mind looking at that first. Though I don't think I understand what quadratic performance you mean. From what I could tell, nested lets resulted in a linear amount of copying.
13:31:40
beach
Well, quadratic is not quite true; M*N rather, where M is the nesting depth and N is the average number of instructions at each level.
13:34:38
Bike
I'm not sure I understand. One thing is that, as far as I can tell, if we inline a function that ENCLOSEs other functions, those other functions have to be copied as well, so that they close over the correct variables.
13:36:20
Bike
For example if we have (let ((x ...)) (let ((y ...)) (+ x y))), inlining outer first results in like (progn (setq xprime ...) (let ((y ...)) (+ xprime y)))
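[Editor's note: a sketch of the source-level transformation Bike describes above; `(f)` and `(g)` are hypothetical initializer forms, and `xprime` is the fresh temporary from the chat:]

```lisp
;; Before inlining the outer LET:
(let ((x (f)))
  (let ((y (g)))
    (+ x y)))

;; After inlining the outer LET first:
(progn
  (setq xprime (f))     ; outer binding becomes an assignment
  (let ((y (g)))        ; the inner LET still has to be copied,
    (+ xprime y)))      ;   with x renamed to xprime throughout
```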
13:38:15
Bike
So going outermost-in doesn't reduce the amount of copying, since we still have to copy the inner parts.
13:39:59
Bike
oh, and the other thing i thought of was that right now segregate-lexicals examines all variables for being shared, but in practice only the minority of variables actually corresponding to lisp bindings need to be checked. i don't know if fixing that would help performance much, but it's a thought
14:07:49
beach
I can't see anything wrong with obtaining this one by inlining only the middle instructions: http://metamodular.com/example2.pdf
14:10:15
beach
I should probably also handle the case where the middle one creates a variable that is used by the rightmost one.
14:13:25
Bike
the problem is the local variables. like, consider if the output of enclose2 was used by the enter3 function. that output is copied into enter1, but enter3 doesn't use the copy.
14:23:11
beach
However, we must improve inlining anyway, so I think we must use a quadratic algorithm only when it is necessary.
14:24:27
beach
In the meantime, I would really like these statistics about how many times each instruction gets inlined for real big examples.
14:24:36
Bike
We could probably skip copying some inner functions if we tie the inlining in more with the determination of what variables are closed over (which we do anyway)
14:25:08
Bike
But for LET kind of functions where an inner body is going to use a lot of bindings, it might not help
14:28:59
beach
Bike: In this particular example, the function defined by enter2 should disappear and the one defined by enter3 should use the variable introduced by the function defined by enter1 instead.
14:33:31
beach
New versions at the same links: http://metamodular.com/example.pdf http://metamodular.com/example2.pdf
14:34:35
beach
So we should detect that case (which will then work for situations other than the ones that happen by LET) and handle it.
14:35:05
beach
Doing it that way will likely solve the LET problem AND make inlining faster in general.
14:36:35
Bike
i mean, and that would probably be easier to write than modifying any enclosed functions recursively
14:39:52
Bike
anyway, would it be a problem? we'd just have any return in the function being inlined turn into a local control transfer to the former return site
14:40:56
beach
Either way, I think the big gain is going to be from avoiding the M^2 behavior in most cases.
14:43:40
beach
Anyway, do you see my point that, if we treat this general case, the LET situation will likely take care of itself, and we need to treat this general case anyway at some point. Whereas if we handle the LET case specially, we will have more code, because we still need to handle the general case (if it is produced by something other than LET).
14:48:09
beach
Oh, and there is another interesting situation. If enter2 does not create a variable that is used by any of its descendants, then it doesn't matter whether it is called several times. It can still be inlined.
14:49:08
beach
So that's another reason to avoid two ways of doing it, one way by copying instructions and another way by not copying them. Two ways would give more code and more maintenance.
15:00:28
Bike
Something like that. Not all instructions are copied in an obvious way so I'll have to work something out.
15:01:55
Bike
oh, and i did another really basic count, and found that compiling a 100-line function with various LABELS, LET*, DOLIST and such involved over fifty five thousand calls to clone-instruction
15:13:58
drmeister
bike: When you first got the inlining to work - it was really, really slow. We ran it with dtrace profiling and it showed a problem that you were almost immediately able to fix. Do you recall what that was?
15:25:00
drmeister
Counting the number of times each progenitor HIR instruction gets cloned (with the two hash tables) will show the pattern of how inlining is applied (inside-out vs outside-in) but we can still learn a lot from the total number of clones made/number of progenitor instructions - right?
15:26:09
drmeister
Because the best approach would have "total # clones"/"# progenitor" somewhere near 1.0 - right?
15:29:03
drmeister
I woke up this morning thinking it would be neat to generate a HIR graph of the progenitor instructions where, next to each progenitor instruction, the number of times it was cloned in the final result was displayed.
15:30:20
drmeister
beach: I was further thinking that your code for rendering HIR graphs might benefit from accepting a hash-table of instructions mapped to additional info to add to each instruction label. Then we could annotate the HIR graphs with all sorts of useful information.
15:32:14
beach
Uh oh. House guests and (admittedly small) family are back from tourism. I need to go. I might be back soon.
15:34:13
drmeister
I'm really interested in getting it working - or getting something like it working in the jupyterlab interface.
15:36:23
Bike
"No applicable method for CONCRETE-SYNTAX-TREE:CONSP with arguments of types STANDARD-GENERIC-FUNCTION." that's a new one
16:03:50
shiho
drmeister: I got the error "Condition of type: FILE-ERROR Filesystem error with pathname "quicklisp:setup.lisp"." Do I need to pull new quicklisp?
16:20:05
drmeister
Ah - I made some changes to where cando looks for the quicklisp directory - your old machine doesn't have /opt/clasp -- either you should move completely to the laptop or we have to set up the iMac to work with the new development environment
16:23:12
Bike
should be pretty easy to set up the logical pathname host to point at ~/quicklisp, right?
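[Editor's note: a minimal sketch of what Bike suggests, using the standard `logical-pathname-translations` mechanism; the exact translation pattern and whether Clasp/cando relaxes the standard's uppercase restriction on logical pathname components are assumptions:]

```lisp
;; Point the "quicklisp" logical host at ~/quicklisp, so that
;; "quicklisp:setup.lisp" resolves to ~/quicklisp/setup.lisp.
(setf (logical-pathname-translations "quicklisp")
      `(("**;*.*.*"
         ,(merge-pathnames "quicklisp/**/*.*"
                           (user-homedir-pathname)))))

;; Afterwards this should succeed:
(load "quicklisp:setup.lisp")
```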
16:25:03
drmeister
shiho: Can you evaluate (probe-file "quicklisp:") on the iMac and tell me what it says?
16:32:00
drmeister
Ok - next I'll tell you that the development on the new machine is broken. Yay! Welcome to my shitty Python world.
17:09:32
drmeister
https://s3.us-east-2.amazonaws.com/clasp-cando/deploy/Darwin-base-opt-clasp.tar.gz
19:01:48
Shinmera
Clasp is going to be a bit more complicated to implement since MPS needs to be supported
19:15:41
drmeister
stassats: We are a clasp shop here - if you can't do it in clasp - it's not worth doing :-)
19:16:32
drmeister
I'm probably going to have to do the static-vector myself so that it works properly in boehm and mps.
19:18:14
drmeister
Yeah - we can't use the ECL code - because it relies on the Boehm GC not moving things around in memory.
19:19:14
drmeister
Well - not so differently. Clasp's arrays are implemented the way that sbcl implements them.
19:23:53
Shinmera
Pretty much. https://github.com/sionescu/static-vectors/blob/master/src/pkgdcl.lisp#L12-L26
19:26:53
Shinmera
The answer is as I said -- anything for which upgraded-array-element-type does not return T.
19:30:03
drmeister
It makes a huge difference in how the implementation manages it - that's why I'm asking.
19:30:06
Bike
So you have to make a static vector with type single-float and hope that's the cffi :float type? i mean, it probably is
19:30:15
Shinmera
The point is just that if the element-type is T, then you can't really portably know what's in there, so there's no point trying to share it with C
19:31:52
frgo
Looking at impl-allegro.lisp we should be able to more or less translate literally to corresponding calls as in fli.lisp
19:32:07
drmeister
Sometimes these lisp definitions are hard for me to parse into terms I understand as an implementor. It's about getting from "anything for which upgraded-array-element-type does not return T" to "you can use malloc to allocate the memory".
19:32:59
Shinmera
drmeister: I don't see how malloc comes into it? You create a lisp array like normal, provide a way to get a pointer to its data part, and then pin it in the GC.
19:34:16
drmeister
"then pin it in the GC" is easy with Boehm - a lot of work with MPS. Since I can't put Common Lisp pointers into the vector anyway, I'd rather use an allocator that doesn't move memory.
19:35:10
Shinmera
Should be easy with MPS too since you can just forbid arrays with pointers (since they'd be element-type T anyway)
19:36:51
kpoeck
Wouldn't it be simpler to say it's an array with a chunk of bytes that is not moved by the lisp gc
19:37:14
Shinmera
Has nothing to do with it being a simple-vector, but the library does not promise GC of static-arrays.
19:37:19
Bike
kpoeck: well the elements are still supposed to be accessible from lisp, if i understand correctly
19:39:28
Shinmera
static-vectors allows me to, for instance, fill an array with vertex data from lisp, then just pass that on to OpenGL to upload to the GPU without having to first copy it to C memory.
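[Editor's note: a sketch of the usage pattern Shinmera describes, using the static-vectors library's documented API (`make-static-vector`, `static-vector-pointer`, `free-static-vector`); the buffer size, element type, and the commented-out GL call are illustrative assumptions:]

```lisp
(ql:quickload :static-vectors)

(let ((buf (static-vectors:make-static-vector
            512 :element-type 'single-float)))
  (unwind-protect
       (progn
         ;; Fill from Lisp like any ordinary array.
         (dotimes (i 512)
           (setf (aref buf i) (float i 1.0f0)))
         ;; Hand the raw pointer to foreign code without copying,
         ;; e.g. (hypothetically):
         ;; (%gl:buffer-data :array-buffer size
         ;;                  (static-vectors:static-vector-pointer buf)
         ;;                  :static-draw)
         (static-vectors:static-vector-pointer buf))
    ;; Static vectors are not garbage-collected; free explicitly.
    (static-vectors:free-static-vector buf)))
```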
19:45:18
Shinmera
while we're on the talk of compatibility libraries, aside from dissect I have another for additional float features. Does clasp offer constants for the various float infinities, testing for float-nan, and masking the float traps/signals?
19:47:32
kpoeck
Drmeister: the franz doc for static arrays in gc.htm seems to be written more from an implementer's perspective, perhaps worth a read
19:56:15
Shinmera
Here's an issue to track it more easily. https://github.com/clasp-developers/clasp/issues/583