freenode/#sicl - IRC Chatlog

13:47:17 Bike I think Clasp compilation is probably slow because we're giving LLVM code that it can't deal with quickly. drmeister did a comparison of ECL and Clasp because both can go through LLVM. What jumped out at me from the disassemblies was that while Clasp doesn't have that much more actual CODE, like operations, it has a lot more allocas - variables that are in memory, and not SSA form.

13:47:55 drmeister Hello everyone

13:48:15 Bike LLVM can and does reduce these to SSA, but that takes time. If i remember correctly, LLVM's value numbering and related phases took longer in Clasp than the entire compilation took in ECL. And, even if we did that work in Cleavir instead, it probably wouldn't go much faster. So I thought I'd try eliminating temporaries where they're created, in AST-to-HIR

13:48:19 drmeister I'm working on my talk - it's in about 8 hours.

13:48:43 Bike so over the last few days i've been trying an AST-to-HIR rewrite that generates fewer temporaries. I think I came up with something reasonably clean, eeeeexcept I underestimated how many temporaries are actually required by Lisp semantics

13:49:19 Bike I think it generates a few less temporaries, but nothing worth writing home about

13:50:26 Bike But it does have the interesting property that temporaries are always generated in static-few-assignment form, and could be made SSA without much more work

13:50:58 Bike So I'm wondering if it might be worthwhile for Cleavir to distinguish source variables, which can be closed over, SETQd, etc, from temporaries, which are used much more simply

13:51:22 Bike Kind of also ties into the thing I mentioned with segregate-lexicals, where we look through every temporary to see if it's closed over even though it never is

13:52:11 Bike We'd still need value numbering etc for the "actual" variables, but it would require a lot less work

13:52:32 Bike I think in most code, actual variables are less than a tenth of all of them. Probably even less most of the time

13:53:22 Bike (even in existing AST-to-HIR, temporaries are very nearly in SFA)

13:53:23 beach Interesting.

13:56:35 shka_ ACTION wonders if Bike knows that he is not on #clasp

13:57:40 Bike of course i do

13:58:29 Bike I mean, even outside of clasp, the whole thing means regalloc or whatever phase doing more work

13:58:46 beach I didn't realize that ECL could use LLVM.

13:59:38 beach Where do those alloca's come from?

14:00:09 beach I mean, isn't LLVM supposed to provide a "variable" abstraction that it then decided whether to allocate in memory or in a register?

14:00:34 beach Just like the HIR lexical locations.

14:00:39 Bike Yeah, that's what alloca is basically.

14:00:52 Bike It's for C of course, and alloca matches C automatic variables, you know, stack allocated.

14:01:17 beach Oh, OK, so how come ECL don't have those?

14:02:07 beach ECL must use lexical variables as well.

14:02:36 Bike Yeah. It has allocas for lexical variables, but I think the temporaries are in the LLVM "registers" or "values" or whatever, the SSA variables.

14:03:02 beach Now I'm lost (as usual).

14:03:16 Bike OK, so the way LLVM representation works is that the IR variables are always in SSA form

14:03:21 beach So LLVM has two abstractions and it is up to client code to decide whether they go on the stack or in register?

14:03:38 Bike No, LLVM can turn one into the other

14:03:38 beach OK.

14:04:17 Bike If you want variables that aren't in SSA, you can have them put in memory locations instead, and alloca allocates that memory

14:04:31 Bike but optimization phases can turn those into SSA variables, or spill SSA "registers" into memory

14:04:38 beach I see.

14:04:38 Bike phases within llvm, i mean.

14:04:47 beach Except that takes time?

14:05:20 Bike Yeah. It does value numbering and stuff to see what it can move into SSA

14:05:39 beach I think I understand.

14:06:02 beach And how does ECL get around this?

14:06:05 drmeister But for us llvm does value numbering too late. We need value numbering before type inference.

14:06:32 Bike different concern drmeister

14:06:55 beach I guess ECL just "knows" that many variables are SSA already, perhaps?

14:07:01 Bike I think so, yes.

14:07:15 Bike what ECL is actually doing is generating C code

14:07:21 beach Sure.

14:07:22 drmeister Bike: Granted - I just wanted to put that on the table.

14:07:33 beach We could do that too, but then it would take time in Cleavir instead.

14:07:42 beach I think that's what you were saying, right?

14:08:07 Bike What I meant was if we just move value numbering up, we would just be doing that work in Cleavir instead, so it wouldn't reduce the amount of work, yes.

14:08:29 beach Got it.

14:08:41 Bike But temporaries are pretty obviously SSA or close to SSA to begin with.

14:08:50 beach Not that I see how LLVM uses value number for that particular task, though.

14:08:57 beach Sure.

14:09:23 beach I guess it uses it for redundancy elimination.

14:09:23 Bike I don't know what LLVM is doing in particular. The internal optimization phases aren't always documented very well.

14:10:07 Bike http://blog.llvm.org/2009/12/introduction-to-load-elimination-in-gvn.html there's a bit of information here, where it does look like redundancy yes.

14:12:16 beach I see, yes.

14:13:12 beach Wait, though. You compared code size by disassemble.

14:13:23 beach But that's AFTER LLVM has done a lot of work.

14:13:40 Bike By disassembly I meant the LLVM IR before any optimization

14:13:47 beach Ah, OK.

14:13:49 Bike (on LLVM's part)

14:13:54 beach Got it.

14:15:36 beach So do you have any performance comparison between your rewrite and what we are currently doing?

14:17:51 Bike No. It still has a few bugs, and once I realized I couldn't eliminate most temporaries I started working on something else instead.

14:18:04 beach I see.

14:18:10 Bike The rewrite doesn't do anything I"m talking about here with another kind of lexical location or whatever, it's strictly an AST-to-HIR logic change.

14:18:23 beach OK.

14:19:35 drmeister So is part of the idea then to distinguish temporary variables from source variables? I did that about two weeks ago. I subclassed the lexical-location class with a temporary-lexical-location class and had the make-temp function create instances of that.

14:19:48 drmeister Then I changed the graphviz functions to render them a different color.

14:19:58 drmeister We have a lot of temporary-lexical-locations

14:21:07 drmeister It sounds like an exciting idea - to treat them differently from source variables in value numbering. Am I following your suggestion Bike?

14:21:43 Bike yeah, basically.

14:22:05 Bike Cleavir used to have a separate class for "simple" lexical locations that weren't closed over, but it was only used within segregate-lexicals so i eliminated it.

14:27:19 beach I understand that you need to do whatever it takes to improve Clasp compilation speed.

14:27:32 beach But treating temporaries differently is not such a great idea from the point of view of compiler design.

14:27:54 beach One of the main principles of compiler design is that you reduce things to as few cases as possible.

14:28:12 beach Then, you have one general mechanism to handle those cases.

14:28:35 beach So in this case, you will distinguish temporaries AS THEY ARE INTRODUCED from other variables.

14:29:05 beach But then, you will miss the cases where a lexical variable happens to behave like a temporary from the point of view of control and data flow.

14:29:33 beach So either you don't handle such variables efficiently, in which case you miss out on some possible optimizations.

14:29:45 beach Or else you do, and now you have two mechanisms for the same thing.

14:30:02 beach Again, I fully understand that you need to do whatever it takes, so you should.

14:30:12 beach I am just pointing it out.

14:31:15 beach I mean, Cleavir is meant to be adapted by client code, and you definitely should in order to get the behavior you want.

14:31:45 Bike Well what I was thinking is that temporaries would be marked as SFA or SSA, and then a later SFA or SSA phase could mark "actual" variables as such later as well.

14:32:21 beach Sure, that would be a minor addition. No problem with that.

14:33:32 beach Would that require your AST-to-HIR rewrite?

14:33:50 beach ... or did you basically abandon the idea of a rewrite?

14:35:03 Bike I stopped working on it because I realized I wouldn't be able to fulfill my original goal, but I thought of all this stuff this morning

14:35:11 Bike for which it would be useful.

14:35:23 beach OK.

14:35:42 Bike I don't know if this marking would require the rewrite per se, but the way it is it's just more explicit about certain things

14:36:02 beach As drmeister pointed out, marking temporaries as such when they are created is very easy.

14:36:25 Bike yeah.

14:36:47 beach You could try just adding those lines of code to what we have.

14:37:04 Bike yeah, i might.

14:37:17 beach ... rather than debugging your rewrite, I mean.

14:37:21 Bike right.

14:37:50 Bike there are basically two places you need something like a phi node- for if, and for block. as far as i can tell. this was really obvious in my rewrite, but i'll have to work something out in the existing code.

14:37:56 Bike i think it would be doable.

14:38:16 beach OK.

14:40:06 beach There is a paper that generates SSA directly from AST or some other similar representation, but I decided Cleavir couldn't use it because of the fact that it generates code from the end. And this is to get the evaluation context right.

14:40:57 beach Also, it probably assumes that the language is C-like, so no nested functions, etc.

14:41:23 Bike Yeah I mean we definitely can't generate everything in SSA right off, because we can't force everything to be SSA at all, because of closures. So that's off the table.

14:41:31 beach Exactly.

15:01:35 beach Suppose you wanted to create a Linux executable with a huge data structure already built inside it, like a Common Lisp implementation for instance. And suppose you want it to be able to handle ASLR. As I understand it, SBCL did that at some point. Wouldn't you basically have to mark every Common Lisp datum as relocatable, assign a symbolic name to it, and have the dynamic linker fix it up at startup time?

15:01:52 beach And if so, how much additional startup time would that take?

15:07:11 scymtym i don't think the platform's linker can help. in sbcl, relocating the heap is basically a full gc, if i remember correctly

15:07:23 beach Oh?

15:07:41 beach Why can't the linker be used?

15:08:43 scymtym i thought it worked by having relocation meta-data in some ELF section or similar

15:08:58 beach Sounds right.

15:09:54 scymtym so i'm generating that would be more complicated than relocating yourself. at least if you already have a moving gc

15:10:02 scymtym *i'm thinking

15:10:16 beach Oh, sure. That's possible.

15:10:25 beach But it could be done that way, right?

15:11:15 beach It's a dumb idea, but then, ASLR is a dumb idea. Or rather, a kludge to a hack.

15:13:11 scymtym i don't know about the practicality, but agree with the ASLR skepticism

15:15:31 beach Here is another question, also more philosophical than practical: It used to be the case that a global variable in C was just an address. Now, because of dynamic linking, they use a mechanism that introduces many more memory accesses (GOT tables and whatnot). We don't have that problem in Common Lisp, so could a Linux executable use a mechanism closer to that used for Common Lisp, rather than this GOT business?

15:18:07 beach Not terribly interesting, I know.

15:18:37 beach But I started thinking about it, because flip214 gave me feedback on the CLOSOS specification.

15:19:29 scymtym part of the problem seems to be that C doesn't have a "runtime" that could make adjustments when new code is loaded. and the linker doesn't know enough about the program to do it

15:19:32 scymtym but i'm on thin ice here

15:19:42 beach Sure, me too.

15:20:05 beach That's a clue though. But doesn't C++ require a way more complicated linker these days?

15:20:39 beach And in the CLOSOS specification, I discuss problems with existing operating systems in there, so ASLR came up. And that got me thinking about things that have become more complex because of shared libraries, etc.

15:22:30 beach I guess the C++ linker is still not active at run time, and that could be a requirement for a more sane mechanism.

15:25:28 scymtym in #sbcl, pfdietz just mentioned a paper in which C++ code is significantly sped up by optimizing at runtime :)

15:25:41 pfdietz https://arxiv.org/pdf/1807.06735.pdf

15:25:52 Bike c++ causes exciting things like two modules having code for the same macroexpanded (template) function definition

15:26:12 Bike i asked about it on twitter once and the compiler people i know started talking about drinking

15:27:34 beach Wow.

15:29:03 pfdietz Post-link time, at least.

15:29:20 beach pfdietz: Interesting.

15:30:14 pfdietz The big win is moving cold code away from hot code, and grouping the hot code together, to reduce pressure on the instruction cache.

15:31:09 beach I am a bit surprised that it would have any significant effect.

15:33:03 beach But maybe I don't know how a modern instruction cache works.

15:33:44 pfdietz Latency out to memory >> CPU cycle time. So cache misses are very costly.

15:33:56 beach That part I know.

15:34:17 beach I would think the code would be so close that it would fit in a cache line to have any effect.

15:34:59 beach I could see how it would have an effect on paging, but not on cache performance.

15:35:20 beach Unless, of course, the instruction cache works in 4k chunks these days.

15:37:53 beach It used to be the case that a cache line was a few words long, and that the cache line was chosen for some memory chunk based on the address modulo the size of the cache.