freenode/#sicl - IRC Chatlog
13:47:17
Bike
I think Clasp compilation is probably slow because we're giving LLVM code that it can't deal with quickly. drmeister did a comparison of ECL and Clasp because both can go through LLVM. What jumped out at me from the disassemblies was that while Clasp doesn't have that much more actual CODE, like operations, it has a lot more allocas - variables that are in memory, and not in SSA form.
13:48:15
Bike
LLVM can and does reduce these to SSA, but that takes time. If I remember correctly, LLVM's value numbering and related phases took longer in Clasp than the entire compilation took in ECL. And even if we did that work in Cleavir instead, it probably wouldn't go much faster. So I thought I'd try eliminating temporaries where they're created, in AST-to-HIR.
13:48:43
Bike
So over the last few days I've been trying an AST-to-HIR rewrite that generates fewer temporaries. I think I came up with something reasonably clean, eeeeexcept I underestimated how many temporaries are actually required by Lisp semantics
13:50:26
Bike
But it does have the interesting property that temporaries are always generated in static-few-assignment form, and could be made SSA without much more work
13:50:58
Bike
So I'm wondering if it might be worthwhile for Cleavir to distinguish source variables, which can be closed over, SETQd, etc, from temporaries, which are used much more simply
13:51:22
Bike
Kind of also ties into the thing I mentioned with segregate-lexicals, where we look through every temporary to see if it's closed over even though it never is
13:52:11
Bike
We'd still need value numbering etc for the "actual" variables, but it would require a lot less work
13:52:32
Bike
I think in most code, actual variables are less than a tenth of all of them. Probably even less most of the time
13:58:29
Bike
I mean, even outside of Clasp, the whole thing means regalloc or whatever later phase has to do more work
14:00:09
beach
I mean, isn't LLVM supposed to provide a "variable" abstraction that it then decides whether to allocate in memory or in a register?
14:00:52
Bike
It's for C of course, and alloca matches C automatic variables, you know, stack allocated.
14:02:36
Bike
Yeah. It has allocas for lexical variables, but I think the temporaries are in the LLVM "registers" or "values" or whatever, the SSA variables.
14:03:16
Bike
OK, so the way LLVM representation works is that the IR variables are always in SSA form
14:03:21
beach
So LLVM has two abstractions and it is up to client code to decide whether they go on the stack or in a register?
14:04:17
Bike
If you want variables that aren't in SSA, you can have them put in memory locations instead, and alloca allocates that memory
14:04:31
Bike
but optimization phases can turn those into SSA variables, or spill SSA "registers" into memory
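The alloca-to-SSA promotion being described here is what LLVM's mem2reg pass does. For straight-line code the idea is simple: each store to a memory slot becomes the definition of a value, and each load is replaced by whatever was last stored. A minimal Python sketch (the tuple IR here is invented for illustration, not LLVM's actual representation):

```python
# Minimal sketch of alloca-to-SSA promotion for straight-line code,
# in the spirit of LLVM's mem2reg. The ("store", ...) / ("load", ...)
# instruction tuples are made up for this example.

def promote_to_ssa(instructions):
    """Replace loads/stores of memory slots with direct value uses.

    Each ("store", slot, value) records the slot's current value;
    each ("load", dest, slot) becomes a copy of that value.
    With no branches there are no phi nodes to worry about.
    """
    current = {}   # slot -> value currently stored there
    out = []
    for inst in instructions:
        op = inst[0]
        if op == "store":
            _, slot, value = inst
            current[slot] = value          # the store *is* the definition
        elif op == "load":
            _, dest, slot = inst
            out.append(("copy", dest, current[slot]))  # forward the value
        else:
            out.append(inst)               # other instructions unchanged
    return out

prog = [
    ("store", "slot0", "x"),
    ("load", "t1", "slot0"),
    ("add", "t2", "t1", "t1"),
    ("store", "slot0", "t2"),
    ("load", "t3", "slot0"),
]
print(promote_to_ssa(prog))
```

Once branches join, the "last value stored" differs per predecessor, which is exactly where phi nodes come in; the real pass uses dominance frontiers to place them.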
14:06:05
drmeister
But for us LLVM does value numbering too late. We need value numbering before type inference.
14:08:07
Bike
What I meant was if we just move value numbering up, we would just be doing that work in Cleavir instead, so it wouldn't reduce the amount of work, yes.
14:09:23
Bike
I don't know what LLVM is doing in particular. The internal optimization phases aren't always documented very well.
14:10:07
Bike
http://blog.llvm.org/2009/12/introduction-to-load-elimination-in-gvn.html there's a bit of information here, where it does look like redundancy elimination, yes.
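Value numbering itself can be sketched compactly: each computed (operator, operands) combination gets a canonical result, and recomputing an already-seen combination is replaced by reusing the earlier result. A minimal local (single-block) version in Python, with a made-up three-address IR:

```python
# Local value numbering over straight-line three-address code:
# identical (op, operands) computations collapse to one result.
# The instruction format is invented for illustration.

def value_number(instructions):
    table = {}     # (op, canonical operands...) -> defining variable
    vn = {}        # variable -> canonical variable holding its value
    out = []
    for op, dest, *args in instructions:
        key = (op,) + tuple(vn.get(a, a) for a in args)
        if key in table:
            vn[dest] = table[key]          # redundant: reuse earlier value
        else:
            table[key] = dest              # first time we see this value
            vn[dest] = dest
            out.append((op, dest, *[vn.get(a, a) for a in args]))
    return out

prog = [
    ("add", "t1", "x", "y"),
    ("add", "t2", "x", "y"),   # same computation as t1: eliminated
    ("mul", "t3", "t2", "z"),  # t2 rewritten to the canonical t1
]
print(value_number(prog))
```

Global value numbering, as in the LLVM pass discussed above, extends this across basic blocks and also eliminates redundant loads, which is where most of the cost goes.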
14:15:36
beach
So do you have any performance comparison between your rewrite and what we are currently doing?
14:17:51
Bike
No. It still has a few bugs, and once I realized I couldn't eliminate most temporaries I started working on something else instead.
14:18:10
Bike
The rewrite doesn't do anything I'm talking about here with another kind of lexical location or whatever; it's strictly an AST-to-HIR logic change.
14:19:35
drmeister
So is part of the idea then to distinguish temporary variables from source variables? I did that about two weeks ago. I subclassed the lexical-location class with a temporary-lexical-location class and had the make-temp function create instances of that.
14:21:07
drmeister
It sounds like an exciting idea - to treat them differently from source variables in value numbering. Am I following your suggestion Bike?
14:22:05
Bike
Cleavir used to have a separate class for "simple" lexical locations that weren't closed over, but it was only used within segregate-lexicals so I eliminated it.
14:27:19
beach
I understand that you need to do whatever it takes to improve Clasp compilation speed.
14:27:32
beach
But treating temporaries differently is not such a great idea from the point of view of compiler design.
14:27:54
beach
One of the main principles of compiler design is that you reduce things to as few cases as possible.
14:28:35
beach
So in this case, you will distinguish temporaries AS THEY ARE INTRODUCED from other variables.
14:29:05
beach
But then, you will miss the cases where a lexical variable happens to behave like a temporary from the point of view of control and data flow.
14:29:33
beach
So either you don't handle such variables efficiently, in which case you miss out on some possible optimizations, or you detect them in a later analysis anyway, in which case the early distinction buys you little.
14:31:15
beach
I mean, Cleavir is meant to be adapted by client code, and you definitely should in order to get the behavior you want.
14:31:45
Bike
Well, what I was thinking is that temporaries would be marked as SFA or SSA when they are created, and then a later SFA or SSA phase could mark "actual" variables as such as well.
14:35:03
Bike
I stopped working on it because I realized I wouldn't be able to fulfill my original goal, but I thought of all this stuff this morning
14:35:42
Bike
I don't know if this marking would require the rewrite per se, but as it stands, the rewrite is just more explicit about certain things
14:36:02
beach
As drmeister pointed out, marking temporaries as such when they are created is very easy.
14:37:50
Bike
There are basically two places you need something like a phi node - for if, and for block, as far as I can tell. This was really obvious in my rewrite, but I'll have to work something out in the existing code.
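The if case can be made concrete: in SSA, a variable assigned in both branches gets a fresh name per branch, and the merge point gets a phi node that selects the name belonging to the control-flow edge actually taken. A small Python sketch of that selection (the names x1/x2/x3 and the mini-IR are invented for illustration):

```python
# Sketch of the phi node at an `if` merge point.
# For (if cond (setq x a) (setq x b)), SSA gives each branch its own
# name and the merge block the instruction x3 = phi [then: x1, else: x2].

def if_to_ssa(cond, a, b):
    """Evaluate the `if` the SSA way: per-branch names plus a phi."""
    env = {}
    if cond:
        env["x1"] = a                    # definition in the then-branch
        taken = "then"
    else:
        env["x2"] = b                    # definition in the else-branch
        taken = "else"
    phi = {"then": "x1", "else": "x2"}   # one phi argument per predecessor
    env["x3"] = env[phi[taken]]          # the phi's result at the merge
    return env["x3"]

print(if_to_ssa(True, 10, 20))   # value flows in from the then-branch
print(if_to_ssa(False, 10, 20))  # value flows in from the else-branch
```

A Lisp block with multiple return-from sites is the same situation with more than two incoming edges: one phi argument per exit point.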
14:40:06
beach
There is a paper that generates SSA directly from the AST or some other similar representation, but I decided Cleavir couldn't use it because it generates code from the end, in order to get the evaluation context right.
14:41:23
Bike
Yeah I mean we definitely can't generate everything in SSA right off, because we can't force everything to be SSA at all, because of closures. So that's off the table.
15:01:35
beach
Suppose you wanted to create a Linux executable with a huge data structure already built inside it, like a Common Lisp implementation for instance. And suppose you want it to be able to handle ASLR. As I understand it, SBCL did that at some point. Wouldn't you basically have to mark every Common Lisp datum as relocatable, assign a symbolic name to it, and have the dynamic linker fix it up at startup time?
15:07:11
scymtym
i don't think the platform's linker can help. in sbcl, relocating the heap is basically a full gc, if i remember correctly
15:09:54
scymtym
so I'm guessing that would be more complicated than relocating yourself. at least if you already have a moving gc
15:15:31
beach
Here is another question, also more philosophical than practical: It used to be the case that a global variable in C was just an address. Now, because of dynamic linking, they use a mechanism that introduces many more memory accesses (GOT tables and whatnot). We don't have that problem in Common Lisp, so could a Linux executable use a mechanism closer to that used for Common Lisp, rather than this GOT business?
15:18:37
beach
But I started thinking about it, because flip214 gave me feedback on the CLOSOS specification.
15:19:29
scymtym
part of the problem seems to be that C doesn't have a "runtime" that could make adjustments when new code is loaded. and the linker doesn't know enough about the program to do it
15:20:05
beach
That's a clue though. But doesn't C++ require a way more complicated linker these days?
15:20:39
beach
And in the CLOSOS specification, I discuss problems with existing operating systems, so ASLR came up. And that got me thinking about things that have become more complex because of shared libraries, etc.
15:22:30
beach
I guess the C++ linker is still not active at run time, and that could be a requirement for a more sane mechanism.
15:25:28
scymtym
in #sbcl, pfdietz just mentioned a paper in which C++ code is significantly sped up by optimizing at runtime :)
15:25:52
Bike
c++ causes exciting things like two modules having code for the same macroexpanded (template) function definition
15:26:12
Bike
i asked about it on twitter once and the compiler people i know started talking about drinking
15:30:14
pfdietz
The big win is moving cold code away from hot code, and grouping the hot code together, to reduce pressure on the instruction cache.
15:34:17
beach
I would think the code would have to be so close that it would fit in a cache line for this to have any effect.
15:37:53
beach
It used to be the case that a cache line was a few words long, and that the cache line was chosen for some memory chunk based on the address modulo the size of the cache.
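That scheme is a direct-mapped cache: the line holding an address is (address / line_size) mod number_of_lines, so two addresses exactly one cache-size apart always collide on the same line. A quick Python illustration (the sizes are arbitrary examples, not any particular machine's):

```python
# Direct-mapped cache index arithmetic, as described above.
# LINE_SIZE and NUM_LINES are made-up example values (32 KiB cache).

LINE_SIZE = 64          # bytes per cache line
NUM_LINES = 512         # number of lines in the cache

def cache_line(address):
    """Line index for an address in a direct-mapped cache."""
    return (address // LINE_SIZE) % NUM_LINES

# Two addresses exactly one cache-size apart map to the same line:
a = 0x1000
b = a + LINE_SIZE * NUM_LINES
print(cache_line(a), cache_line(b), cache_line(a) == cache_line(b))
```

Modern caches are set-associative rather than direct-mapped, but the indexing idea is the same, and hot/cold code layout matters at page and cache-set granularity, not just within a single line.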