libera/#sicl - IRC Chatlog
Search
11:44:52
hayley
beach`: Do you have a favourite algorithm for compacting an entire heap, just prior to saving an image?
11:46:41
hayley
The compaction quality is more important than the time it would take to compact. I recall your paper on sliding GC mentioned a "Compressor"?
11:51:41
hayley
(Reading the paper and I think having to respect the "page" structure of the SBCL heap, and that moving objects can resize them, when it is necessary to record the hash value, could complicate things. But those issues probably exist for any algorithm I could find.)
12:02:59
hayley
For context, I got my non-moving collector to not crash with my test programs yesterday. But I still have not made it parallel, nor have I addressed how the card maps for generational collection are too imprecise, and the collector ends up having to trace objects in the old generation for no reason.
13:00:51
beach
I know how I would do it for the global SICL heap, but that depends on the fact that this heap is a Doug Leah allocator, so I wold just move all the objects to the beginning, while respecting the data structure of the allocator.
13:01:04
hayley
Stepping back, I don't have the property that, for some objects A and B which have no objects in between, and A has a lower address than B, I do not have new-location(B) = new-location(A) + size(A), which the Compressor relies on.
13:04:26
hayley
I can figure how I might compute the new locations of objects, but I also need to be able to fix pointers, which requires some sort of auxiliary table. One option I came up with, which is simple but likely inefficient, would be to use a hash table which maps old addresses to new.
13:07:24
hayley
I said compaction quality was more important, but I would like decent performance. Moving from zlib to zstd made saving a compressed image much faster, and I wouldn't want to undo that.
13:08:06
hayley
(That said, compaction is less important when using compressed images, but some people do want to use uncompressed images, too.)
13:25:21
hayley
Oh, and I'm concerned about how large the table might be. At worst it could be about as large as the heap, I think.
15:12:07
fittestbits
OK, my question is: how much use can the code generated by cleavir make of scratch registers?
15:13:16
fittestbits
I was working on a ABI where some registers were allocated as scratch and some were allocated as local vars.
15:14:51
fittestbits
The problem I ran into with the callee preserved registers is it would be (very) difficult to restore them when a throw unwinds the stack.
15:16:28
fittestbits
The scenario is: a function establishes a catch, has some locals in the "preserved" registers, calls one or more other functions deep which some of which save various subsets of the
15:16:48
beach
fittestbits: There are currently two versions of Cleavir, none of which does register allocation I think. Cleavir version 1 (in the SICL repository) leaves it up to SICL-specific code to do register allocation, and Cleavir version 2 I guess leaves it up to LLVM.
15:17:15
fittestbits
preserved registers, then there's a throw to the catch. Restoring the state of the preserved registers is then very hard.
15:17:27
beach
fittestbits: I designed a technique for unwinding the stack in the presence of callee-saved registers, but finally decided not to use those.
15:18:19
Bike
for callee registers you'd probably need to save information about them in the dynamic environment, and then have the unwinder deal with that. but that sounds like a lot of work.
15:19:15
beach
Yes, the technique I designed had a bitmap (or two?) related to every value of the program counter.
15:19:59
Bike
yeah, the pdf still lsits callee saved registers for the x86-64 backend, i just checked
15:20:58
fittestbits
So I'm thinking of not including callee preserves some of the registers. So, then the question becomes - how many scratch/temporary are really useful.
15:21:19
fittestbits
Fewer registers means fewer bits in the instructions dedicated to register numbers.
15:21:43
beach
I don't know what you mean by scratch/temporary. The register allocator should probably attempt to use all registers that are available.
15:23:08
fittestbits
At them moment I'm just emulating them - would like to eventually try to get into a FPGA.
15:24:03
beach
If you want performance, then a large number of registers is good. It looks to me like the RISC-V instruction layout is a good one.
15:25:13
Bike
cleavir does the usual thing of pretending there are infinite registers while doing optimizations, and then leaving it up to the backend to sort things out.
15:26:02
fittestbits
Hmm, from the few code segments I've been working on, function calls show up pretty quickly, which means the registers would be overwritten.
15:28:22
fittestbits
Exactly. Think of a high level instruction set - CAR, CDR are single instructions. Arithmetic instructions on fixnums are single instructions, with no SW checking of types.
15:29:07
fittestbits
And more complex arithmetic operations trapping into microcode, so again no software (SW) checking of types.
15:30:02
fittestbits
Yes, but, so far from what I've seen, I'm not seeing very many instructions between function calls.
15:32:56
Bike
i'd expect even less function calls than i'm used to seeing, even, if you don't need to call something to do type checks or arithmetic
15:35:38
fittestbits
I'll grab some code from SICL and compile it to show you what I'm seeing w.r.t. to function calls. Again, I want to say I've been looking at a pretty limited set of code examples.
15:35:50
beach
fittestbits: Let's put it this way. The number of registers of x86-64 is barely enough, and only if you don't segregate them the way SBCL does. I.e., you can store anything in any register.
15:36:32
beach
fittestbits: Well, if you don't do any inlining, sure, there are going to be lots of them.
15:36:58
Bike
i wonder if you could do something fancy with the call site optimization - save only registers that are both live and actually used by the callee
15:39:03
beach
fittestbits: Inlining is not just to avoid a function call. It also exposes callee code to optimization in the caller, and makes it easier to keep things in registers.
15:40:15
fittestbits
Yes, I can see that ... What are you doing about changes to the callee? Keeping track of who all called it and recompiling them?
15:41:19
Bike
like arithmetic and array access and stuff, which are all functions, but which you probably want to avoid relegating to function calls in all cases
15:42:24
fittestbits
So, maybe much of what you're thinking of inlining, I'm thinking of as microcode.
15:44:13
beach
For example, array access in a loop can do things like strength reduction, and elide boundary checks if they can be done by loop-invariant code.
15:53:14
fittestbits
My interest is more towards security than raw performance. And it seems to me allowing pointer arithmetic at the compiled code level opens the door for all kinds of issues.
15:53:53
beach
In that case, don't use a register allocator at all, and allocate everything on the stack.
15:54:28
beach
You will have plenty of function calls as you pointed out, so there will be no reason to keep things in registers.
15:56:39
fittestbits
The development does seem to be trending in that direction ... then the implementation would be to keep the top portion of the stack in "registers".
15:58:25
beach
Well, if you have plenty of function calls, there might not be any point. You could then just use scratch registers for when an instruction requires an operand to be in a register, but you would always spill it to the stack after a mutation.
16:03:50
fittestbits
This has all been very helpful. Again, thanks. I'll keep on trying various architectures to see where they go.
16:09:22
fittestbits
I'm interested in using cleavir. Not sure how to get started, and I guess I'm not really ready for that anyway.
16:11:40
Bike
a few months ago i whipped up an "example" system showing how to generate the IR, and then from there you have to translate that into whatever backend code
16:12:14
Bike
lately i've been thinking of adding a backend in the form of the virtual machine we've been working on for clasp - excised of the few nonportable parts of course - which could then round out the example
16:19:01
fittestbits
An example would be good. A short doc about the steps you went through might be good as well.
16:27:51
Bike
i have been pretty tied up with this vm stuff, but whenever we're finally done with that i'd like to finish the compiler reorganization i wrote about before, and then maybe shore up the documentation generally, since that'll leave me with a good overall idea of the organization of the system