libera/#sicl - IRC Chatlog

11:44:52 hayley beach`: Do you have a favourite algorithm for compacting an entire heap, just prior to saving an image?

11:46:41 hayley The compaction quality is more important than the time it would take to compact. I recall your paper on sliding GC mentioned a "Compressor"?

11:51:41 hayley (Reading the paper and I think having to respect the "page" structure of the SBCL heap, and that moving objects can resize them, when it is necessary to record the hash value, could complicate things. But those issues probably exist for any algorithm I could find.)

12:02:59 hayley For context, I got my non-moving collector to not crash with my test programs yesterday. But I still have not made it parallel, nor have I addressed how the card maps for generational collection are too imprecise, and the collector ends up having to trace objects in the old generation for no reason.

12:56:49 beach hayley: I think such an algorithm would depend on how the heap is organized.

12:58:01 hayley Indeed. I'm just not sure how to approach the problem.

13:00:51 beach I know how I would do it for the global SICL heap, but that depends on the fact that this heap is a Doug Leah allocator, so I wold just move all the objects to the beginning, while respecting the data structure of the allocator.

13:01:04 hayley Stepping back, I don't have the property that, for some objects A and B which have no objects in between, and A has a lower address than B, I do not have new-location(B) = new-location(A) + size(A), which the Compressor relies on.

13:04:26 hayley I can figure how I might compute the new locations of objects, but I also need to be able to fix pointers, which requires some sort of auxiliary table. One option I came up with, which is simple but likely inefficient, would be to use a hash table which maps old addresses to new.

13:05:23 beach I thought you said performance is not a problem.

13:05:30 beach Or something to that effect.

13:07:24 hayley I said compaction quality was more important, but I would like decent performance. Moving from zlib to zstd made saving a compressed image much faster, and I wouldn't want to undo that.

13:08:06 hayley (That said, compaction is less important when using compressed images, but some people do want to use uncompressed images, too.)

13:25:21 hayley Oh, and I'm concerned about how large the table might be. At worst it could be about as large as the heap, I think.

15:09:07 fittestbits beach: is there a separate chat for cleavir?

15:10:37 Bike we've generally just talked about it here

15:12:07 fittestbits OK, my question is: how much use can the code generated by cleavir make of scratch registers?

15:13:16 fittestbits I was working on a ABI where some registers were allocated as scratch and some were allocated as local vars.

15:13:29 Bike scratch as in temporaries?

15:13:34 Bike like intermediate results

15:13:36 fittestbits The ones allocated as local vars would be required to be preserved by callees.

15:13:56 fittestbits Yes, temporaries for intermediate results.

15:14:51 fittestbits The problem I ran into with the callee preserved registers is it would be (very) difficult to restore them when a throw unwinds the stack.

15:16:28 fittestbits The scenario is: a function establishes a catch, has some locals in the "preserved" registers, calls one or more other functions deep which some of which save various subsets of the

15:16:48 beach fittestbits: There are currently two versions of Cleavir, none of which does register allocation I think. Cleavir version 1 (in the SICL repository) leaves it up to SICL-specific code to do register allocation, and Cleavir version 2 I guess leaves it up to LLVM.

15:17:15 fittestbits preserved registers, then there's a throw to the catch. Restoring the state of the preserved registers is then very hard.

15:17:27 beach fittestbits: I designed a technique for unwinding the stack in the presence of callee-saved registers, but finally decided not to use those.

15:17:32 Bike they're both a frontend and a middle end. cleavir 2 itself has no interface with llvm

15:18:19 Bike for callee registers you'd probably need to save information about them in the dynamic environment, and then have the unwinder deal with that. but that sounds like a lot of work.

15:19:15 beach Yes, the technique I designed had a bitmap (or two?) related to every value of the program counter.

15:19:44 beach That section of the SICL specification might still be in GIT.

15:19:59 Bike yeah, the pdf still lsits callee saved registers for the x86-64 backend, i just checked

15:20:02 Bike lists*

15:20:30 beach That's great (I guess). I wonder whether that's still in the TeX source too.

15:20:58 fittestbits So I'm thinking of not including callee preserves some of the registers. So, then the question becomes - how many scratch/temporary are really useful.

15:21:19 fittestbits Fewer registers means fewer bits in the instructions dedicated to register numbers.

15:21:43 beach I don't know what you mean by scratch/temporary. The register allocator should probably attempt to use all registers that are available.

15:22:03 beach Oh, you are designing the instruction set?

15:22:12 fittestbits Yes.

15:22:30 beach What/who is going to execute the instructions?

15:22:55 beach Sorry, you probably told us this already, but I can't remember.

15:23:08 fittestbits At them moment I'm just emulating them - would like to eventually try to get into a FPGA.

15:24:03 beach If you want performance, then a large number of registers is good. It looks to me like the RISC-V instruction layout is a good one.

15:25:13 Bike cleavir does the usual thing of pretending there are infinite registers while doing optimizations, and then leaving it up to the backend to sort things out.

15:25:15 beach So that would be 32 registers including register 0.

15:25:39 beach I forget whether PC is separate or one of them.

15:26:02 fittestbits Hmm, from the few code segments I've been working on, function calls show up pretty quickly, which means the registers would be overwritten.

15:26:22 beach Er, what?

15:26:39 beach Live registers should be saved to the stack before the call.

15:27:07 beach As in, all registers are "caller-saves".

15:28:22 fittestbits Exactly. Think of a high level instruction set - CAR, CDR are single instructions. Arithmetic instructions on fixnums are single instructions, with no SW checking of types.

15:28:50 Bike SW = software?

15:29:07 fittestbits And more complex arithmetic operations trapping into microcode, so again no software (SW) checking of types.

15:29:22 beach I would think that design would be orthogonal to register allocation, no?

15:30:02 fittestbits Yes, but, so far from what I've seen, I'm not seeing very many instructions between function calls.

15:30:17 fittestbits I guess I really need to widen the scope of the code I'm looking at.

15:30:41 beach With inlining, that number quickly grows.

15:31:46 fittestbits Yes, that's something I haven't thought about much ...

15:32:56 Bike i'd expect even less function calls than i'm used to seeing, even, if you don't need to call something to do type checks or arithmetic

15:33:02 Bike even fewer*

15:33:15 beach Yes, that seems plausible.

15:35:38 fittestbits I'll grab some code from SICL and compile it to show you what I'm seeing w.r.t. to function calls. Again, I want to say I've been looking at a pretty limited set of code examples.

15:35:45 fittestbits And I need to look at many more.

15:35:50 beach fittestbits: Let's put it this way. The number of registers of x86-64 is barely enough, and only if you don't segregate them the way SBCL does. I.e., you can store anything in any register.

15:36:32 beach fittestbits: Well, if you don't do any inlining, sure, there are going to be lots of them.

15:36:32 fittestbits My system has reached a point were I can look at a broader selection of code.

15:36:48 beach fittestbits: But inlining is key to good performance.

15:36:58 Bike i wonder if you could do something fancy with the call site optimization - save only registers that are both live and actually used by the callee

15:37:07 Bike probably would get tricky with asynchronous interrupts and stuff, though

15:37:37 beach Yeah, that sounds messy, but certainly possible.

15:39:03 beach fittestbits: Inlining is not just to avoid a function call. It also exposes callee code to optimization in the caller, and makes it easier to keep things in registers.

15:40:15 fittestbits Yes, I can see that ... What are you doing about changes to the callee? Keeping track of who all called it and recompiling them?

15:40:30 beach No, that's not required by the standard.

15:40:57 Bike also, a lot of the inlining is for standard functions that can't be redefined, anyway

15:41:07 beach Indeed.

15:41:19 Bike like arithmetic and array access and stuff, which are all functions, but which you probably want to avoid relegating to function calls in all cases

15:41:29 fittestbits A

15:41:56 fittestbits Oh yeah, for got to mention array references are instructions as well.

15:42:24 fittestbits So, maybe much of what you're thinking of inlining, I'm thinking of as microcode.

15:42:53 beach Maybe so, but then the result is not going to be very fast.

15:43:08 beach Because your microcode will always have to do the worst-case work.

15:43:25 beach Whereas with inlining, the compiler can optimize on a case-by-case basis.

15:44:13 beach For example, array access in a loop can do things like strength reduction, and elide boundary checks if they can be done by loop-invariant code.

15:45:29 fittestbits Good points. Thanks.

15:45:45 beach Sure.

15:53:14 fittestbits My interest is more towards security than raw performance. And it seems to me allowing pointer arithmetic at the compiled code level opens the door for all kinds of issues.

15:53:53 beach In that case, don't use a register allocator at all, and allocate everything on the stack.

15:54:28 beach You will have plenty of function calls as you pointed out, so there will be no reason to keep things in registers.

15:56:39 fittestbits The development does seem to be trending in that direction ... then the implementation would be to keep the top portion of the stack in "registers".

15:58:25 beach Well, if you have plenty of function calls, there might not be any point. You could then just use scratch registers for when an instruction requires an operand to be in a register, but you would always spill it to the stack after a mutation.

16:03:50 fittestbits This has all been very helpful. Again, thanks. I'll keep on trying various architectures to see where they go.

16:05:06 beach Pleasure. Good luck!

16:05:10 beach And keep us updated.

16:06:47 Bike i would be happy to have more cleavir backends

16:09:22 fittestbits I'm interested in using cleavir. Not sure how to get started, and I guess I'm not really ready for that anyway.

16:11:40 Bike a few months ago i whipped up an "example" system showing how to generate the IR, and then from there you have to translate that into whatever backend code

16:12:14 Bike lately i've been thinking of adding a backend in the form of the virtual machine we've been working on for clasp - excised of the few nonportable parts of course - which could then round out the example

16:15:40 beach Sounds great!

16:19:01 fittestbits An example would be good. A short doc about the steps you went through might be good as well.

16:20:43 Bike it's at https://github.com/s-expressionists/Cleavir/tree/main/Example

16:20:55 Bike there's a readme, but it's not really a full tutorial, unfortunately

16:27:51 Bike i have been pretty tied up with this vm stuff, but whenever we're finally done with that i'd like to finish the compiler reorganization i wrote about before, and then maybe shore up the documentation generally, since that'll leave me with a good overall idea of the organization of the system

18:40:50 moon-child hayley: 'table could be about as large as the heap' do it incrementally?