libera/#sicl - IRC Chatlog
Search
15:32:55
Bike
thought about it a bit more and i think compile/eval might make keeping source csts/forms in IR more essential. like, you can have (compile nil (make-lambda)), and then there may very well be no source text, but error messages still ought to be as helpful as possible under the circumstances
19:23:23
hayley
beach: In the case of thread local nurseries, I am not sure how you benefit from freeing on another thread, if the mutator thread is blocked. But you mention not having a write barrier, which implies not having a generational collector and thus no thread local nurseries.
19:28:32
hayley
There are more recent results showing compaction improving locality, e.g. the many collectors compared in <https://dl.acm.org/doi/10.1145/1375581.1375586>.
19:31:51
hayley
Re-linking the dyads does not appear free to me, but it can probably be done in an efficient manner. Also, I recall the paper on the Compressor argued that it had some locality, as either background threads would handle fixing up a page, or a mutator thread which trapped fixes it up just before using it.
19:34:11
hayley
Amusingly I was thinking about how to implement a global backup compaction pass with little overhead when we don't compact, as no one seems to have investigated compaction for a persistent system like CLOSOS, which could run for years in one heap.
19:35:17
hayley
Rather, they haven't investigated fragmentation over years. And I worry that it will add up over time.
19:39:13
hayley
I had also thought the thread local nurseries were supposed to be about the size of L2 cache, which eliminates most cache misses.
19:41:13
hayley
Another investigation in compacting i recall is <https://shipilev.net/jvm/anatomy-quarks/11-moving-gc-locality/> but it is a veryexaggerated test in my opinion.
19:43:17
hayley
There, objects in an array are rearranged to be in the same order as the array, which improves the locality and performance of walking the array later.
20:00:01
hayley
I guess that is too exaggerated, as GC costs aren't considered at all, and the micro-benchmark only uses one array. But then, I'd suspect the positioning of cons cells in a list also may cause programs to be more dependent on locality, than if they were written with vectors (or CDR coding, since I've already mentioned paging and MMU traps this morning, so why not?)
21:55:31
hayley
That said, there is this paper on a Standard ML implementation using a mark-sweep nursery, which achieves results in the same ballpark as copying: <https://www.pllab.riec.tohoku.ac.jp/papers/icfp2011UenoOhoriOtomoAuthorVersion.pdf>
22:02:25
hayley
However, they only test without multi-threading, and I suspect synchronisation costs are going to be important for a multi-threaded collector. But, as you say, you can avoid it by using allocation buffers of some sort. My current hunch is that compacting only when there are enough small holes is the best strategy.
22:05:39
hayley
With regards to "What if we did not have per-thread nurseries" though: a thread local nursery requires no synchronisation and can fit in cache, so it should be a good idea based on those properties. We'd need to know how well utilised thread local nurseries are, and how slow the write barrier is, if we want to make a comparison. And, again, I think it would be very useful preliminary work to collate multi-threaded benchmarks which could be used in
22:13:16
hayley
I'm also currently wondering if performing partial collections, like in the Mature Old Space (aka Train) could help with locality while running the garbage collector. Such a collector requires more a complex write barrier, but e.g. Garbage-First on a HotSpot JVM uses something similar, and was made the default collector in 2017.
22:29:11
zephyr
i have been getting into GCs a lot at work lately, and the economy of them is pretty interesting. it's a little conspicuous there isn't "one true GC" and that their performance can vary so widely by workload
23:06:19
hayley
There are now generational versions of ZGC and Shenandoah which reduce the memory overhead in my experience (which consists of playing Minecraft).
23:17:34
hayley
You could also get a free copy of the Azul JVM, which has a generational collector with basically the same design (C4). But I can't legally disclose my thoughts on it.
2:35:55
moon-child
you could probably reduce its incidence rate by divvying up your heap into regions. While allocating from region n, link up free cells from regions n+1 and n+2 (say)
2:37:00
moon-child
and that would make some sense if you're sharding your freelists anyway (cf mimalloc)
2:37:33
moon-child
however it would also come with some space overhead, and it wouldn't get rid of the problem entirely; the collector can still slow down the mutator even when the latter is trying to be clever