libera/#sicl - IRC Chatlog

15:24:45 beach That's a very good point.

15:32:55 Bike thought about it a bit more and i think compile/eval might make keeping source csts/forms in IR more essential. like, you can have (compile nil (make-lambda)), and then there may very well be no source text, but error messages still ought to be as helpful as possible under the circumstances

15:33:29 beach Makes sense.

19:23:23 hayley beach: In the case of thread local nurseries, I am not sure how you benefit from freeing on another thread, if the mutator thread is blocked. But you mention not having a write barrier, which implies not having a generational collector and thus no thread local nurseries.

19:28:32 hayley There are more recent results showing compaction improving locality, e.g. the many collectors compared in <https://dl.acm.org/doi/10.1145/1375581.1375586>.

19:31:51 hayley Re-linking the dyads does not appear free to me, but it can probably be done in an efficient manner. Also, I recall the paper on the Compressor argued that it had some locality, as either background threads would handle fixing up a page, or a mutator thread which trapped fixes it up just before using it.

19:34:11 hayley Amusingly I was thinking about how to implement a global backup compaction pass with little overhead when we don't compact, as no one seems to have investigated compaction for a persistent system like CLOSOS, which could run for years in one heap.

19:35:17 hayley Rather, they haven't investigated fragmentation over years. And I worry that it will add up over time.

19:39:13 hayley I had also thought the thread local nurseries were supposed to be about the size of L2 cache, which eliminates most cache misses.

19:41:13 hayley Another investigation in compacting i recall is <https://shipilev.net/jvm/anatomy-quarks/11-moving-gc-locality/> but it is a veryexaggerated test in my opinion.

19:43:17 hayley There, objects in an array are rearranged to be in the same order as the array, which improves the locality and performance of walking the array later.

20:00:01 hayley I guess that is too exaggerated, as GC costs aren't considered at all, and the micro-benchmark only uses one array. But then, I'd suspect the positioning of cons cells in a list also may cause programs to be more dependent on locality, than if they were written with vectors (or CDR coding, since I've already mentioned paging and MMU traps this morning, so why not?)

21:55:31 hayley That said, there is this paper on a Standard ML implementation using a mark-sweep nursery, which achieves results in the same ballpark as copying: <https://www.pllab.riec.tohoku.ac.jp/papers/icfp2011UenoOhoriOtomoAuthorVersion.pdf>

22:02:25 hayley However, they only test without multi-threading, and I suspect synchronisation costs are going to be important for a multi-threaded collector. But, as you say, you can avoid it by using allocation buffers of some sort. My current hunch is that compacting only when there are enough small holes is the best strategy.

22:05:39 hayley With regards to "What if we did not have per-thread nurseries" though: a thread local nursery requires no synchronisation and can fit in cache, so it should be a good idea based on those properties. We'd need to know how well utilised thread local nurseries are, and how slow the write barrier is, if we want to make a comparison. And, again, I think it would be very useful preliminary work to collate multi-threaded benchmarks which could be used in

22:05:39 hayley such tests.

22:13:16 hayley I'm also currently wondering if performing partial collections, like in the Mature Old Space (aka Train) could help with locality while running the garbage collector. Such a collector requires more a complex write barrier, but e.g. Garbage-First on a HotSpot JVM uses something similar, and was made the default collector in 2017.

22:29:11 zephyr i have been getting into GCs a lot at work lately, and the economy of them is pretty interesting. it's a little conspicuous there isn't "one true GC" and that their performance can vary so widely by workload

22:29:33 zephyr ZGC is the one i've been monitoring and using the most lately. weird thing.

23:06:19 hayley There are now generational versions of ZGC and Shenandoah which reduce the memory overhead in my experience (which consists of playing Minecraft).

23:06:43 hayley Well, both are in development and you need to build your own JVM, but they exist.

23:13:13 zephyr oh, interesting

23:17:34 hayley You could also get a free copy of the Azul JVM, which has a generational collector with basically the same design (C4). But I can't legally disclose my thoughts on it.

23:18:17 hayley (It's free as in it costs zero dollars, and proprietary still.)

0:22:53 moon-child oh there's generational zgc now!

2:34:02 moon-child hm, regarding off-thread freeing

2:34:41 moon-child it occurs to me that that would suffer from false sharing

2:35:55 moon-child you could probably reduce its incidence rate by divvying up your heap into regions. While allocating from region n, link up free cells from regions n+1 and n+2 (say)

2:37:00 moon-child and that would make some sense if you're sharding your freelists anyway (cf mimalloc)

2:37:33 moon-child however it would also come with some space overhead, and it wouldn't get rid of the problem entirely; the collector can still slow down the mutator even when the latter is trying to be clever