libera/#sicl - IRC Chatlog
6:37:32
hayley
On the previous topic of free-list versus sequential allocation, the Garbage Collection Handbook states "Blackburn et al [in <https://users.cecs.anu.edu.au/~steveb/pubs/papers/mmtk-sigmetrics-2004.pdf>] showed that the difference in cost between sequential and free-list allocation is small (accounting for only 1% of total execution time) and is dominated by the second order effect of improved locality, particularly for young objects which benefit [...]"
6:39:11
beach
But if "objects that are allocated together die together", then they will be sequential also with a free list.
6:43:05
hayley
Right, yes. Could a small proportion of objects that are allocated together not dying together make allocation with a free list less sequential? I haven't gotten around to writing a non-sequential allocator for practice.
6:43:51
beach
The argument was that locality improves performance because the processor will prefetch the next words in memory, right?
6:44:21
moon-child
'they will be sequential also with a free list' not necessarily. Suppose I allocate 9 conses 012345678, and then free them all. Then I allocate a list of 3 cons cells X (012) and a list of 3 cons cells Y (345). Then Y is freed, then X is freed. Now the freelist is 345012678. Then I allocate a length-9 list, and it is fragmented
6:44:48
moon-child
beach: explicit prefetch does work; OCaml people found it improved the performance of list iteration, but it is not a panacea
6:45:46
hayley
*prefetching over list iteration - moon-child raced with me, and I ended up reading bogus mental data.
6:47:51
hayley
(Java also prefetches after performing an allocation, which improves allocation performance; prefetching would probably speed up allocation in other runtimes too. But Blackburn et al found that changes in allocation throughput did not affect application throughput much, as little time is spent allocating.)
6:54:23
moon-child
but, I thought the plan was to depend on avx2 anyway, for the three-address float ops?
6:55:34
hayley
I'm going to play around with prefetching after allocation in SBCL, and I don't think prefetch instructions require AVX2.
6:57:33
moon-child
(nb. seems I misread the AMD manual. There is an AMD-specific instruction, but the normal prefetches are always there)
6:58:15
hayley
According to <https://shipilev.net/jvm/anatomy-quarks/4-tlab-allocation/> HotSpot uses PREFETCHNTA, but I'll just test with PREFETCHT0 for now.
7:12:48
hayley
I can't tell if any prefetch instructions help in SBCL just yet. Maybe I'll run cl-bench and find out that it again runs 0.7% faster on average, with no noticeable patterns in which benchmarks run faster. :)
7:14:43
beach
hayley: What cost did Blackburn et al attribute to copying objects to improve locality?
7:19:16
hayley
A non-generational copying collector takes more time to collect than a non-generational mark-sweep collector, unless a very large heap is used, but a generational collector with a copying nursery and mark-sweep collector for matured space performs the best.
7:21:08
hayley
Notably "cache measurements reveal that the spatial locality of objects allocated close together in time is key for nursery objects, but not as important for mature objects."
7:24:41
hayley
My first statement was my interpretation of the graphs in Figure 1 on page 7 of that paper. I think those are measurements.
7:25:09
moon-child
it doesn't follow from the generational hypothesis that most accesses are to young objects rather than old ones
7:26:51
hayley
Does it follow from the "most writes aren't dead" hypothesis, which came up when we were discussing what Gil Tene told me about read barriers? If most objects die young, and most objects are used somehow...
7:32:40
hayley
From the conclusion: "As a corollary, although many accesses go to mature objects, their performance relies on temporal locality, whereas in the nursery, allocation order provides good spatial locality for young objects that die quickly."
7:35:02
hayley
I guess, if most objects die young, then the temporal locality of accessing young objects may be poor. And there are measurements of the ages of accessed objects, and of the distribution of accesses to objects in Table 1 on page 5.
7:41:14
hayley
For what it's worth, I wouldn't be opposed to having CAS with an equality predicate other than EQ produce a loop, as it shouldn't affect lock-freedom, assuming...some sort of fairness for which thread "wins" a compare-and-swap.
11:28:38
hayley
So, the main arguments are that it is difficult to implement atomic updates with mmap, and that TLB shootdowns can be slower than syscalls?
11:31:31
hayley
There is a similar problem to the latter when implementing concurrent garbage collectors using hardware protection, wherein many mprotect calls are made in a batch, but the operating system flushes(?) TLBs after every call. And I think there is also a tradeoff with calling mmap in a grep implementation too.
11:32:11
hayley
Conventional wisdom for the latter is to read whole files into memory if they are small, and mmap if they are large.
11:32:46
hayley
Isn't madvise with the sequential option supposed to do (linear) prefetching? But I'm not sure about any other patterns.
11:33:28
moon-child
but when you know what you're going to look at, you can do better. They give an example in the paper. See also ocaml list iteration, mentioned recently :)
11:36:26
moon-child
but atomicity & transactions are the main thing I worry about. Was thinking about them a bit ago too. I have no idea how you make that work well in a large system. Databases tend to be self-contained, but ideally you also want to maintain coherent inter-process state
11:38:32
hayley
I've discussed it with someone before, and we thought to allow programs to force the supervisor to generate a checkpoint, which would ensure durability (insofar as the hardware is durable too). The other three parts of ACID are harder, but we can just use software transactional memory.
11:40:38
hayley
If you checkpoint threads too, and the supervisor automatically generates a checkpoint while a transaction is being produced, then I would expect that threads will roll forward as usual after a crash, unless some non-determinism influences execution.