libera/#sicl - IRC Chatlog

19:31:07 moon-child hayley: figured out how to handle the concurrent replicated writes

19:31:27 moon-child no locks

19:32:01 moon-child first atomically exchange the new value with the old one, in the global replica. Then cas it into the nursery replica

19:32:19 moon-child cas-loop, I guess

19:34:06 moon-child (for a second, I thought you could get away with a single cas, but no)
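
The exchange-then-CAS-loop write described above could look roughly like the following Rust sketch. The `ReplicatedSlot` type and its field names are hypothetical, and the loop encodes one assumed reading of "cas-loop": retry until the nursery replica holds the value that the exchange returned, so that nursery updates land in the same order as the global ones. The log does not spell the loop out, and moon-child later notes that the CAS idea has ABA problems.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

/// One slot of an object that exists in two replicas.
/// (Hypothetical type; the log does not name any data structure.)
pub struct ReplicatedSlot {
    pub global: AtomicUsize,
    pub nursery: AtomicUsize,
}

impl ReplicatedSlot {
    pub fn new(initial: usize) -> Self {
        ReplicatedSlot {
            global: AtomicUsize::new(initial),
            nursery: AtomicUsize::new(initial),
        }
    }

    /// Lock-free replicated write, under the assumed reading above.
    pub fn write(&self, new: usize) {
        // Step 1: exchange into the global replica. The value we get back
        // is the write that immediately precedes ours in the global order.
        let old = self.global.swap(new, Ordering::AcqRel);

        // Step 2: CAS old -> new in the nursery replica. A failure means
        // the preceding writer has not updated the nursery yet (or the
        // weak CAS failed spuriously), so we spin and try again.
        while self
            .nursery
            .compare_exchange_weak(old, new, Ordering::AcqRel, Ordering::Acquire)
            .is_err()
        {
            std::hint::spin_loop();
        }
    }
}
```

With no concurrent writers, each call leaves both replicas holding the same value. The spin in step 2 is where writers wait for each other.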

19:35:14 moon-child if a fence has to move everything that's replicated, that's still a win over doing it eagerly; tackle n objects at a time instead of 1. And if you have some atomic-heavy code with a bunch of conses, a bunch of pointing to those new conses from the global heap, and a bunch of fences, the pre-tenurer should learn about that pretty quickly

19:40:38 moon-child hmm. If you include a generation number, then you don't need a cas loop, just one cas. But you bloat your objects, and have to fight over the generation number, so maybe it's a wash

21:22:08 hayley moon-child: I don't think a fence has to fix anything, it just has to ensure that there are no replicated writes happening. If we are going lock-free we could use a counter of writes being performed, perhaps.

21:25:55 hayley Remember that the write occurs to both replicas, so there is no need to fix anything eagerly. The fence just needs to ensure we can't "re-order" around the fence, which is handled by waiting for any writes to finish.
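
A minimal Rust sketch of the counter idea, assuming one counter per nursery. The `Nursery` and `Replicated` types, their fields, and both function names are hypothetical. The sketch only shows the fence waiting for in-flight writes to drain. It does not order two racing writers against each other, and it does not stop a new write from starting right after the fence has read zero.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

/// Hypothetical per-nursery state: how many replicated writes are
/// currently between their first and last store.
pub struct Nursery {
    pub writes_in_flight: AtomicUsize,
}

/// Hypothetical replicated slot with a global and a local (nursery) copy.
pub struct Replicated {
    pub global: AtomicUsize,
    pub local: AtomicUsize,
}

/// Write to both replicas, bracketed by the in-flight counter.
pub fn replicated_write(nursery: &Nursery, obj: &Replicated, value: usize) {
    nursery.writes_in_flight.fetch_add(1, Ordering::SeqCst);
    obj.global.store(value, Ordering::SeqCst);
    obj.local.store(value, Ordering::SeqCst);
    nursery.writes_in_flight.fetch_sub(1, Ordering::SeqCst);
}

/// Fence: wait until no replicated write is half done, so that every
/// such write lands entirely before or entirely after this point.
pub fn fence(nursery: &Nursery) {
    while nursery.writes_in_flight.load(Ordering::SeqCst) != 0 {
        std::hint::spin_loop();
    }
}
```

Every writer to objects from the same nursery touches the same counter, which is the cost this design pays for a cheap fence.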

23:12:05 moon-child hmm. It feels to me that it's better to shunt more work onto the 'owning' thread, since there can only be one of it. Also in the interest of fairness--if other threads are doing extra work only so that the owning thread can see a coherent state, but it never bothers to observe that state, then it's wasted work. But this argument may not hold water: if the overall amount of work done is still less,

23:12:06 moon-child on average, then it's still a win

23:12:52 moon-child but something else to consider is that maintaining a counter creates 'false sharing'. If two different threads are diddling replicated objects that happen to come from the same nursery, there should be no issue, but they will have to play pingpong with that nursery's counter

23:13:10 hayley Indeed.

23:14:30 moon-child another thing: as there are not yet any cl concurrency semantics to adhere to, we don't necessarily have to support fences in the traditional sense. Eg (with-read-barriers ... (fence) ... (fence) ...)

23:18:03 moon-child (& since the read barrier replaces pointers to the local replica with pointers to the global one, as I think it should, this has effects beyond the with-read-barriers block. That said, I'm not quite sure what a general version of that looks like, as it's rather tailored to this specific mechanism)

0:00:45 hayley moon-child: If there is much contention on the counter, and thus many writes, it would be better to inform the thread whose nursery is being replicated to just do a minor GC, or pre-tenure harder.

0:01:54 hayley That's an idea, if we track the probability that objects allocated at some site escape, we can adjust our threshold for directly allocating in the global heap.
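
One way to picture the per-site escape tracking is the Rust sketch below. The `AllocationSite` type, its counters, and the fixed `threshold` parameter are all assumptions made for illustration. The log only says that an escape probability per allocation site could drive the decision to allocate directly in the global heap.

```rust
/// Hypothetical per-allocation-site statistics.
pub struct AllocationSite {
    pub allocated: u64,
    pub escaped: u64,
}

impl AllocationSite {
    pub fn new() -> Self {
        AllocationSite { allocated: 0, escaped: 0 }
    }

    /// Called whenever this site allocates an object in the nursery.
    pub fn record_allocation(&mut self) {
        self.allocated += 1;
    }

    /// Called when an object from this site is found to have escaped,
    /// for example because it had to be replicated.
    pub fn record_escape(&mut self) {
        self.escaped += 1;
    }

    /// Observed fraction of this site's objects that escaped.
    pub fn escape_rate(&self) -> f64 {
        if self.allocated == 0 {
            0.0
        } else {
            self.escaped as f64 / self.allocated as f64
        }
    }

    /// Allocate directly in the global heap once the observed escape
    /// rate reaches the (assumed, tunable) threshold.
    pub fn should_pretenure(&self, threshold: f64) -> bool {
        self.escape_rate() >= threshold
    }
}
```

A site with 8 escapes out of 10 allocations would be pre-tenured at a threshold of 0.5 but not at 0.9.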

0:09:43 moon-child how would you tell if there is contention on the counter? I guess with hardware xadd, you can rdtsc before and after, and without, just check how many times you have to retry
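
The "check how many times you have to retry" variant can be sketched in Rust as a CAS-based increment that reports its failed attempts. The function name is hypothetical. The sketch uses the strong `compare_exchange`, so every counted retry corresponds to another thread having changed the counter in between.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

/// Increment `counter` with a CAS loop and return how many attempts
/// failed because another thread modified the counter first.
pub fn increment_counting_retries(counter: &AtomicUsize) -> usize {
    let mut retries = 0;
    let mut seen = counter.load(Ordering::Relaxed);
    loop {
        match counter.compare_exchange(
            seen,
            seen.wrapping_add(1),
            Ordering::AcqRel,
            Ordering::Relaxed,
        ) {
            // Our view of the counter was current: done.
            Ok(_) => return retries,
            // Somebody else got in first: remember the value they left
            // behind, count the retry, and try again.
            Err(actual) => {
                seen = actual;
                retries += 1;
            }
        }
    }
}
```

A caller could accumulate the returned retry counts and treat a high running total as a sign of contention.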

0:09:49 moon-child but it would be nicer not to have to contend at all

0:10:32 hayley To contend, there would have to be a lot of writes. But then we need another counter for that...

0:10:52 moon-child if the writes are separated in time, then there is no problem

3:00:05 beach Good morning everyone!

3:06:24 Bike morning.

3:11:17 hayley Good morning beach!

4:40:31 contrapunctus scymtym: thanks for the example. Unfortunately, even after playing with it a fair bit, I'm still pretty perplexed. I tried adapting it for my use (using it to read its own source file and just collecting the output - https://paste.rs/qZf.lisp ), but the output does not contain the comments in the file. 🙁️

5:58:11 hayley moon-child: How do you go about updating both locations in a lock free way? I know there's a more general double CAS procedure, but it would appear this case is somewhat easier.

5:59:23 hayley We could also have another thread rewrite references in the local nursery to point into the global heap, but I don't know what we can achieve with that.

6:04:07 moon-child oh, hmmm

6:04:27 moon-child oh, no...

6:05:13 moon-child maybe?

6:07:31 moon-child if there are unsynchronised writes, it's ok if the state is incoherent, until you do something that synchronises with both of them. Then, a write can atomically exchange the new value into the global copy, and do a single attempt to cas it into the nursery copy. If the cas fails, then let the state be incoherent, but add the object to a to-be-cohered list. Then the next time you do something that

6:07:33 moon-child could synchronise with those writes, you clear out your to-be-cohered list

6:08:34 moon-child then the issue is detecting what could synchronise with such writes. Fence is obvious, but I think atomic acq or acqrel does too

6:09:54 moon-child otoh, since this only triggers when there is an actual race, it should be pretty rare. So maybe it's ok to have an extra branch for every acq/acqrel

6:14:42 moon-child (where 'cohering' is just copying the contents of the global copy into the local copy. This is conservative--we do it just in case there was a write which future accesses to the local copy were synchronised with respect to. If those hypothetical accesses are _not_ synchronised wrt anything, and there were unsynchronised writes, then the program is inherently racy and we just happened to observe

6:14:45 moon-child those writes then)
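
The single-attempt scheme with a to-be-cohered list might be sketched in Rust as follows. All names are hypothetical, and who owns the list is an assumption, since the log leaves it open. The write is split into its two steps so that a race can be replayed by hand. `cohere` is not itself protected against concurrent writers here.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

/// Hypothetical slot with a global and a nursery replica.
pub struct ReplicatedSlot {
    pub global: AtomicUsize,
    pub nursery: AtomicUsize,
}

impl ReplicatedSlot {
    pub fn new(initial: usize) -> Self {
        ReplicatedSlot {
            global: AtomicUsize::new(initial),
            nursery: AtomicUsize::new(initial),
        }
    }
}

/// Step 1 of a write: exchange the new value into the global copy and
/// return the value that was there before.
pub fn exchange_global(slot: &ReplicatedSlot, new: usize) -> usize {
    slot.global.swap(new, Ordering::AcqRel)
}

/// Step 2 of a write: a single CAS attempt on the nursery copy. On
/// failure the slot stays incoherent and is queued for later repair.
pub fn try_publish<'a>(
    slot: &'a ReplicatedSlot,
    old: usize,
    new: usize,
    to_be_cohered: &mut Vec<&'a ReplicatedSlot>,
) -> bool {
    let ok = slot
        .nursery
        .compare_exchange(old, new, Ordering::AcqRel, Ordering::Acquire)
        .is_ok();
    if !ok {
        to_be_cohered.push(slot);
    }
    ok
}

/// Run before anything that could synchronise with the queued writes
/// (a fence, or an acquire or acq-rel access): copy the global contents
/// into the nursery copy and empty the list.
pub fn cohere(to_be_cohered: &mut Vec<&ReplicatedSlot>) {
    for slot in to_be_cohered.drain(..) {
        let current = slot.global.load(Ordering::Acquire);
        slot.nursery.store(current, Ordering::Release);
    }
}
```

Replaying two racing writers by hand: if both exchange before either publishes, and the later writer publishes first, its CAS fails and the slot is queued. The replicas then disagree until `cohere` runs.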

6:17:10 hayley Alright. I'll have to think about it, memory models hurt my head.

6:17:45 moon-child yeah, me too

6:19:52 hayley My prior idea would still require a branch on every sequencing operation, too, to see if there are any updates that need to complete before we can continue.

6:21:19 moon-child I don't think your idea would require branches for ordinary atomic accesses, only for blanket fences

6:21:46 hayley I'd have to wonder how frequent writes to replicated objects are, because having a counter of replicated writes that are being made would be handy.

6:21:59 moon-child or, hmm, maybe for seqcst accesses, but not for acq/rel/acqrel

6:22:16 moon-child since the lock ensures that the state of a replicated object never stays incoherent, and if you synchronise with somebody via acqrel, then that's it--you synchronised with them

6:23:11 moon-child that said, I think lock-free is better. Following Gil Tene: 'What could happen (and sneak in) if this one instruction takes 10 minutes to execute?'

6:23:42 hayley You're just saying that because I mentioned it in #lispcafe. Dammit.

6:24:21 moon-child :)

6:28:38 hayley I should write down the issues somewhere, because there are a few. We need to implement fences such that replicated writes complete either entirely before or after the fence. Writes also should never occur when a nursery collection is running; when a nursery collection starts, all mutators should only write to the global version (as, at that point, no one can observe the local version). Okay, I guess I counted one too many issues.

6:29:31 hayley The latter might be a special case of the former; we might just need to disable replication, and then fence.

6:38:37 hayley Okay, one more issue: how do you do an atomic update on a replicated slot, say, a compare-and-swap? The locking way provides sequential consistency whether one likes it or not.

6:39:36 moon-child 'Writes also should never occur when a nursery collection is running' with semispace, you just have to make sure the write finishes before the _next_ nursery collection finishes, which should be less work

6:40:12 moon-child 'atomic update on a replicated slot' assuming there are no flaws with my scheme for relaxed writes, I think it would work just the same

6:40:17 hayley I should mention, this sort of copying applies to assigning unboxed values too, which will be "fun". But the static analysis to show that an object cannot become replicated is probably not too hard, so it might not matter too much.

6:40:20 moon-child you would need a read barrier for atomic reads, but that's ok

6:42:24 moon-child (err, before the next nursery collection starts)

6:42:44 moon-child 'assigning unboxed values' ugh ... hmm ... maybe it's ok to just eagerly pretenure all type-specialised arrays?

6:43:27 moon-child after all, if you're using type-specialised arrays, it must be because you care about performance, and the performance advantage probably only starts to show once the array is reasonably large?

6:43:36 moon-child I guess you can just hoist the check. Shouldn't be too bad

7:18:33 hayley ACTION nods

7:18:50 hayley The read barrier is to make sure we're CASing on the global copy, I guess?

7:19:31 moon-child gah!

7:19:37 moon-child my cas idea has aba problems

7:19:42 moon-child dammit
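
The log does not say which ABA scenario moon-child had in mind. As an illustration only, the Rust trace below replays one interleaving in which the nursery-side CAS cannot tell a repeated value from the original one. The function name and the three writers W1, W2, W3 are hypothetical.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

/// Replays three writers on one replicated slot that starts at 0.
/// Returns (W3 published, W1 published, W2 published, final nursery value).
pub fn aba_trace() -> (bool, bool, bool, usize) {
    let global = AtomicUsize::new(0);
    let nursery = AtomicUsize::new(0);

    // All three writers exchange into the global copy first.
    let old1 = global.swap(1, Ordering::SeqCst); // W1 observes the initial 0
    let old2 = global.swap(0, Ordering::SeqCst); // W2 observes W1's 1
    let old3 = global.swap(3, Ordering::SeqCst); // W3 observes W2's 0

    // W3 publishes first. It expects W2's 0, but the nursery still holds
    // the initial 0. The CAS compares values only, so it succeeds.
    let w3 = nursery.compare_exchange(old3, 3, Ordering::SeqCst, Ordering::SeqCst).is_ok();

    // W1 and W2 now find neither 0 nor 1 in the nursery copy.
    let w1 = nursery.compare_exchange(old1, 1, Ordering::SeqCst, Ordering::SeqCst).is_ok();
    let w2 = nursery.compare_exchange(old2, 0, Ordering::SeqCst, Ordering::SeqCst).is_ok();

    (w3, w1, w2, nursery.load(Ordering::SeqCst))
}
```

Here W3's CAS succeeds out of order and the other two fail. If each writer instead retried until its CAS succeeded, W1 and W2 would keep waiting for values that no longer appear, unless further writes happened to restore them.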

7:20:20 hayley The Sapphire paper states "when the collector is done copying the (volatile field or fields of the) object, volatile accesses occur on the New copy of the field." i.e. there would be a similar barrier.