libera/#sicl - IRC Chatlog

21:22:08 hayley @[moon-child]: I don't think a fence has to fix anything, it just has to ensure that there are no replicated writes happening. If we are going lock-free we could use a counter of writes being performed, perhaps.

21:25:55 hayley Remember that the write occurs to both replicas, so there is no need to fix anything eagerly. The fence just needs to ensure we can't "re-order" around the fence, which is handled by waiting for any writes to finish.

23:12:05 moon-child hmm. It feels to me that it's better to shunt more work onto the 'owning' thread, since there can only be one of it. Also in the interest of fairness--if other threads are doing extra work only so that the owning thread can see a coherent state, but it never bothers to observe that state, then it's wasted work. But this argument may not hold water: if the overall amount of work done is still less,

23:12:06 moon-child on average, then it's still a win

23:12:52 moon-child but something else to consider is that maintaining a counter creates 'false sharing'. If two different threads are diddling replicated objects that happen to come from the same nursery, there should be no issue, but they will have to play pingpong with that nursery's counter

23:13:10 hayley Indeed.

23:14:30 moon-child another thing: as there are not yet any cl concurrency semantics to adhere to, we don't necessarily have to support fences in the traditional sense. Eg (with-read-barriers ... (fence) ... (fence) ...)

23:18:03 moon-child (& since the read barrier replaces pointers to the local replica with pointers to the global one, as I think it should, this has effects beyond the with-read-barriers block. That said, I'm not quite sure what a general version of that looks like, as it's rather tailored to this specific mechanism)

0:00:45 hayley moon-child: If there is much contention on the counter, and thus many writes, it would be better to inform the thread whose nursery is being replicated to just do a minor GC, or pre-tenure harder.

0:01:54 hayley That's an idea, if we track the probability that objects allocated at some site escape, we can adjust our threshold for directly allocating in the global heap.

0:09:43 moon-child how would you tell if there is contention on the counter? I guess with hardware xadd, you can rdtsc before and after, and without, just check how many times you have to retry

0:09:49 moon-child but it would be nicer not to have to contend at all

0:10:32 hayley To contend, there would have to be a lot of writes. But then we need another counter for that...

0:10:52 moon-child if the writes are separated in time, then there is no problem

3:00:05 beach Good morning everyone!

3:06:24 Bike morning.

3:11:17 hayley Good morning beach!

4:40:31 contrapunctus scymtym: thanks for the example. Unfortunately, even after playing with it a fair bit, I'm still pretty perplexed. I tried adapting it for my use (using it to read its own source file and just collecting the output - https://paste.rs/qZf.lisp ), but the output does not contain the comments in the file. 🙁️

5:58:11 hayley moon-child: How do you go about updating both locations in a lock free way? I know there's a more general double CAS procedure, but it would appear this case is somewhat easier.

5:59:23 hayley We could also have another thread rewrite references in the local nursery to point into the global heap, but I don't know what we can achieve with that.

6:04:07 moon-child oh, hmmm

6:04:27 moon-child oh, no...

6:05:13 moon-child maybe?

6:07:31 moon-child if there are unsynchronised writes, it's ok if the state is incoherent, until you do something that synchronises with both of them. Then, a write can atomically exchange the new value into the global copy, and do a single attempt to cas it into the nursery copy. If the cas fails, then let the state be incoherent, but add the object to a to-be-cohered list. Then the next time you do something that

6:07:33 moon-child could synchronise with those writes, you clear out your to-be-cohered list

6:08:34 moon-child then the issue is detecting what could synchronise with such writes. Fence is obvious, but I think atomic acq or acqrel does too

6:09:54 moon-child otoh, since this only triggers when there is an actual race, it should be pretty rare. So maybe it's ok to have an extra branch for every acq/acqrel

6:14:42 moon-child (where 'cohering' is just copying the contents of the global copy into the local copy. This is conservative--we do it just in case there was a write which future accesses to the local copy were synchronised with respect to. If those hypothetical accesses are _not_ synchronised wrt anything, and there were unsynchronised writes, then the program is inherently racy and we just happened to observe

6:14:45 moon-child those writes then)

6:17:10 hayley Alright. I'll have to think about it, memory models hurt my head.

6:17:45 moon-child yeah, me too

6:19:52 hayley My prior idea would still require a branch on every sequencing operation, too, to see if there's any updates that need to complete before we can continue.

6:21:19 moon-child I don't think your idea would require branches for ordinary atomic accesses, only for blanket fences

6:21:46 hayley I'd have to wonder how frequent writes to replicated objects are, because having a counter of replicated writes that are being made would be handy.

6:21:59 moon-child or, hmm, maybe for seqcst accesses, but not for acq/rel/acqrel

6:22:16 moon-child since the lock ensures that the state of a replicated object never stays incoherent, and if you synchronise with somebody via acqrel, then that's it--you synchronised with them

6:23:11 moon-child that said I think lock free is better. Following gil tene 'What could happen (and sneak in) if this one instruction takes 10 minutes to execute?'

6:23:42 hayley You're just saying that because I mentioned it in #lispcafe. Dammit.

6:24:21 moon-child :)

6:28:38 hayley I should write down the issues somewhere, because there are a few. We need to implement fences such that replicated writes complete either entirely before or after the fence. Writes also should never occur when a nursery collection is running; when a nursery collection starts, all mutators should only write to the global version (as, at that point, no one can observe the local version). Okay, I guess I counted one too many issues.

6:29:31 hayley The latter might be a special case of the former; we might just need to disable replication, and then fence.

6:38:37 hayley Okay, one more issue: how do you do an atomic update on a replicated slot, say, a compare-and-swap? The locking way provides sequential consistency whether one likes it or not.

6:39:36 moon-child 'Writes also should never occur when a nursery collection is running' with semispace, you just have to make sure the write finishes before the _next_ nursery collection finishes, which should be less work

6:40:12 moon-child 'atomic update on a replicates slot' assuming there are no flaws with my scheme for relaxed writes, I think it would work just the same

6:40:17 hayley I should mention, this sort of copying applies to assigning unboxed values too, which will be "fun". But the static analysis to show that an object cannot become replicated is probably not too hard, so it might not matter too much.

6:40:20 moon-child you would need a read barrier for atomic reads, but that's ok

6:42:24 moon-child (err, before the next nursery collection starts)

6:42:44 moon-child 'assigning unboxed values' ugh ... hmm ... maybe it's ok to just eagerly pretenure all type-specialised arrays?

6:43:27 moon-child after all, if you're using type-specialised arrays, it must be because you care about performance, and the performance advantage probably only starts to show once the array is reasonably large?

6:43:36 moon-child I guess you can just hoist the check. Shouldn't be too bad

7:18:33 hayley ACTION nods

7:18:50 hayley The read barrier is to make sure we're CASing on the global copy, I guess?

7:19:31 moon-child gah!

7:19:37 moon-child my cas idea has aba problems

7:19:42 moon-child dammit

7:20:20 hayley The Sapphire paper states "when the collector is done copying the (volatile field or fields of the) object, volatile accesses occur on the New copy of the field." i.e. there would be a similar barrier.

7:20:47 moon-child 'CASing on the global copy' I was imagining all write ops would take a write barrier, and if they find they're on the local copy, they first update their pointers to point at the global copy (just like with a concurrent gc)

7:21:06 moon-child for ordinary reads you wouldn't need a barrier, but for atomic reads specifically, you would need the same barrier

7:23:20 hayley You're probably right about pretenuring specialised arrays, as we don't gain much from moving them, and writes to such arrays cannot cause other objects to escape the nursery, either.

7:23:21 hayley Ouch ):

7:24:48 moon-child hmm. Maybe actual ABA is rare. In that case, can we penalise the writer of the second A?

7:25:51 moon-child maybe keep a small ring buffer of as-yet-unflushed writes?

7:30:46 hayley It would appear that Hudson and Moss don't think Sapphire would behave correctly when a program has a data race on a non-volatile field; seemingly they can't make the replicas coherent when two writers race to write replicated objects.

7:31:10 moon-child wait, no, what am I thinking? There's no ABA problem. If the initial state is A, and t1 tries to write B, and t2 tries to write A, then we can have: t1 exchanges B into global, gets A; t2 exchanges A into global, gets B; t2 tries to cas A into local, but fails, because local wasn't B

7:31:18 moon-child so _t2_ can be the one to mark the object as incoherent

7:32:18 moon-child (I haven't proven that there can be no ABA issues, but this is the case that I was thinking of)

7:39:03 hayley I'd hope still that replicated writes are the uncommon case, and if there are too many writes in a short period of time, a minor GC is performed instead to promote the objects more eagerly. The idea I had was just to reduce the number of times we need to force minor GC.

7:40:26 moon-child I thought the point of replication was to deal with cases where pretenuring would pessimise, like mapcar

7:40:47 hayley Right, that too, I suppose.

7:41:08 hayley Who'd have thunk splitting the heap would be hard?

7:41:28 moon-child see, this is why I don't even _try_ to do distributed systems :)

7:45:19 hayley It's great that we have to treat this as a distributed system, even though this is a shared memory system, because of a distinct lack of transactional memory. /me continues grumbling

7:46:38 hayley So now it's everything I dislike debugging in one: replication protocols, memory models, and garbage collection.