libera/#sicl - IRC Chatlog
21:22:08
hayley
@[moon-child]: I don't think a fence has to fix anything, it just has to ensure that there are no replicated writes happening. If we are going lock-free we could use a counter of writes being performed, perhaps.
21:25:55
hayley
Remember that the write occurs to both replicas, so there is no need to fix anything eagerly. The fence just needs to ensure we can't "re-order" around the fence, which is handled by waiting for any writes to finish.
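A minimal C11 sketch of the counter idea hayley describes: a per-nursery count of replicated writes in flight, with the fence waiting until no write is partially done. All names are illustrative, not SICL's, and C stands in here only because Common Lisp has no standard atomics:

```c
#include <stdatomic.h>

/* Hypothetical per-nursery counter of replicated writes in flight. */
static atomic_int writes_in_flight = 0;

/* A replicated write bumps the counter, writes both copies, drops it. */
void replicated_write(_Atomic(long) *global, long *local, long value) {
    atomic_fetch_add_explicit(&writes_in_flight, 1, memory_order_acquire);
    atomic_store_explicit(global, value, memory_order_release);
    *local = value;
    atomic_fetch_sub_explicit(&writes_in_flight, 1, memory_order_release);
}

/* The fence waits until no replicated write is in progress, so no write
   can be "re-ordered" around it. */
void replication_fence(void) {
    while (atomic_load_explicit(&writes_in_flight, memory_order_acquire) != 0)
        ; /* spin; a real implementation would yield or back off */
    atomic_thread_fence(memory_order_seq_cst);
}
```

As noted below, this shared counter invites false sharing between threads writing to objects from the same nursery.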
23:12:05
moon-child
hmm. It feels to me that it's better to shunt more work onto the 'owning' thread, since there can only be one of it. Also in the interest of fairness--if other threads are doing extra work only so that the owning thread can see a coherent state, but it never bothers to observe that state, then it's wasted work. But this argument may not hold water: if the overall amount of work done is still less, then it's probably still a win
23:12:52
moon-child
but something else to consider is that maintaining a counter creates 'false sharing'. If two different threads are diddling replicated objects that happen to come from the same nursery, there should be no issue, but they will have to play pingpong with that nursery's counter
23:14:30
moon-child
another thing: as there are not yet any cl concurrency semantics to adhere to, we don't necessarily have to support fences in the traditional sense. Eg (with-read-barriers ... (fence) ... (fence) ...)
23:18:03
moon-child
(& since the read barrier replaces pointers to the local replica with pointers to the global one, as I think it should, this has effects beyond the with-read-barriers block. That said, I'm not quite sure what a general version of that looks like, as it's rather tailored to this specific mechanism)
0:00:45
hayley
moon-child: If there is much contention on the counter, and thus many writes, it would be better to inform the thread whose nursery is being replicated to just do a minor GC, or pre-tenure harder.
0:01:54
hayley
That's an idea, if we track the probability that objects allocated at some site escape, we can adjust our threshold for directly allocating in the global heap.
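The per-site escape tracking hayley suggests could be sketched like this (a hypothetical profile structure with an arbitrary warm-up count and threshold; none of this is SICL code):

```c
/* Hypothetical per-allocation-site statistics. */
struct alloc_site {
    long allocated;   /* objects allocated at this site */
    long escaped;     /* of those, how many escaped the nursery */
};

/* Allocate directly in the global heap once the observed escape rate
   crosses a threshold; otherwise allocate in the nursery as usual. */
int should_pretenure(const struct alloc_site *site, double threshold) {
    if (site->allocated < 100)   /* too little data: stay in the nursery */
        return 0;
    double rate = (double)site->escaped / (double)site->allocated;
    return rate >= threshold;
}
```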
0:09:43
moon-child
how would you tell if there is contention on the counter? I guess with hardware xadd, you can rdtsc before and after, and without it, just check how many times you have to retry
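The retry-counting variant could look like this (a sketch for the no-hardware-xadd case; with fetch-add, the rdtsc-before-and-after timing would be used instead):

```c
#include <stdatomic.h>

/* Increment a counter with CAS, reporting how many retries were needed;
   many retries suggest contention on the counter. */
long contended_increment(_Atomic(long) *counter, int *retries) {
    *retries = 0;
    long old = atomic_load_explicit(counter, memory_order_relaxed);
    while (!atomic_compare_exchange_strong_explicit(
               counter, &old, old + 1,
               memory_order_acq_rel, memory_order_relaxed))
        ++*retries;   /* another thread got in between us: contention */
    return old + 1;
}
```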
0:10:32
hayley
To contend, there would have to be a lot of writes. But then we need another counter for that...
4:40:31
contrapunctus
scymtym: thanks for the example. Unfortunately, even after playing with it a fair bit, I'm still pretty perplexed. I tried adapting it for my use (using it to read its own source file and just collecting the output - https://paste.rs/qZf.lisp ), but the output does not contain the comments in the file. 🙁️
5:58:11
hayley
moon-child: How do you go about updating both locations in a lock free way? I know there's a more general double CAS procedure, but it would appear this case is somewhat easier.
5:59:23
hayley
We could also have another thread rewrite references in the local nursery to point into the global heap, but I don't know what we can achieve with that.
6:07:31
moon-child
if there are unsynchronised writes, it's ok if the state is incoherent, until you do something that synchronises with both of them. Then, a write can atomically exchange the new value into the global copy, and do a single attempt to cas it into the nursery copy. If the cas fails, then let the state be incoherent, but add the object to a to-be-cohered list. Then the next time you do something that synchronises, you process that list
6:08:34
moon-child
then the issue is detecting what could synchronise with such writes. Fence is obvious, but I think atomic acq or acqrel does too
6:09:54
moon-child
otoh, since this only triggers when there is an actual race, it should be pretty rare. So maybe it's ok to have an extra branch for every acq/acqrel
6:14:42
moon-child
(where 'cohering' is just copying the contents of the global copy into the local copy. This is conservative--we do it just in case there was a write which future accesses to the local copy were synchronised with respect to. If those hypothetical accesses are _not_ synchronised wrt anything, and there were unsynchronised writes, then the program is inherently racy and we just happened to observe one of the permitted outcomes)
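The exchange-then-one-CAS scheme described above could be sketched like so, for a single replicated slot; the structure, names, and the flag standing in for the to-be-cohered list are all illustrative:

```c
#include <stdatomic.h>
#include <stdbool.h>

/* One replicated slot: a global (tenured) copy and a local (nursery) copy. */
struct replicated_slot {
    _Atomic(long) global;
    _Atomic(long) local;
    atomic_bool needs_cohering;   /* stands in for the to-be-cohered list */
};

/* Relaxed write: atomically exchange into the global copy, then make ONE
   cas attempt on the local copy. If that fails, leave the copies
   incoherent and just flag the slot for later cohering. */
void relaxed_write(struct replicated_slot *s, long value) {
    long seen = atomic_exchange(&s->global, value);
    if (!atomic_compare_exchange_strong(&s->local, &seen, value))
        atomic_store(&s->needs_cohering, true);
}

/* Cohering: copy the global copy over the local one. Run at the next
   fence / acquire operation that could synchronise with racing writes. */
void cohere(struct replicated_slot *s) {
    if (atomic_exchange(&s->needs_cohering, false))
        atomic_store(&s->local, atomic_load(&s->global));
}
```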
6:19:52
hayley
My prior idea would require a branch on every sequencing operation, too, to see if there are any updates that need to complete before we can continue.
6:21:19
moon-child
I don't think your idea would require branches for ordinary atomic accesses, only for blanket fences
6:21:46
hayley
I'd have to wonder how frequent writes to replicated objects are, because having a counter of replicated writes that are being made would be handy.
6:22:16
moon-child
since the lock ensures that the state of a replicated object never stays incoherent, and if you synchronise with somebody via acqrel, then that's it--you synchronised with them
6:23:11
moon-child
that said I think lock free is better. Following Gil Tene: 'What could happen (and sneak in) if this one instruction takes 10 minutes to execute?'
6:28:38
hayley
I should write down the issues somewhere, because there are a few. We need to implement fences such that replicated writes complete either entirely before or after the fence. Writes also should never occur when a nursery collection is running; when a nursery collection starts, all mutators should only write to the global version (as, at that point, no one can observe the local version). Okay, I guess I counted one too many issues.
6:29:31
hayley
The latter might be a special case of the former; we might just need to disable replication, and then fence.
6:38:37
hayley
Okay, one more issue: how do you do an atomic update on a replicated slot, say, a compare-and-swap? The locking way provides sequential consistency whether one likes it or not.
6:39:36
moon-child
'Writes also should never occur when a nursery collection is running' with semispace, you just have to make sure the write finishes before the _next_ nursery collection finishes, which should be less work
6:40:12
moon-child
'atomic update on a replicated slot' assuming there are no flaws with my scheme for relaxed writes, I think it would work just the same
6:40:17
hayley
I should mention, this sort of copying applies to assigning unboxed values too, which will be "fun". But the static analysis to show that an object cannot become replicated is probably not too hard, so it might not matter too much.
6:42:44
moon-child
'assigning unboxed values' ugh ... hmm ... maybe it's ok to just eagerly pretenure all type-specialised arrays?
6:43:27
moon-child
after all, if you're using type-specialised arrays, it must be because you care about performance, and the performance advantage probably only starts to show once the array is reasonably large?
7:20:20
hayley
The Sapphire paper states "when the collector is done copying the (volatile field or fields of the) object, volatile accesses occur on the New copy of the field." i.e. there would be a similar barrier.
7:20:47
moon-child
'CASing on the global copy' I was imagining all write ops would take a write barrier, and if they find they're on the local copy, they first update their pointers to point at the global copy (just like with a concurrent gc)
7:21:06
moon-child
for ordinary reads you wouldn't need a barrier, but for atomic reads specifically, you would need the same barrier
7:23:20
hayley
You're probably right about pretenuring specialised arrays, as we don't gain much from moving them, and writes to such arrays cannot cause other objects to escape the nursery, either.
7:24:48
moon-child
hmm. Maybe actual ABA is rare. In that case, can we penalise the writer of the second A?
7:30:46
hayley
It would appear that Hudson and Moss don't think Sapphire would behave correctly when a program has a data race on a non-volatile field; seemingly they can't make the replicas coherent when two writers race to write replicated objects.
7:31:10
moon-child
wait, no, what am I thinking? There's no ABA problem. If the initial state is A, and t1 tries to write B, and t2 tries to write A, then we can have: t1 exchanges B into global, gets A; t2 exchanges A into global, gets B; t2 tries to cas A into local, but fails, because local wasn't B
7:32:18
moon-child
(I haven't proven that there can be no ABA issues, but this is the case that I was thinking of)
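That particular interleaving can be replayed mechanically under the exchange-then-one-CAS scheme (values A and B are arbitrary stand-ins; this checks only the one case discussed, not ABA-freedom in general):

```c
#include <stdatomic.h>

enum { A = 1, B = 2 };

/* Replay the interleaving: slot starts at A, t1 writes B, t2 writes A.
   Returns whether t2's single cas attempt on the local copy succeeds. */
int t2_cas_succeeds(void) {
    _Atomic(long) global = A;
    _Atomic(long) local  = A;

    long t1_seen = atomic_exchange(&global, B);  /* t1 gets A */
    long t2_seen = atomic_exchange(&global, A);  /* t2 gets B */
    (void)t1_seen;
    /* t2's one cas attempt on the local copy: expects B, but finds A,
       so it fails and the slot is flagged for cohering instead. */
    return atomic_compare_exchange_strong(&local, &t2_seen, A);
}
```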
7:39:03
hayley
I'd still hope that replicated writes are the uncommon case, and if there are too many writes in a short period of time, a minor GC is performed instead to promote the objects more eagerly. The idea I had was just to reduce the number of times we need to force minor GC.
7:40:26
moon-child
I thought the point of replication was to deal with cases where pretenuring would pessimise, like mapcar
7:45:19
hayley
It's great that we have to treat this as a distributed system, even though this is a shared memory system, because of a distinct lack of transactional memory. /me continues grumbling
7:46:38
hayley
So now it's everything I dislike debugging in one: replication protocols, memory models, and garbage collection.