libera/#sicl - IRC Chatlog
Search
19:32:01
moon-child
first atomically exchange the new value with the old one, in the global replica. Then cas it into the nursery replica
19:35:14
moon-child
if a fence has to move everything that's replicated, that's still a win over doing it eagerly; tackle n objects at a time instead of 1. And if you have some atomic-heavy code with a bunch of conses, a bunch of pointing to those new conses from the global heap, and a bunch of fences, the pre-tenurer should learn about that pretty quickly
19:40:38
moon-child
hmm. If you include a generation number, then you don't need a cas loop, just one cas. But you bloat your objects, and have to fight over the generation number, so maybe it's a wash
21:22:08
hayley
@[moon-child]: I don't think a fence has to fix anything, it just has to ensure that there are no replicated writes happening. If we are going lock-free we could use a counter of writes being performed, perhaps.
21:25:55
hayley
Remember that the write occurs to both replicas, so there is no need to fix anything eagerly. The fence just needs to ensure we can't "re-order" around the fence, which is handled by waiting for any writes to finish.
23:12:05
moon-child
hmm. It feels to me that it's better to shunt more work onto the 'owning' thread, since there can only be one of it. Also in the interest of fairness--if other threads are doing extra work only so that the owning thread can see a coherent state, but it never bothers to observe that state, then it's wasted work. But this argument may not hold water: if the overall amount of work done is still less,
23:12:52
moon-child
but something else to consider is that maintaining a counter creates 'false sharing'. If two different threads are diddling replicated objects that happen to come from the same nursery, there should be no issue, but they will have to play pingpong with that nursery's counter
23:14:30
moon-child
another thing: as there are not yet any cl concurrency semantics to adhere to, we don't necessarily have to support fences in the traditional sense. Eg (with-read-barriers ... (fence) ... (fence) ...)
23:18:03
moon-child
(& since the read barrier replaces pointers to the local replica with pointers to the global one, as I think it should, this has effects beyond the with-read-barriers block. That said, I'm not quite sure what a general version of that looks like, as it's rather tailored to this specific mechanism)
0:00:45
hayley
moon-child: If there is much contention on the counter, and thus many writes, it would be better to inform the thread whose nursery is being replicated to just do a minor GC, or pre-tenure harder.
0:01:54
hayley
That's an idea, if we track the probability that objects allocated at some site escape, we can adjust our threshold for directly allocating in the global heap.
0:09:43
moon-child
how would you tell if there is contention on the counter? I guess with hardware xadd, you can rdtsc before and after, and without, just check how many times you have to retry
0:10:32
hayley
To contend, there would have to be a lot of writes. But then we need another counter for that...
4:40:31
contrapunctus
scymtym: thanks for the example. Unfortunately, even after playing with it a fair bit, I'm still pretty perplexed. I tried adapting it for my use (using it to read its own source file and just collecting the output - https://paste.rs/qZf.lisp ), but the output does not contain the comments in the file. 🙁️
5:58:11
hayley
moon-child: How do you go about updating both locations in a lock free way? I know there's a more general double CAS procedure, but it would appear this case is somewhat easier.
5:59:23
hayley
We could also have another thread rewrite references in the local nursery to point into the global heap, but I don't know what we can achieve with that.
6:07:31
moon-child
if there are unsynchronised writes, it's ok if the state is incoherent, until you do something that synchronises with both of them. Then, a write can atomically exchange the new value into the global copy, and do a single attempt to cas it into the nursery copy. If the cas fails, then let the state be incoherent, but add the object to a to-be-cohered list. Then the next time you do something that
6:08:34
moon-child
then the issue is detecting what could synchronise with such writes. Fence is obvious, but I think atomic acq or acqrel does too
6:09:54
moon-child
otoh, since this only triggers when there is an actual race, it should be pretty rare. So maybe it's ok to have an extra branch for every acq/acqrel
6:14:42
moon-child
(where 'cohering' is just copying the contents of the global copy into the local copy. This is conservative--we do it just in case there was a write which future accesses to the local copy were synchronised with respect to. If those hypothetical accesses are _not_ synchronised wrt anything, and there were unsynchronised writes, then the program is inherently racy and we just happened to observe
6:19:52
hayley
My prior idea would still require a branch on every sequencing operation, too, to see if there's any updates that need to complete before we can continue.
6:21:19
moon-child
I don't think your idea would require branches for ordinary atomic accesses, only for blanket fences
6:21:46
hayley
I'd have to wonder how frequent writes to replicated objects are, because having a counter of replicated writes that are being made would be handy.
6:22:16
moon-child
since the lock ensures that the state of a replicated object never stays incoherent, and if you synchronise with somebody via acqrel, then that's it--you synchronised with them
6:23:11
moon-child
that said I think lock free is better. Following gil tene 'What could happen (and sneak in) if this one instruction takes 10 minutes to execute?'
6:28:38
hayley
I should write down the issues somewhere, because there are a few. We need to implement fences such that replicated writes complete either entirely before or after the fence. Writes also should never occur when a nursery collection is running; when a nursery collection starts, all mutators should only write to the global version (as, at that point, no one can observe the local version). Okay, I guess I counted one too many issues.
6:29:31
hayley
The latter might be a special case of the former; we might just need to disable replication, and then fence.
6:38:37
hayley
Okay, one more issue: how do you do an atomic update on a replicated slot, say, a compare-and-swap? The locking way provides sequential consistency whether one likes it or not.
6:39:36
moon-child
'Writes also should never occur when a nursery collection is running' with semispace, you just have to make sure the write finishes before the _next_ nursery collection finishes, which should be less work
6:40:12
moon-child
'atomic update on a replicates slot' assuming there are no flaws with my scheme for relaxed writes, I think it would work just the same
6:40:17
hayley
I should mention, this sort of copying applies to assigning unboxed values too, which will be "fun". But the static analysis to show that an object cannot become replicated is probably not too hard, so it might not matter too much.
6:42:44
moon-child
'assigning unboxed values' ugh ... hmm ... maybe it's ok to just eagerly pretenure all type-specialised arrays?
6:43:27
moon-child
after all, if you're using type-specialised arrays, it must be because you care about performance, and the performance advantage probably only starts to show once the array is reasonably large?
7:20:20
hayley
The Sapphire paper states "when the collector is done copying the (volatile field or fields of the) object, volatile accesses occur on the New copy of the field." i.e. there would be a similar barrier.