libera/#sicl - IRC Chatlog
7:27:25
hayley
moon-child: Do you know any literature on double-CAS or otherwise doing writes to two replicas atomically?
7:30:03
moon-child
linked https://www.cl.cam.ac.uk/research/srg/netos/papers/2002-casn.pdf a while ago
7:32:14
moon-child
'reserves a small and constant amount of space in each word updated (either 0 or 2 bits)' can they be spared? If not, maybe an out-of-line bitmap
7:32:23
hayley
One other complication is that writes to global objects from other threads would require a check, to see if we need to replicate the write to another nursery...and we'd need to store that nursery too, oh dear. It's also necessary to have some sort of synchronisation with the other nursery, so that doing the atomic update won't race with compaction. I guess we could have a hash table of local originals of global replicas, a bit like lazy promotion
7:33:40
moon-child
that said--I guess the idea is something gets copied to the global heap, but a copy stays in t1's nursery, and then t2 tries to modify it? That seems fraught
7:35:13
hayley
We double-CAS to keep the two replicas synchronised. If there are too many DCAS updates, we suck it up and do a nursery collection, to remove replicas.
7:35:56
moon-child
that write constitutes a concurrent modification. If it isn't done with any strong or atomic memory ordering, I think the memory model may allow the state to be incoherent
7:36:37
moon-child
(or, rather, it does if t1 is writing the object at the same time, which is what would cause the problems)
7:37:18
hayley
Sure, but we don't exactly have a memory model, and I think the Java memory model wants us to pick a winning write. But I haven't read the JMM properly, and I would guess replication copiers (like Sapphire) for Java have to handle this problem too.
7:38:32
moon-child
(t1 could have a pointer both to the nursery copy of the object and to the global copy of the object. If t2 updates both in an unsynchronised fashion, t1 could observe the writes in either order, so it could observe a new state followed by the old state. I'm pretty sure the memory model has to allow this, in order to permit things like mem->register promotion)
7:39:21
moon-child
hayley: well, there is bike's proposal. I suspect that's the most relevant thing here, even if it is not finished yet
7:39:47
moon-child
'pick a winning write' well--at _some_ point, you would synchronise; that would happen when you do a nursery collection (or if you have a fence or w/e)
7:40:28
moon-child
and I'll add, value in nursery being different from value in global heap is very similar to value in one core's cache being different to the value in another core's cache. So I would _guess_ this would be ok
7:41:38
hayley
One thing I need to think about: it would appear that writing to a replicated object and compacting cannot occur concurrently, which suggests that each nursery requires a lock. In theory this is distressing, as nursery collections are no longer lock-free, but the critical section for updating a replica perhaps isn't too long.
7:43:20
moon-child
ideally the nursery compactor would just fight with the other thread for the specific replicated object
7:43:54
hayley
Hm, it might suffice to disable replication for global objects just before compaction; I'd have to re-read beach's paper on the collection algorithm. But more sinister is a situation like T1 starts a DCAS, T2 starts compacting, T1 now updates into somewhere random, because it didn't know that the object is no longer being replicated.
7:44:19
hayley
ACTION grumbles about how many times she says "replicated object", when it has nothing to do with her published description of a "replicated object system".
7:45:42
moon-child
see, this is why I was dubious of letting threads peek into each other's nurseries :P
7:46:16
moon-child
t1 and t2 race to set a flag on that one, and whichever one wins gets to diddle or move it
7:47:33
hayley
Okay, so the nursery collector has to lock every object before compacting? It's quite likely to win if we disable replication just before, and it would only lose if another thread happens to be racing for the same object, which is unlikely.
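The "race to set a flag on that one object" protocol above can be sketched with a single CAS per object: the compactor and a mutator both try to claim it, and whoever wins gets to move or mutate it while the loser retries or moves on. Names are invented for illustration.

```c
#include <stdatomic.h>
#include <stdbool.h>

typedef struct {
    atomic_int claimed;   /* 0 = free, 1 = claimed */
    int payload;
} object_t;

/* Compactor and mutator both call this; exactly one wins. */
static bool try_claim(object_t *o) {
    int expected = 0;
    return atomic_compare_exchange_strong(&o->claimed, &expected, 1);
}

static void release_claim(object_t *o) {
    atomic_store(&o->claimed, 0);
}
```

Since the compactor disables replication just before collecting, it should win this race almost always, so the common case stays cheap.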
7:51:14
moon-child
hmm. So a couple more things: 1, maybe replicated objects should always be moved into the global heap following a minor gc (so at that point you just flush your write and then set a forwarding pointer); 2, alternately, can we demote a replicated object back to a nursery object?
7:56:37
moon-child
hmm, it can be done with a single atomic: reserve a lock bit in the pointer from the global replica to the local one. I don't know how much that helps. But I think the main important point is that if you lose the lock, you can keep doing work; don't have to wait for it
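The "reserve a lock bit in the pointer from the global replica to the local one" idea, sketched minimally: assuming objects are at least 2-byte aligned, bit 0 of the pointer is free to use as the lock. As noted above, losing the lock is not fatal; the loser just goes and does other work. Invented names, illustrative layout.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define LOCK_BIT ((uintptr_t)1)

/* Try to take the lock encoded in the pointer's low bit.
   Returns true iff we were the one to set it. */
static bool try_lock_ptr(_Atomic uintptr_t *slot) {
    uintptr_t old = atomic_fetch_or(slot, LOCK_BIT);
    return (old & LOCK_BIT) == 0;
}

static void unlock_ptr(_Atomic uintptr_t *slot) {
    atomic_fetch_and(slot, ~LOCK_BIT);
}

/* Strip the tag to actually follow the pointer. */
static uintptr_t ptr_value(uintptr_t raw) {
    return raw & ~LOCK_BIT;
}
```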
7:59:16
moon-child
if the nursery is semispace rather than sliding, then you can just leave the old object there, and don't care if another thread reads a stale pointer
7:59:31
moon-child
only have to synchronise with the _next_ minor collection, which should not be a problem
8:00:33
hayley
Hm, now I don't know if that DCAS will be atomic with respect to ordinary reads. If we have to do something fancy to make reads synchronise properly, we may as well have some other sort of read barrier.
8:07:58
moon-child
you are probably right. But I still suspect that whatever memory model is ultimately adopted will be weak enough that it doesn't have to be atomic
8:12:12
hayley
Perhaps so. But then, how do we have to behave around a fence? I was hoping to have atomic updates, to avoid the hard problems.
8:13:01
moon-child
fence could check if any updates have been performed on objects within this nursery
8:14:15
moon-child
spitballing--maybe fencing can increment some generation number, and if another thread wants to mutate a replicated object and its generation number is too old, it has to do extra work?
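The spitballed generation-number scheme might look like this: a fence bumps a global epoch; each mutator caches the epoch it last observed, and if its cached epoch is stale when it wants to mutate a replicated object, it has to do the extra work (resynchronise) first. Everything here is invented for illustration.

```c
#include <stdatomic.h>
#include <stdbool.h>

static atomic_ulong global_epoch = 1;

typedef struct { unsigned long seen_epoch; } mutator_t;

/* A fence was performed somewhere: advance the epoch. */
static void fence_bump(void) {
    atomic_fetch_add(&global_epoch, 1);
}

/* Cheap check before mutating a replicated object: true means we
   are up to date; false means our epoch was stale, so the caller
   must do the extra work (here we just record the new epoch). */
static bool epoch_current(mutator_t *m) {
    unsigned long now = atomic_load(&global_epoch);
    if (m->seen_epoch == now) return true;
    m->seen_epoch = now;
    return false;
}
```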
8:14:46
hayley
Though, I could not say what to do if the micro-architecture hates us, or we otherwise keep failing to commit.
8:17:26
hayley
We could predict that, based on how caches and transactional memory tend to work, this scheme won't spuriously fail, but there is no guarantee in the architecture specification, I think.
8:19:44
beach
So we don't have hardware transactional memory, but do we have speculative lock elision? If so, locks might be very inexpensive.
8:20:57
hayley
Perhaps in our dreams. Intel had speculative lock elision, but I think they had to disable it due to bugs.
8:22:30
hayley
However, locking wouldn't help in this situation either. The plan I had was to have some sort of double-CAS, such that unsynchronised reads would observe either the old or the new value in both replicas, without the possibility of reading the old value in one replica and the new value in the other.
8:23:59
hayley
Wikipedia claims there will be a "TSXLDTRK" instruction set for Xeon processors set to be released in the second half of this year. Of course, the issue is that it will be a feature you only get on server CPUs, which will not be useful for many users.
8:27:04
moon-child
in fairness, computer scientists (like physicists) are known for being absolutely awful at naming things
8:39:12
hayley
It would appear Sapphire does unsynchronised writes. It has to replicate writes to unboxed slots, too.
8:53:58
hayley
At the least, we've established that fixing up pointers incrementally is quite similar to replication copying. And the description of Sapphire in the paper already assumes there can be a part of the heap we don't want to copy, and that the whole world doesn't need to be stopped for phase changes, so hopefully we don't have to invent too much now.
8:55:03
hayley
The paper claims "Programs with data races may exhibit new behaviors under Sapphire, but the effects are similar to those allowed when optimizing ([12, p. 378]). Further, the values stored into any field are ones stored by some thread, so Sapphire cannot violate Java type-safety. In any case, for properly synchronized programs, Sapphire produces results consistent with the Java Virtual Machine Specification." So I think we are safe in that regard,
9:18:04
hayley
Hm, the...other paper by Hudson and Moss claims that volatile slots in Java are handled by ensuring that the operations are performed on the newspace replica, if one exists. This is a bit like a read barrier, though it might not appear in generated code many times, if we only have to emit it near barriers.
9:20:19
hayley
Good question. I'm only certain that we need the read barrier when a fence is performed.
9:21:35
moon-child
I don't disagree, but you don't know, at any given point, if a fence was in fact performed
9:22:58
moon-child
hmmm. Maybe compile two versions of every function: one with read barriers, and one without. After you perform a fence, you osr to the read-barrier code. When you do a minor gc, you switch back
9:23:28
moon-child
if doing a lot of fences, you should stay in the read-barrier code pretty much all the time (and maybe the minor gc has some heuristic to not switch back then), so you don't pay so many return mispredicts
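The two-compiled-versions idea can be sketched crudely with a function pointer standing in for real OSR: each load has a barrier-free and a read-barrier version, a fence flips to the barriered code, and a minor GC flips back. Entirely illustrative; real OSR would rewrite return addresses rather than dispatch through a pointer.

```c
#include <stdatomic.h>

static int load_fast(int *slot)      { return *slot; }

static int load_barriered(int *slot) {
    /* a real barrier would check for a newspace replica here */
    return *slot;
}

/* Current mode: which compiled version is live. */
static _Atomic(int (*)(int *)) load_impl = load_fast;

static void after_fence(void)    { atomic_store(&load_impl, load_barriered); }
static void after_minor_gc(void) { atomic_store(&load_impl, load_fast); }

static int do_load(int *slot) { return atomic_load(&load_impl)(slot); }
```

With the heuristic mentioned above, a fence-heavy thread would simply never flip back, and stays in the barriered version.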
9:25:18
hayley
I probably should see what the Sapphire people did. The paper only mentions non-volatile and volatile fields in Java, and nothing on other sorts of atomic access.
9:33:09
hayley
Maybe the issue could be avoided by making sure a mutator flips its roots to newspace, before fencing. But that doesn't help us at all, as we are trying very hard to not flip roots.
9:57:16
hayley
My wild guess at the moment is that the read barrier is only necessary when performing atomic updates, but I have no idea.
10:01:09
hayley
But then something like (let ((x 'old)) (atomically-setf x 'new) x) could return either OLD or NEW in the resulting memory model, which is pretty weird.
10:20:43
hayley
...no it couldn't, what am I saying? One "issue" occurs when a replicated write occurs concurrently with two unsynchronised reads to either replica, and the reads read both the old and new value. But this is fine, as the write can be observed to happen in between reads.
10:24:57
hayley
Another issue occurs when two threads try to perform replicated writes concurrently, in which case the replicas can remain out of sync indefinitely. I think we could get away with locking, if the critical section is only two writes, and the locks have fine granularity (say, one lock bit per 64 bytes, for arbitrary values of 64). A similar thing applies for atomic updates (though, if the user wants lock-free updates, it might be nice to give them
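Fine-grained locking with "one lock bit per 64 bytes" could be kept in an out-of-line bitmap, in the spirit of the earlier suggestion; the critical section is then just the two replica writes. A minimal sketch with an assumed heap size and invented names:

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define GRANULE    64                 /* "for arbitrary values of 64" */
#define HEAP_BYTES (1 << 16)          /* assumed heap size for the sketch */
#define NBITS      (HEAP_BYTES / GRANULE)

static _Atomic uint64_t lock_bits[NBITS / 64];

static bool try_lock_granule(uintptr_t offset) {
    size_t   bit  = offset / GRANULE;
    uint64_t mask = (uint64_t)1 << (bit % 64);
    uint64_t old  = atomic_fetch_or(&lock_bits[bit / 64], mask);
    return (old & mask) == 0;
}

static void unlock_granule(uintptr_t offset) {
    size_t   bit  = offset / GRANULE;
    uint64_t mask = (uint64_t)1 << (bit % 64);
    atomic_fetch_and(&lock_bits[bit / 64], ~mask);
}
```

Objects in different granules never contend, which keeps the granularity fine without a lock word in every object.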
10:29:13
hayley
And, of course, if we spend too much time locking while performing replicated writes, we can cut our losses and just do a minor GC/promotion. Though the first issue might be a real issue, if we observe the new value, and then the old value again, by reading replicas in the opposite order to which they were written. The memory model would likely allow reordering, but not across a fence, which is the real problem here.
10:49:38
hayley
I have a vague idea, but so far there isn't a chance this idea can be done efficiently. It might suffice for a fence to ensure that all locks have been released, i.e. there are no replicated writes occurring concurrently with the fence. Therefore it is not possible for reads occurring after the fence to miss any writes which occurred before the fence, I think.
10:50:45
hayley
Though, again, replication copying is hopefully uncommon and required of very few objects, so a fence might just require iterating over a table of replicated objects, if there are any replicated objects to deal with.
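Combining the last two messages, the fence might just look like this: iterate over a (hopefully small, often empty) table of replicated objects and wait out any in-flight two-write critical sections, so no read after the fence can miss a write from before it. Assumed table layout, invented names.

```c
#include <stdatomic.h>
#include <stddef.h>

#define MAX_REPLICATED 64

typedef struct { atomic_int locked; } replicated_t;

static replicated_t table[MAX_REPLICATED];
static size_t       table_len = 0;      /* usually 0: fence is free */

static void replication_fence(void) {
    for (size_t i = 0; i < table_len; i++)
        while (atomic_load(&table[i].locked))
            ;   /* spin until the two replica writes have completed */
}
```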
11:09:39
hayley
That would look like thread-local heaps for mutable objects, without having to fix every root when the write barrier is tripped. We would only have to invent our own replica coherency protocol in software.
12:10:55
hayley
Perhaps this scheme is too slow in practice; I have no idea and would have to test. Speculative lock elision could be useful, if we get a working implementation in the rest of this century. And we can still decide to avoid local allocation or stomach a minor GC if the time overhead of replication is intolerable.
13:28:32
scymtym
beach: i have already written down some requirements for the environment library. i can try to make those available but i'm not sure whether i will manage to do it today
13:29:22
beach
Great! No rush. I decided Clostrum is good enough for me to continue working on bootstrapping, so I won't work on it for a while.
15:05:36
contrapunctus
scymtym: «I've been using `eclector.parse-result:read` to read forms in files; but I can't figure out how to get skipped input (specifically comments), or even its location. I've been trying out different things based on the manual (defining a `make-skipped-input-result` method, trying to access the second return value of `read`), but to no avail...»
15:20:39
scymtym
contrapunctus: https://techfak.de/~jmoringe/wad.lisp shows how to do that (the example was a test for second climacs originally). you should probably change the body of the MAKE-SKIPPED-INPUT-RESULT method to :reason reason :children '(). you can remove the code using utilities.print-tree or quickload that system, i think. the extra return value you mentioned is relevant for "top-level" skipped input since the corresponding results