freenode/#sicl - IRC Chatlog
8:23:10
beach
Bike: "somehow knows that TRANSFORM does have access to the vector". Should that be "somehow knows that TRANSFORM does not have access to the vector"?
8:24:36
beach
Bike: "since now and VECTOR can be discarded" . Should that be "since now VECTOR can be discarded" ?
8:37:33
beach
Bike: "This last must be an atomic ..." -> "This last operation must be an atomic..." maybe?
9:45:47
beach
This has been a very strange day so far. I have been scrutinizing the excellent documentation Bike wrote, I have been writing a long email to my banker saying essentially that I don't appreciate being "f*cked with" (I won't give the details here), and I have been filling out forms for the annual general assembly of the house-owner's association, which will take place by email (or snail mail for those who are computer illiterate).
10:13:15
heisig
I am not sure whether there is such a thing as a 'conservative, but not terribly slow' memory model that we could use for SICL.
10:14:10
heisig
For one, we definitely should use every trick available for implementing the GC, and for implementing primitive data structures like locks.
10:14:47
heisig
Also, many choices boil down to the question whether we allow a CPU to perform out-of-order execution or not.
10:15:24
heisig
And whether writes to memory should be buffered, or written to memory immediately. The latter can be very costly.
10:15:37
beach
I am pretty sure there could be such a thing. I mean, most of the time, memory models and other types of performance details are viewed in isolation. I am totally willing to simplify the memory model to simplify the code, and instead insist on performance elsewhere.
10:17:21
heisig
But the only significant simplification I see would be to treat every single shared variable in SICL as volatile (with Java's semantics for volatile).
10:18:24
beach
And I think that would be fine, now that there are not many shared variables left, and most of them are probably read-only anyway.
10:22:46
heisig
I just ran a benchmark. Adding a memory barrier to a write instruction adds 24 nanoseconds of overhead on my machine.
10:23:12
heisig
So for writes to the L1 cache (which are frequent) we are talking about an order of magnitude slowdown.
10:23:46
heisig
Not that I advise against the idea (I am intrigued, actually), but we should be aware that it is very costly.
10:25:06
beach
I firmly believe that a lot of code is made worse by incorrect hunches on the part of maintainers.
10:25:53
no-defun-allowed
What do adding the barriers give you? I'm still not up to scratch on concurrency stuff.
10:28:13
heisig
Exactly. In my case, I used a data structure that is only correct if several slots of a struct are read in the correct order.
10:30:38
no-defun-allowed
How tricky is it to explicitly add barriers? Bike's document suggests barriers are only required in "only a few places".
10:31:45
heisig
Most barriers are just a single CPU instruction. Or even a NOP, in case the CPU enforces certain guarantees anyway.
10:32:06
no-defun-allowed
ACTION may have had a knee-jerk reaction to 24 nanoseconds; she was throwing fancy x86 instructions at hash table probing to save perhaps fewer nanoseconds.
10:34:12
no-defun-allowed
I mostly meant if it affects programming style, but I can see that it requires a compiler to be more careful with transformations.
11:55:15
heisig
Dammit that is the same justification that the arms industry uses. Maybe I should just shut up.
13:51:33
Bike
i figure most people would stick with locks, and then of the fewer people using atomics most would use sequential consistency (which is the default), and then a few Real Hackers like heisig can use the other stuff
13:52:03
Bike
and in the meantime the implementation makes as much as possible unordered even without the programmer saying so, so that if they make a mistake they just get weird results instead of crashing
13:55:40
Bike
the C++ thing i'm leaving out is memory_order_consume, which as far as i can tell has been in the C++ standard for ten years without anyone implementing it properly or even understanding how it works
14:03:32
Bike
but yeah, i don't know, they're definitely partly involved. one of these papers has torvalds as a contributor
15:24:56
Bike
ok, next question. SBCL defines an operator atomic-update that uses CAS to perform an arbitrary update on a place, i.e. it does an atomic read-modify-write with some function and arguments you pass in. it does that with a CAS loop. The question is: in sbcl's implementation, the update function form and the extra arguments can be evaluated multiple times. is that sensible?
15:25:12
Bike
like, you can do atomic-push (though that has its own definition) as (atomic-update place #'cons t)
15:25:33
Bike
and the #'cons and t forms might be evaluated multiple times. this is apparently intentional but i'm not sure i understand why
15:27:03
beach
Sounds like a question for karlosz or scymtym. Or perhaps you are just thinking out loud?
15:28:06
Bike
well, kind of. it would be nice if i could see some uses of atomic-update to understand what makes more sense with how it's used, but there aren't a lot of uses around. but maybe somebody here has ideas
17:31:31
Bike
a paper drmeister linked in #clasp a few days ago might be interesting. http://users.cecs.anu.edu.au/~steveb/pubs/papers/yieldpoint-ismm-2015.pdf it's a fairly detailed performance analysis of "yieldpoints", meaning the compiler inserting a go-to-GC (or go-to-profile, or etc) check on every loop backedge and function call. i'm not sure if SICL uses something like that for GC, but it does remind me of the debug
17:33:08
Bike
among other things they find that the overhead is pretty low, and some specific details of the distribution, e.g. in all their benchmarks they found that a small subset of safepoints accounted for almost all safepoint checks dynamically.
17:33:34
Bike
er, that's kind of confusingly phrased. I mean, only a small subset of the generated safepoints.
17:35:11
beach
So are they saying that most of them can be eliminated? I don't see how that could be possible.
17:36:01
Bike
i don't think they make any recommendation, but it makes sense to me. i mean, only a small subset of code is "hot", right? probably a safepoint in something like CONS is executed very frequently
17:36:45
Bike
"For example, in places where the compiler makes an assumption regarding type specialization or an inlining decision, it inserts a group-based yieldpoint before the specialized or inlined code. Whenever the run-time system notices that the assumption breaks, it enables that group of yieldpoints to prohibit further execution of the code under false assumption. Code that reaches the enabled
17:36:51
Bike
yieldpoints will take a slow path, where the run-time compiler can make amends and generate new valid code" seems relevant to call site optimization
17:38:13
Bike
ah, here we go. they do suggest that the safepoints that are executed extremely commonly could use a more performant mechanism, like patching the code (so if the safepoint isn't active there's just a nop, and if it is there's a branch)
17:38:34
Bike
which would be impracticably slow if you did it for every safepoint, since patching code ruins caches and stuff
17:51:12
beach
My (admittedly small) family just announced that dinner is served. I'll be back tomorrow as usual.