freenode/#sicl - IRC Chatlog
8:23:10
beach
Bike: "somehow knows that TRANSFORM does have access to the vector". Should that be "somehow knows that TRANSFORM does not have access to the vector"?
8:24:36
beach
Bike: "since now and VECTOR can be discarded" . Should that be "since now VECTOR can be discarded" ?
8:37:33
beach
Bike: "This last must be an atomic ..." -> "This last operation must be an atomic..." maybe?
9:45:47
beach
This has been a very strange day so far. I have been scrutinizing the excellent documentation Bike wrote, I have been writing a long email to my banker saying essentially that I don't appreciate being "f*cked with" (I won't give the details here), and I have been filling out forms for the annual general assembly of the house-owner's association, which will take place by email (or snail mail for those who are computer illiterate).
10:13:15
heisig
I am not sure whether there is such a thing as a 'conservative, but not terribly slow' memory model that we could use for SICL.
10:14:10
heisig
For one, we definitely should use every trick available for implementing the GC, and for implementing primitive data structures like locks.
10:14:47
heisig
Also, many choices boil down to the question whether we allow a CPU to perform out-of-order execution or not.
10:15:24
heisig
And whether writes to memory should be buffered, or written to memory immediately. The latter can be very costly.
10:15:37
beach
I am pretty sure there could be such a thing. I mean, most of the time, memory models and other types of performance details are viewed in isolation. I am totally willing to simplify the memory model to simplify the code, and instead insist on performance elsewhere.
10:17:21
heisig
But the only significant simplification I see would be to treat every single shared variable in SICL as volatile (with Java's semantics for volatile).
10:18:24
beach
And I think that would be fine, now that there are not many shared variables left, and most of them are probably read-only anyway.
10:22:46
heisig
I just ran a benchmark. Adding a memory barrier to a write instruction adds 24 nanoseconds of overhead on my machine.
10:23:12
heisig
So for writes to the L1 cache (which are frequent) we are talking about an order of magnitude slowdown.
10:23:46
heisig
Not that I advise against the idea (I am intrigued, actually), but we should be aware that it is very costly.
10:25:06
beach
I firmly believe that a lot of code is made worse by incorrect hunches on the part of maintainers.
10:25:53
no-defun-allowed
What do adding the barriers give you? I'm still not up to scratch on concurrency stuff.
10:28:13
heisig
Exactly. In my case, I used a data structure that is only correct if several slots of a struct are read in the correct order.
10:30:38
no-defun-allowed
How tricky is it to explicitly add barriers? Bike's document suggests barriers are only required in "only a few places".
10:31:45
heisig
Most barriers are just a single CPU instruction. Or even a NOP, in case the CPU enforces certain guarantees anyway.
10:32:06
no-defun-allowed
ACTION may have had a knee-jerk reaction to 24 nanoseconds; she was throwing fancy x86 instructions at hash table probing to save perhaps fewer nanoseconds.
10:34:12
no-defun-allowed
I mostly meant if it affects programming style, but I can see that it requires a compiler to be more careful with transformations.
11:55:15
heisig
Dammit that is the same justification that the arms industry uses. Maybe I should just shut up.
13:51:33
Bike
i figure most people would stick with locks, and then of the fewer people using atomics most would use sequential consistency (which is the default), and then a few Real Hackers like heisig can use the other stuff
13:52:03
Bike
and in the meantime the implementation makes as much as possible unordered even without the programmer saying so, so that if they make a mistake they just get weird results instead of crashing
13:55:40
Bike
the C++ thing i'm leaving out is memory_order_consume, which as far as i can tell has been in the C++ standard for ten years without anyone implementing it properly or even understanding how it works
14:03:32
Bike
but yeah, i don't know, they're definitely partly involved. one of these papers has torvalds as a contributor
15:24:56
Bike
ok, next question. SBCL defines an operator atomic-update that uses CAS to perform an arbitrary update on a place, i.e. it does an atomic read-modify-write with some function and arguments you pass in. it does that with a CAS loop. The question is: in sbcl's implementation, the update function form and the extra arguments can be evaluated multiple times. is that sensible?
15:25:12
Bike
like, you can do atomic-push (though that has its own definition) as (atomic-update place #'cons t)
15:25:33
Bike
and the #'cons and t forms might be evaluated multiple times. this is apparently intentional but i'm not sure i understand why
15:27:03
beach
Sounds like a question for karlosz or scymtym. Or perhaps you are just thinking out loud?
15:28:06
Bike
well, kind of. it would be nice if i could see some uses of atomic-update to understand what makes more sense with how it's used, but there aren't a lot of uses around. but maybe somebody here has ideas
17:31:31
Bike
a paper drmeister linked in #clasp a few days ago might be interesting. http://users.cecs.anu.edu.au/~steveb/pubs/papers/yieldpoint-ismm-2015.pdf it's a fairly detailed performance analysis of "yieldpoints", meaning the compiler inserting a go-to-GC (or go-to-profile, or etc) check on every loop backedge and function call. i'm not sure if SICL uses something like that for GC, but it does remind me of the debug
17:33:08
Bike
among other things they find that the overhead is pretty low, and some specific details of the distribution, e.g. in all their benchmarks they found that a small subset of safepoints accounted for almost all safepoint checks dynamically.
17:33:34
Bike
er, that's kind of confusingly phrased. I mean, only a small subset of the generated safepoints.
17:35:11
beach
So are they saying that most of them can be eliminated? I don't see how that could be possible.
17:36:01
Bike
i don't think they make any recommendation, but it makes sense to me. i mean, only a small subset of code is "hot", right? probably a safepoint in something like CONS is executed very frequently
17:36:45
Bike
"For example, in places where the compiler makes an assumption regarding type specialization or an inlining decision, it inserts a group-based yieldpoint before the specialized or inlined code. Whenever the run-time system notices that the assumption breaks, it enables that group of yieldpoints to prohibit further execution of the code under false assumption. Code that reaches the enabled
17:36:51
Bike
yieldpoints will take a slow path, where the run-time compiler can make amends and generate new valid code" seems relevant to call site optimization
17:38:13
Bike
ah, here we go. they do suggest that the safepoints that are executed extremely commonly could use a more performant mechanism, like patching the code (so if the safepoint isn't active there's just a nop, and if it is there's a branch)
17:38:34
Bike
which would be impracticably slow if you did it for every safepoint, since patching code ruins caches and stuff
17:51:12
beach
My (admittedly small) family just announced that dinner is served. I'll be back tomorrow as usual.