freenode/#sicl - IRC Chatlog
4:16:16
Bike
unfortunately, after a while this results in diagrams like these https://cdn.discordapp.com/attachments/390697866055647242/803840415941001236/c11.png
4:18:36
Bike
They're diagrams of possible executions (i.e. happens-before total orders) for simple C programs. I think the black lines are happens-before, and the colored lines relate to orderings of reads and writes that you have to deal with if you use atomics.
4:19:16
Bike
Incidentally, apparently hans "garbage collector" boehm is heavily involved in the C++ standards discussions of this stuff. that's kind of interesting
4:21:59
Bike
http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2018/p0668r4.html or at least here he is. some of this stuff is still an open research area, at least as relates to like, formal verification.
4:24:44
Bike
actually in another tab i have a paper open that mentions interaction between this stuff and garbage collectors
4:25:20
Bike
"[...] a similar capability will likely be needed by any C++ garbage collector. Furthermore, garbage collectors enable use cases that are quite similar to those of [read copy update]"
4:25:57
Bike
But I will be interested to see if they're concerns for garbage collection more generally. In multithreaded environments anyway.
4:27:44
Bike
Most of the complicated stuff comes in when you give up sequential consistency, which apparently really does help performance sometimes. But probably if the main concern is correctness one can just stick with sequential consistency.
4:30:15
beach
So, in summary, in order to make C++ as fast as possible while still having a well-defined memory model, they need to make that memory model extremely complicated. At the same time, their exception system is broken, so it can't really be used by application code, and the fact that they still stipulate manual memory management means that they can't write programs that are both modular and fast.
4:32:01
beach
Oh, and at the same time, the world's most popular programming language is Python, which is at least an order of magnitude slower than any naive Common Lisp implementation.
4:33:17
beach
So the only conclusion I can see is that they are doing all this work for game developers, who, as we know, don't care about modularity as long as they can save a few nanoseconds here and there.
4:34:35
Bike
my impression is that the memory model stuff is actually more closely linked to OS development. and system stuff like garbage collectors, apparently.
4:35:07
Bike
I guess because operating systems people are the ones writing libraries to implement locks and stuff, which is one thing you want to be fast where sequential consistency isn't necessary.
4:35:37
Bike
some of the sbcl source about memory barriers actually just recommends reading a text file in the linux kernel docs
4:36:34
beach
That sounds plausible, actually. But I suspect they exaggerate the need for speed for both those use cases.
4:37:45
beach
I mean, take a modern OS. They spend all this time optimizing memory access and such, but they are stuck with expensive context switches that make it impossible to use modern I/O devices in the normal way.
4:40:23
beach
For example, I can't get a simple music-synth program to work on my Linux system with any reasonable delay, without compiling a special version of the kernel.
4:40:58
Bike
i haven't done enough programming with atomics to have much of an educated opinion about speed. i'm hoping if lisp can get some definition here maybe i can.
4:41:44
Bike
the reason i started in on this again is that i realized the caching in ctype could be written in a lock free way pretty easily, but i can't actually express some of what i need for that
4:42:20
Bike
it's certainly possible boehm and so on are wrong, but i definitely don't know enough to convince anybody of that
4:42:43
beach
Bike: All I am saying is that we (i.e., you in this case) can probably afford to be more conservative with the memory model and still have good overall performance, due to other qualities of Common Lisp.
4:43:04
aeth
beach: That isn't true about game developers anymore. It was true 10-30 years ago back when they would ship a release and never patch it, leaving bugs and all. But these days, they might support the same game for 10-20 years so it actually needs to have readable/maintainable code.
4:43:07
Bike
yeah. i mean, it's not like i'm including the whole C++ memory model here, just stuff that seems like it could be useful.
4:43:25
aeth
beach: The "this doesn't need to be maintained and it does need to be a few nanoseconds faster" of today seems to be scientific computing
4:43:32
Bike
i'd also like to put in safety guarantees java (the only other language with a memory model, practically) has that C++ doesn't
4:44:03
beach
aeth: That's an interesting evolution. Thanks for the information. Though I still hear about "data-oriented programming" which seems to be the opposite of modularity.
4:44:45
aeth
Yeah, some gamedevs think that they're still living in the past because e.g. Id in the days of Doom is still very romanticized. But a ton of gamedev is just done in C# or even Lua these days.
4:45:43
aeth
And most of the rest is stuck on C++ because of lock-in, e.g. the middleware they use is C++, and good luck interoperating with C++ from anything other than C++
4:47:07
beach
Bike: Yes, I agree. I am just saying that there is probably no need to squeeze out every nanosecond. We have other advantages over languages such as C++.
4:48:21
Bike
the sbcl subtypep cache is very aggressively optimized and deals with some of this stuff. it would be interesting to know what particularly motivated that. i mean, subtypep is important if you're doing a lot of type inference stuff, of course
4:48:52
Bike
(it is, unfortunately, also kind of unreadable, but for lispy macro reasons rather than because of memory model concerns)
4:51:49
beach
I suspect we are observing the result of a maintenance model where (according to scymtym) each individual pretty much decides what is worthwhile, combined with a long history of evolution of the code.
4:53:17
beach
For SICL, I would much rather emphasize general mechanisms such as the call-site optimization I came up with, the fast generic dispatch, path replication, etc., than attempting to individually optimize every other part of the code.
4:56:36
beach
Let me take just one example. I decided to stick with the indirection imposed by standard objects, even for objects like functions, simple arrays, etc. It would have been tempting to optimize those cases by introducing header-less objects, thereby complicating the code a lot.
4:56:37
beach
But with general mechanisms (like call-site optimization) and others yet to be invented, we can keep the code simple and still get better performance than what would have been possible with those tempting optimizations.
4:58:30
beach
Another example. The indirection through the header makes the rack internally consistent. So it seems a lot of synchronization can be avoided by forcing the reallocation of a new rack and the use of CAS.
4:58:57
beach
Without the indirection, locks would be the only option in many cases, as far as I can tell.
5:00:07
beach
I don't even know how SBCL handles AREF. I mean, it has to verify the size and then access the element. What if there is a call to ADJUST-ARRAY in the middle?
5:07:48
Bike
that's good to hear. i'm not sure about that one. the simple/compound thing feels a bit too elaborate
5:08:32
beach
I am not sure it is the style that a standards document should have, but the explanation is good.
5:13:02
beach
And the "Acquisition and release" section seems to totally answer my question from yesterday (my time).
5:18:11
beach
Bike: Oh, and I am often a good proofreader for this kind of stuff, because of my apparent inability to understand prose that is not extremely precise.
5:22:39
Bike
that's flattering, but i won't be convinced i understand this stuff until i've had to debug programs using it :)
8:23:10
beach
Bike: "somehow knows that TRANSFORM does have access to the vector". Should that be "somehow knows that TRANSFORM does not have access to the vector"?
8:24:36
beach
Bike: "since now and VECTOR can be discarded" . Should that be "since now VECTOR can be discarded" ?
8:37:33
beach
Bike: "This last must be an atomic ..." -> "This last operation must be an atomic..." maybe?
9:45:47
beach
This has been a very strange day so far. I have been scrutinizing the excellent documentation Bike wrote, I have been writing a long email to my banker saying essentially that I don't appreciate being "f*cked with" (I won't give the details here), and I have been filling out forms for the annual general assembly of the house-owner's association, which will take place by email (or snail mail for those who are computer illiterate).
10:13:15
heisig
I am not sure whether there is such a thing as a 'conservative, but not terribly slow' memory model that we could use for SICL.
10:14:10
heisig
For one, we definitely should use every trick available for implementing the GC, and for implementing primitive data structures like locks.
10:14:47
heisig
Also, many choices boil down to the question whether we allow a CPU to perform out-of-order execution or not.
10:15:24
heisig
And whether writes to memory should be buffered, or written to memory immediately. The latter can be very costly.
10:15:37
beach
I am pretty sure there could be such a thing. I mean, most of the time, memory models and other types of performance details are viewed in isolation. I am totally willing to simplify the memory model to simplify the code, and instead insist on performance elsewhere.
10:17:21
heisig
But the only significant simplification I see would be to treat every single shared variable in SICL as volatile (with Java's semantics for volatile).
10:18:24
beach
And I think that would be fine, now that there are not many shared variables left, and most of them are probably read-only anyway.
10:22:46
heisig
I just ran a benchmark. Adding a memory barrier to a write instruction adds 24 nanoseconds of overhead on my machine.
10:23:12
heisig
So for writes to the L1 cache (which are frequent) we are talking about an order of magnitude slowdown.
10:23:46
heisig
Not that I advise against the idea (I am intrigued, actually), but we should be aware that it is very costly.
10:25:06
beach
I firmly believe that a lot of code is made worse by incorrect hunches on the part of maintainers.
10:25:53
no-defun-allowed
What does adding the barriers give you? I'm still not up to speed on concurrency stuff.
10:28:13
heisig
Exactly. In my case, I used a data structure that is only correct if several slots of a struct are read in the correct order.
10:30:38
no-defun-allowed
How tricky is it to explicitly add barriers? Bike's document suggests barriers are only required in "only a few places".
10:31:45
heisig
Most barriers are just a single CPU instruction. Or even a NOP, in case the CPU enforces certain guarantees anyway.
10:32:06
no-defun-allowed
ACTION may have had a knee-jerk reaction to 24 nanoseconds; she was throwing fancy x86 instructions at hash table probing to save perhaps fewer nanoseconds.
10:34:12
no-defun-allowed
I mostly meant if it affects programming style, but I can see that it requires a compiler to be more careful with transformations.
11:55:15
heisig
Dammit that is the same justification that the arms industry uses. Maybe I should just shut up.
13:51:33
Bike
i figure most people would stick with locks, and then of the fewer people using atomics most would use sequential consistency (which is the default), and then a few Real Hackers like heisig can use the other stuff
13:52:03
Bike
and in the meantime the implementation makes as much as possible unordered even without the programmer saying so, so that if they make a mistake they just get weird results instead of crashing
13:55:40
Bike
the C++ thing i'm leaving out is memory_order_consume, which as far as i can tell has been in the C++ standard for ten years without anyone implementing it properly or even understanding how it works
14:03:32
Bike
but yeah, i don't know, they're definitely partly involved. one of these papers has torvalds as a contributor
15:24:56
Bike
ok, next question. SBCL defines an operator atomic-update that uses CAS to perform an arbitrary update on a place, i.e. it does an atomic read-modify-write with some function and arguments you pass in. it does that with a CAS loop. The question is: in sbcl's implementation, the update function form and the extra arguments can be evaluated multiple times. is that sensible?
15:25:12
Bike
like, you can do atomic-push (though that has its own definition) as (atomic-update place #'cons t)
15:25:33
Bike
and the #'cons and t forms might be evaluated multiple times. this is apparently intentional but i'm not sure i understand why
15:27:03
beach
Sounds like a question for karlosz or scymtym. Or perhaps you are just thinking out loud?
15:28:06
Bike
well, kind of. it would be nice if i could see some uses of atomic-update to understand what makes more sense with how it's used, but there aren't a lot of uses around. but maybe somebody here has ideas