freenode/#sicl - IRC Chatlog
4:16:16
Bike
unfortunately, after a while this results in diagrams like these https://cdn.discordapp.com/attachments/390697866055647242/803840415941001236/c11.png
4:18:36
Bike
They're diagrams of possible executions (i.e. happens-before total orders) for simple C programs. I think the black lines are happens-before, and the colored lines relate to orderings of reads and writes that you have to deal with if you use atomics.
4:19:16
Bike
Incidentally, apparently hans "garbage collector" boehm is heavily involved in the C++ standards discussions of this stuff. that's kind of interesting
4:21:59
Bike
http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2018/p0668r4.html or at least here he is. some of this stuff is still an open research area, at least as relates to like, formal verification.
4:24:44
Bike
actually in another tab i have a paper open that mentions interaction between this stuff and garbage collectors
4:25:20
Bike
"[...] a similar capability will likely be needed by any C++ garbage collector. Furthermore, garbage collectors enable use cases that are quite similar to those of [read copy update]"
4:25:57
Bike
But I will be interested to see if they're concerns for garbage collection more generally. In multithreaded environments anyway.
4:27:44
Bike
Most of the complicated stuff comes in when you give up sequential consistency, which apparently really does help performance sometimes. But probably if the main concern is correctness one can just stick with sequential consistency.
4:30:15
beach
So, in summary, in order to make C++ as fast as possible while still having a well-defined memory model, they need to make that memory model extremely complicated. At the same time, their exception system is broken, so it can't really be used by application code, and the fact that they still stipulate manual memory management means that they can't write programs that are both modular and fast.
4:32:01
beach
Oh, and at the same time, the world's most popular programming language is Python, which is at least an order of magnitude slower than any naive Common Lisp implementation.
4:33:17
beach
So the only conclusion I can see is that they are doing all this work for game developers, who, as we know, don't care about modularity as long as they can save a few nanoseconds here and there.
4:34:35
Bike
my impression is that the memory model stuff is actually more closely linked to OS development. and system stuff like garbage collectors, apparently.
4:35:07
Bike
I guess because operating systems people are the ones writing libraries to implement locks and stuff, which is one thing you want to be fast where sequential consistency isn't necessary.
4:35:37
Bike
some of the sbcl source about memory barriers actually just recommends reading a text file in the linux kernel docs
4:36:34
beach
That sounds plausible, actually. But I suspect they exaggerate the need for speed for both those use cases.
4:37:45
beach
I mean, take a modern OS. They spend all this time optimizing memory access and such, but they are stuck with expensive context switches that make it impossible to use modern I/O devices in the normal way.
4:40:23
beach
For example, I can't get a simple music-synth program to work on my Linux system with any reasonable delay, without compiling a special version of the kernel.
4:40:58
Bike
i haven't done enough programming with atomics to have much of an educated opinion about speed. i'm hoping if lisp can get some definition here maybe i can.
4:41:44
Bike
the reason i started in on this again is that i realized the caching in ctype could be written in a lock free way pretty easily, but i can't actually express some of what i need for that
4:42:20
Bike
it's certainly possible boehm and so on are wrong, but i definitely don't know enough to convince anybody of that
4:42:43
beach
Bike: All I am saying is that we (i.e., you in this case) can probably afford to be more conservative with the memory model and still have good overall performance, due to other qualities of Common Lisp.
4:43:04
aeth
beach: That isn't true about game developers anymore. It was true 10-30 years ago back when they would ship a release and never patch it, leaving bugs and all. But these days, they might support the same game for 10-20 years so it actually needs to have readable/maintainable code.
4:43:07
Bike
yeah. i mean, it's not like i'm including the whole C++ memory model here, just stuff that seems like it could be useful.
4:43:25
aeth
beach: The "this doesn't need to be maintained and it does need to be a few nanoseconds faster" of today seems to be scientific computing
4:43:32
Bike
i'd also like to put in safety guarantees java (the only other language with a memory model, practically) has that C++ doesn't
4:44:03
beach
aeth: That's an interesting evolution. Thanks for the information. Though I still hear about "data-oriented programming" which seems to be the opposite of modularity.
4:44:45
aeth
Yeah, some gamedevs think that they're still living in the past because e.g. Id in the days of Doom is still very romanticized. But a ton of gamedev is just done in C# or even Lua these days.
4:45:43
aeth
And most of the rest is stuck on C++ because of lock-in, e.g. the middleware they use is C++, and good luck interoperating with C++ from anything other than C++
4:47:07
beach
Bike: Yes, I agree. I am just saying that there is probably no need to squeeze out every nanosecond. We have other advantages over languages such as C++.
4:48:21
Bike
the sbcl subtypep cache is very aggressively optimized and deals with some of this stuff. it would be interesting to know what particularly motivated that. i mean, subtypep is important if you're doing a lot of type inference stuff, of course
4:48:52
Bike
(it is, unfortunately, also kind of unreadable, but for lispy macro reasons rather than because of memory model concerns)
4:51:49
beach
I suspect we are observing the result of a maintenance model where (according to scymtym) each individual pretty much decides what is worthwhile, combined with a long history of evolution of the code.
4:53:17
beach
For SICL, I would much rather emphasize general mechanisms such as the call-site optimization I came up with, the fast generic dispatch, path replication, etc., than attempting to individually optimize every other part of the code.
4:56:36
beach
Let me take just one example. I decided to stick with the indirection imposed by standard objects, even for objects like functions, simple arrays, etc. It would have been tempting to optimize those cases by introducing header-less objects, thereby complicating the code a lot.
4:56:37
beach
But with general mechanisms (like call-site optimization) and others yet to be invented, we can keep the code simple and still get better performance than what would have been possible with those tempting optimizations.
4:58:30
beach
Another example. The indirection through the header makes the rack internally consistent. So it seems a lot of synchronization can be avoided by forcing the reallocation of a new rack and the use of CAS.
4:58:57
beach
Without the indirection, locks would be the only option in many cases, as far as I can tell.
5:00:07
beach
I don't even know how SBCL handles AREF. I mean, it has to verify the size and then access the element. What if there is a call to ADJUST-ARRAY in the middle?
5:07:48
Bike
that's good to hear. i'm not sure about that one. the simple/compound thing feels a bit too elaborate
5:08:32
beach
I am not sure it is the style that a standards document should have, but the explanation is good.
5:13:02
beach
And the "Acquisition and release" section seems to totally answer my question from yesterday (my time).
5:18:11
beach
Bike: Oh, and I am often a good proofreader for this kind of stuff, because of my apparent inability to understand prose that is not extremely precise.
5:22:39
Bike
that's flattering, but i won't be convinced i understand this stuff until i've had to debug programs using it :)
8:23:10
beach
Bike: "somehow knows that TRANSFORM does have access to the vector". Should that be "somehow knows that TRANSFORM does not have access to the vector"?
8:24:36
beach
Bike: "since now and VECTOR can be discarded" . Should that be "since now VECTOR can be discarded" ?
8:37:33
beach
Bike: "This last must be an atomic ..." -> "This last operation must be an atomic..." maybe?
9:45:47
beach
This has been a very strange day so far. I have been scrutinizing the excellent documentation Bike wrote, I have been writing a long email to my banker saying essentially that I don't appreciate being "f*cked with" (I won't give the details here), and I have been filling out forms for the annual general assembly of the house-owner's association, which will take place by email (or snail mail for those who are computer illiterate).
10:13:15
heisig
I am not sure whether there is such a thing as a 'conservative, but not terribly slow' memory model that we could use for SICL.
10:14:10
heisig
For one, we definitely should use every trick available for implementing the GC, and for implementing primitive data structures like locks.
10:14:47
heisig
Also, many choices boil down to the question whether we allow a CPU to perform out-of-order execution or not.
10:15:24
heisig
And whether writes to memory should be buffered, or written to memory immediately. The latter can be very costly.
10:15:37
beach
I am pretty sure there could be such a thing. I mean, most of the time, memory models and other types of performance details are viewed in isolation. I am totally willing to simplify the memory model to simplify the code, and instead insist on performance elsewhere.
10:17:21
heisig
But the only significant simplification I see would be to treat every single shared variable in SICL as volatile (with Java's semantics for volatile).
10:18:24
beach
And I think that would be fine, now that there are not many shared variables left, and most of them are probably read-only anyway.
10:22:46
heisig
I just ran a benchmark. Adding a memory barrier to a write instruction adds 24 nanoseconds of overhead on my machine.
10:23:12
heisig
So for writes to the L1 cache (which are frequent) we are talking about an order of magnitude slowdown.
10:23:46
heisig
Not that I advise against the idea (I am intrigued, actually), but we should be aware that it is very costly.
10:25:06
beach
I firmly believe that a lot of code is made worse by incorrect hunches on the part of maintainers.
10:25:53
no-defun-allowed
What does adding the barriers give you? I'm still not up to speed on concurrency stuff.
10:28:13
heisig
Exactly. In my case, I used a data structure that is only correct if several slots of a struct are read in the correct order.
10:30:38
no-defun-allowed
How tricky is it to explicitly add barriers? Bike's document suggests barriers are only required in "only a few places".
10:31:45
heisig
Most barriers are just a single CPU instruction. Or even a NOP, in case the CPU enforces certain guarantees anyway.
10:32:06
no-defun-allowed
ACTION may have had a knee-jerk reaction to 24 nanoseconds; she was throwing fancy x86 instructions at hash table probing to save perhaps fewer nanoseconds.
10:34:12
no-defun-allowed
I mostly meant if it affects programming style, but I can see that it requires a compiler to be more careful with transformations.
11:55:15
heisig
Dammit that is the same justification that the arms industry uses. Maybe I should just shut up.
13:51:33
Bike
i figure most people would stick with locks, and then of the fewer people using atomics most would use sequential consistency (which is the default), and then a few Real Hackers like heisig can use the other stuff
13:52:03
Bike
and in the meantime the implementation makes as much as possible unordered even without the programmer saying so, so that if they make a mistake they just get weird results instead of crashing
13:55:40
Bike
the C++ thing i'm leaving out is memory_order_consume, which as far as i can tell has been in the C++ standard for ten years without anyone implementing it properly or even understanding how it works
14:03:32
Bike
but yeah, i don't know, they're definitely partly involved. one of these papers has torvalds as a contributor
15:24:56
Bike
ok, next question. SBCL defines an operator atomic-update that uses CAS to perform an arbitrary update on a place, i.e. it does an atomic read-modify-write with some function and arguments you pass in. it does that with a CAS loop. The question is: in sbcl's implementation, the update function form and the extra arguments can be evaluated multiple times. is that sensible?
15:25:12
Bike
like, you can do atomic-push (though that has its own definition) as (atomic-update place #'cons t)
15:25:33
Bike
and the #'cons and t forms might be evaluated multiple times. this is apparently intentional but i'm not sure i understand why
15:27:03
beach
Sounds like a question for karlosz or scymtym. Or perhaps you are just thinking out loud?
15:28:06
Bike
well, kind of. it would be nice if i could see some uses of atomic-update to understand what makes more sense with how it's used, but there aren't a lot of uses around. but maybe somebody here has ideas