libera/#sicl - IRC Chatlog

5:50:57 hayley Good morning beach!

5:51:38 hayley I just finished with a student meetup in the city.

5:52:09 beach Was it good?

5:52:36 hayley I'd better wait until I am at home to read the backlog, and not on the bus still.

5:53:00 beach Sure.

5:53:02 moon-child beach: not if data are smaller than a cache line

5:53:22 beach moon-child: Yes, I see. Good point.

5:53:56 moon-child and, consider iterating over a list: if the nodes are all sequential, the cpu will detect the linear access pattern and prefetch later nodes

5:54:10 hayley It was great. Quite nice to see people in person after mostly just knowing them by name and voice.

5:54:21 beach moon-child: Yes, that's what I meant by the first time.

5:54:32 moon-child ok

5:54:51 beach hayley: I see, yes. That's one reason I want to go to Porto rather than doing ELS online.

5:57:51 hayley All a generational collector requires, to me, is that a young generation can be collected independent of old generation(s). Hence a generational collector does not require moving; you just need to be able to partition the heap somehow.

5:58:14 moon-child it would still probably require barriers, though

5:58:26 beach True. And there is the "fake moving" collector identified by Paul Wilson.

5:58:49 hayley The scheme I have been thinking of is "pretenuring", where we predict that objects allocated by one particular allocation point will tend to live longer, based on runtime feedback.

5:59:40 moon-child '"fake moving" collector'?

6:00:01 moon-child I can not find any references to that

6:00:02 beach Yes, that's what he called it.

6:00:34 hayley Yes, you'd still need a write barrier to identify old-to-new references. (Well, the Go developers experimented with hashing...somehow to identify updated pages without a barrier, but it didn't work too well)

6:00:52 beach moon-child: I forget the paper in which he mentions it. Maybe the GC survey?

6:02:57 hayley But while objects that are allocated live together may tend to die together, it might not be the case that the mutator will use them in the order they were allocated. Shipilëv's test program produces a worst case of sorts, wherein the iteration order is completely random with regards to allocation order.

6:03:23 moon-child I find a reference only to 'fake copying'; perhaps that was it?

6:03:39 beach Ah, yes, sorry!

6:03:53 hayley ("allocated together", not "allocated live together")

6:08:05 moon-child that is interesting. Though I object to the assertion that '[j]ust as in a copy collector space reclamation is implicit'; the reclaimed objects are still operated upon

6:09:02 moon-child (in particular, they must be relinked when yet-reachable objects are moved to the alternate 'semispace')

6:11:20 moon-child and the addition of a colour bit is a bit of a giveaway, imo; this really seems closer to mark-and-sweep

6:12:38 hayley Is it much like Baker's treadmill collector?

6:13:44 moon-child I don't know. https://www.cs.cmu.edu/~fp/courses/15411-f08/misc/wilson94-gc.pdf page 13

6:13:51 hayley ...wherein the fromspace and tospace are represented by linking objects on doubly linked list. But maybe not, if there is a colour bit.

6:14:52 hayley It seems to me that Wang and Baker proposed the same idea, and Wilson just wrote the description in this survey.

6:16:03 beach I think they are very similar, and I don't know which one was first.

6:17:10 beach I am still intrigued by this idea of taking into account multiple cores. I know moon-child said "but the work has to be done" or something like that, but it is looking like we are going to get more and more cores, and that it is going to be hard to get them all to do useful work.

6:17:24 beach If that is the case, then a non-moving collector is perfect.

6:17:48 beach Each thread just has to do the marking of the roots.

6:18:01 hayley But with regards to compacting or not, it appears that the benchmarks taken in <https://www.cs.utexas.edu/users/speedway/DaCapo/papers/immix-pldi-2008.pdf>, notably on pages 1 and 6, suggest that compaction lowers the time and cache misses taken by the mutator. I don't know if it is due to locality, though.

6:19:39 hayley What about memory bandwidth? There are already "allocation walls" wherein memory bandwidth simply prevents one from writing a faster GC, without improving cache utilisation (perhaps by thread-local GC).

6:20:18 beach Yes, another good question.

6:21:01 beach So then the question becomes: What is the relative importance of improved mutator performance due to moving/copying/generational etc. and improved collection performance by having free cores do all the work?

6:21:02 moon-child ha, treadmill paper refers to 'brenda baker'. No relation, apparently

6:21:47 beach I mean, I don't expect any answers. But we should think about such issues.

6:21:53 hayley I recall Cliff Click claimed "one of the things Azul had (with 864-core machines) was the magical property of having a f**kton of bandwidth." (His words, not mine!) Azul machines also had a lot of spare cores to run GC and JIT compilation on, apparently.

6:23:04 beach Also, if all a mutator thread has to do is mark the roots, then pauses will be very brief in most cases.

6:23:20 moon-child yeah, that seems like a somewhat ideal position for a jit to be in

6:23:33 moon-child beach: if the mutator only marks the roots, and other marking happens off-thread, then you need barriers

6:23:44 hayley But the hardware and software eventually implemented generational GC, so I guess there were still gains to be had.

6:23:51 beach moon-child: I don't see that.

6:25:13 hayley A mutator thread still has to enforce some invariant like "no black objects refer directly to white objects", and a barrier would still be necessary to enforce it. But the slow path of such a barrier is still not too slow.

6:25:17 moon-child Suppose I have X, Y, Z. X Y are roots, and we have Y -> Z. First the mark thread traces X. Then the mutator creates a reference X -> Z, and destroys the reference Y -> Z. Then the mark thread traces Y. Then it is done. Z is still white

6:25:42 beach Ah, yes, I see.

6:27:42 hayley Hence, a concurrent copying collector "just" has another kind of barrier, which is usually a read barrier. We suspect that read barriers might be slower, as reads are more frequent than writes, but mutator threads are still not doing much collection work.

6:31:42 moon-child something that occurs to me: when the mutator trips a barrier, it fixes up its pointer. This is a bit like the go strategy of moving gc work onto a mutator thread if the latter gets ahead of the former, except that you only do it when you have to

6:32:05 moon-child (and of course, go does not copy)

6:32:17 hayley Right, the aim is to get the collector to eventually catch up with the mutator.

6:37:04 hayley (Must say, the jokes in the meetup were _awful_, as it was entirely comprised of computer science students. We went for lunch, and I proposed to collect all the rubbish and take it to the bin. When I came back, I couldn't tell what the other students were discussing, and proclaimed "Damn concurrent collection, you've started talking again without me!")

6:51:16 hayley Is Wilson's claim that "objects that are allocated together, die together" making a claim about temporal locality or spatial locality though? i.e. is it "objects allocated around the same time die around the same time", or "objects allocated nearby in memory die around the same time"? A bump allocator makes "allocated around the same time" and "allocated nearby in memory" very similar, but I am not so sure for a mark-sweep collector.

6:52:27 moon-child I can't imagine he was referring to spatial locality

6:53:44 hayley I think good locality is achieved when "objects allocated nearby in memory are used around the same time", which is an opposite of sorts of the latter interpretation.

6:54:48 moon-child I agree

6:58:27 hayley So, "objects that are allocated at around the same time also die around the same time" is useful for predicting the lifetimes of objects (i.e. pretenuring), but I am not sure how it interacts with locality with a mark-sweep collector. The interaction with locality with compacting collection (or moreso bump allocation, but the two are correlated) is easier to spot, as objects allocated around the same time are also nearby in memory.

7:07:21 beach I think Wilson is talking about time.

7:08:31 beach And the example of traversing a list is interesting, because you can't have both the CAR and the CDR nearby.

7:08:48 beach So a compacting GC needs to make a choice.

7:09:02 moon-child well, depends on how near you need them to be :P

7:09:48 hayley I recall reading a study about making the decision to line up CARs or CDRs in Interlisp, which then presented the entropy of the resulting words in the heap.

7:10:26 beach But that was before the processor cache hierarchy we now have, right?

7:10:50 hayley Right, compacting was used to avoid paging instead of cache misses.

7:11:11 beach Ah, yes, it all comes back. :)

7:11:21 moon-child same hierarchy, we just added a hard drive on the bottom ;)

7:11:59 beach Heh.

7:13:45 hayley Admittedly I have to wonder if designs that avoid paging are still useful for avoiding cache misses. One reason against such logic is that reading a random page off disk is slower than reading a random cache line out of primary memory, and disk pages tend to be at least a magnitude larger.

7:25:27 moon-child For better or worse, the cache is much more 'dynamic'. You can not enquire what is currently in it, and it is more possible to page memory in, lazily and asynchronously

7:26:22 moon-child I suppose cache also complicates concurrency. Were there any multiprocessors without cache?

7:27:40 White_Flame maybe the transputer?

7:28:15 White_Flame anything with message passing doesn't need consistency

7:29:24 moon-child wiki says 'Unlike the earlier models, the T9000 had a true 16 KB high-speed cache'

7:33:53 White_Flame yeah, the earlier had only "ram" on chip, not a cache

7:38:24 hayley The Raspberry Pi Pico has two cores, no cache, and no CAS or LL/SC instructions.

7:44:24 moon-child ;o

7:44:26 moon-child how does _that_ work?

7:44:29 moon-child just hope you get lucky?

7:47:02 hayley Same way it works in hardware; you have to employ some other algorithm for mutual exclusion, like the bakery algorithm.

7:47:26 hayley ACTION taps http://lamport.azurewebsites.net/pubs/lamport-mutual-solved.pdf

7:54:11 hayley The runtime library provided by the Pi Foundation includes a mutex implementation, but I haven't checked how they actually do it.

11:38:04 scymtym_ ** NICK scymtym

13:01:07 heisig Hello everyone!

13:02:49 beach Hello heisig.

13:20:53 beach heisig: How is the dissertation work going?

13:29:54 heisig beach: Slow, but steady. I somehow seem to end up with 30+ pages of related work, where each page took me a full day to write.

13:31:19 heisig I hope the rest won't be as tedious to write.

13:32:11 heisig And there is this ELS paper about SIMD programming that I want to write, too.

13:55:31 beach A day per page of previous work sounds reasonable.

14:29:38 beach Bike: I was thinking about your case of source information in code that has been constructed programmatically. It would be nice to assign source positions by pretty-printing the code.

14:32:15 Bike you mean, like compile gets some conses, it pretty prints them, then uses the resulting text for the source locations?

14:32:25 beach Yeah.

14:32:51 beach Not by re-reading it of course, because that could be impossible.

14:33:37 beach Somehow, it must be possible to tell the pretty printer to assign source locations. Or something like that.

14:33:42 yitzi With a custom client in incless/inravina maybe you could just capture the line/column for each subform.

14:34:00 beach Something like that, yes.

14:35:04 beach Maybe have the pretty printer create a CST.

14:35:57 beach Bike: It would make it possible to use the same tools for examining source code that has been created programmatically that are used for source code being read.

14:36:13 yitzi Yes, I think that would be doable. The line numbers in Inravina are all relative since it is just there to support `*print-lines*`

14:36:17 beach And it would avoid that ugly *****HERE*****

14:36:49 beach yitzi: That should be an easy transformation.

14:38:04 Bike interesting.

14:43:22 Bike beach: wrt subtypep talk, i've been thinking about the WSCL type error idea. i've been sketching out my thoughts in longer form, but the gist of it is that specifying that operators signal particular errors may not be the best way to go about it

14:44:03 Bike for reasons like, if subtypep had to signal an error on non type specifiers, you could write conforming code to check if something is a type specifier by handling such an error; but that would be stylistically unfortunate and somewhat bug prone compared to an actual predicate function

14:44:06 beach Not even TYPE-ERROR?

14:46:20 Bike it would also be important to clarify timing. for example, it is impossible for THE to detect and signal type declaration violations in all cases due to FUNCTION types, but they could be detected later

14:46:33 beach Isn't that just one particularly twisted case? I mean, I don't see anything wrong with requiring AREF to signal a type error if it is given something that is not an array.

14:47:34 beach I don't think it has been stated that WSCL should require that a TYPE-ERROR is always signaled by every operator.

14:47:44 Bike it hasn't been. i'm thinking ahead.

14:47:51 beach Ah, OK.

14:48:10 beach Then, yes, that would be a bad path to take.

14:50:19 beach Also, in the case of SUBTYPEP, it is iffy, because it is always preferable to have WSCL specify something that is already done by most implementations in practice. And here we have at least one implementation that might have made the decision to gain a few nanoseconds by not checking that it has been given a type specifier.

14:51:20 moon-child I expect it was an accident

14:51:45 moon-child and that they expected types to be specified as lists

14:51:56 Bike depends on what you mean by accident. the sbcl definition of subtypep has a comment explicitly laying out that they're not required to check that the arguments are valid type specifiers.

14:52:11 beach Hmm, interesting.

14:52:12 moon-child ah, ok

14:52:21 moon-child (I only checked the source code, not the documentation :P)

14:52:26 Bike for exactly this reason, i.e. checking EQUAL before doing anything else

14:54:54 Bike with aref... take (let ((arr (foo))) (map nil f arr) (aref arr 0)). i think an implementation that checked if FOO returned an array, and immediately signaled an error if it didn't, would be fine (morally, if not conformingly at the moment, I'm not sure about that). that would let it give a better message concerning the actual source of the problem, FOO. and it would let it compile the MAP call to only traverse

14:55:00 Bike vectors without losing safety.

14:55:59 Bike that's not incompatible with saying a type error is signaled if aref is given a non-array, but it might be phrased differently than "aref should signal an error"

15:01:25 yitzi beach: Yes, the line numbers are basically from the beginning of the top level form, so that should work.

15:01:53 Bike in any case i think it's not clear whether this transformation would be allowed currently, so clarifying that either way would be good

15:21:50 yitzi beach: The current XP pprint or Inravina pprint can't save column/line info back to a CST because at the time of print-object/pprint-dispatch that information is not known b.c. the lines and not been finalized yet. And the object references are not kept in output fragments in case of Inravina. XP just keeps an adjustable text vector, so no there either.

15:23:18 yitzi Sorry, that was little jumbled. I can clarify if needed.

15:57:31 Bike makes sense to me. the printer doesn't lay out a line until it knows what more is to be printed

16:14:14 yitzi Yes, there is a generic interface `layout-instruction` that could be used, but I didn't expose that yet. There is also the bottom level that actually writes to the stream (write-fragment) but that is not generic or exposed. In other words it is possible to accomplish this, but I'd have expose the interface somewhere. And probably save the object references in the instruction queue or fragment buffer.

16:30:16 beach yitzi: I think I understand.

16:51:08 yitzi beach: I don't think that I will be able to do an ELS paper this year. Some unanticipated things came and I probably just didn't allocate enough time to it. I still want to do one, but I am thinking it might be next year. There are also some unresolved stuff in the Inravina/Incless interface in addition to what I have mentioned.

16:54:39 beach Too bad about the ELS paper. You know there is usually a week of extended deadline, right?

16:54:55 beach But, sure, there is no rush. Next year is fine.

16:56:32 yitzi I do. I have made progress on the shim technique (SBCL & ECL), but I really would like to see it in SICL. I feel like there is a lot unresolved.

16:57:22 beach Yes, I see.

16:58:00 yitzi And to be honest...part of the "unanticipated" stuff was getting sick and whole bunch of stupid downsizing going on at my college. Such is life.

16:59:00 beach Oh, that sucks! Sorry to hear that.

17:00:31 yitzi Yeah. Everything is fine now. I am just anticipating moving and looking for work soon.

17:01:59 beach Oh, so you got laid off? That's terrible!

17:05:58 yitzi No, not yet. I still have a job, thankfully. A whole bunch people that work for me in the Math dept have been. I've been able to build up a Math/Physics 2-year program over the last 15 years in various roles I have had such as Dean, Division Chair, Dept Chair. In the span of a year due to COVID and incompetence that has virtually been erased. Plus the people that have been put in charge are making me think it is only going to get worse.

17:06:59 beach Oh, wow! I can see why you would want a change.

17:07:21 beach But with that track record, you should be able to find something suitable elsewhere.

17:08:44 yitzi I know, and thanks for noting that. :D

17:32:01 beach In the past, I had have had periods when I wanted to work in industry rather than academia, because the rules seem more clear there to me, i.e. profit. But in reality it is more complicated than that, and academia gave me the freedom to do pretty much what I want in terms of research. So I stayed on.

17:32:43 beach Moving to a new place is pretty good though, because it takes a few years before you notice any problems. :)

17:33:14 moon-child yes, it seems a much less ... artificial environment

17:33:35 moon-child but less fun

17:33:44 beach Yeah. The rules in academia are very obscure.

17:34:27 moon-child of course corporations are not free from bureaucracy

17:34:43 yitzi I work at a sort of "for profit" college. So that is a bizarre hybrid.

17:34:50 beach Exactly. And middle management is not exactly driven by profit.