libera/#sicl - IRC Chatlog
5:03:31
beach
For my physical exercise today, I was watching a video related to RISC-V, and that made me think that, for detecting overflow of fixnum addition and subtraction, it can become important for type inference to determine whether fixnums are negative or not.
5:06:10
beach
The general overflow check takes two more instructions in RISC-V (in addition to the arithmetic operation and the branch), but no more instructions are required if the sign of one operand is known.
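A minimal C sketch of the two checks beach contrasts, assuming 64-bit two's-complement words and ignoring fixnum tagging (the function names are illustrative, not from SICL):

    #include <stdbool.h>
    #include <stdint.h>

    /* General case: mirrors the usual RISC-V idiom add; slti; slt; bne,
       i.e. two extra instructions besides the add and the branch.  The sum
       is computed with unsigned wrap-around to keep the C well defined. */
    static bool add_overflows(int64_t a, int64_t b, int64_t *sum)
    {
        int64_t s = (int64_t)((uint64_t)a + (uint64_t)b);
        *sum = s;
        return (s < a) != (b < 0);
    }

    /* If type inference proves b >= 0, the sign test on b disappears and
       the branch can test the sum directly: just add; blt. */
    static bool add_overflows_b_nonnegative(int64_t a, int64_t b, int64_t *sum)
    {
        int64_t s = (int64_t)((uint64_t)a + (uint64_t)b);
        *sum = s;
        return s < a;
    }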
5:10:40
moon-child
on the mill, there is an addition instruction that will fault if it overflows, so you don't need _any_ extra code to check for overflow
5:19:20
beach
I always hesitate when the operating system gets involved. Same thing for read/write barriers and such.
5:22:52
hayley
Would there be room in the instruction encoding of your favourite instruction set for an instruction like "add and branch on overflow"?
5:23:34
hayley
Come to think of it, we need an instruction type and ideally three register numbers already, so it would be tight.
5:23:54
beach
That would be my preferred way, but, yes, I don't think there is room for such an instruction.
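For a rough sense of scale: a three-register "add and branch on overflow" needs an opcode, three 5-bit register fields, and a branch offset; in RISC-V's 32-bit formats, the B-type branch encoding already spends its bits on two register fields and a 12-bit offset, so squeezing in a destination register would leave only a very short offset.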
5:24:03
moon-child
the codespace improvements are definitely worth it though. Even with a single 'add and branch on overflow' instruction, you need to take up space in the code for the branch target in the hot path
5:24:43
moon-child
whereas if you do it as fault recovery, you look up the recovery point in a table, which is all in the cold path
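A rough sketch of that recovery-table idea on Linux/x86-64, assuming the fault is delivered as a signal; the table layout and all names here are purely illustrative:

    #define _GNU_SOURCE
    #include <signal.h>
    #include <stddef.h>
    #include <stdint.h>
    #include <ucontext.h>

    struct recovery_entry { uintptr_t fault_pc; uintptr_t recovery_pc; };

    /* Filled in by the compiler/runtime: one entry per arithmetic site,
       pointing at its out-of-line (cold) recovery code, e.g. the bignum path. */
    extern struct recovery_entry recovery_table[];
    extern size_t recovery_table_size;

    static void trap_handler(int sig, siginfo_t *info, void *ctx)
    {
        (void)info;
        ucontext_t *uc = ctx;
        uintptr_t pc = (uintptr_t)uc->uc_mcontext.gregs[REG_RIP];
        for (size_t i = 0; i < recovery_table_size; i++) {
            if (recovery_table[i].fault_pc == pc) {
                /* Resume at the recovery code; the hot path itself contains
                   no branch and no reference to the recovery address. */
                uc->uc_mcontext.gregs[REG_RIP] =
                    (greg_t)recovery_table[i].recovery_pc;
                return;
            }
        }
        signal(sig, SIG_DFL);   /* not ours: fall back to default handling */
        raise(sig);
    }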
5:24:50
hayley
From memory, the Azul machines had a read barrier which called a userspace function without invoking the OS, but a read barrier has only one job.
5:25:40
hayley
(Particularly, a read barrier just fixes up a pointer, whereas the overflow handler might signal a condition, or create a bignum and continue, and so on.)
5:28:40
moon-child
let's say, the path we would like to be fast vs the path we can afford to be slow. For instance, in the overflow case, the overhead of creating a bignum and doing bignum arithmetic is great enough that such optimizations are no longer useful
5:29:27
moon-child
relatedly, gcc and clang will place rarely executed code paths far from frequently executed ones, to improve i$ locality. (To great effect, so I hear)
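For illustration, the usual way to ask GCC or Clang for that placement, again ignoring fixnum tagging (overflow_to_bignum is a made-up slow-path function):

    /* The cold attribute and __builtin_expect both push the overflow path
       out of the hot instruction stream. */
    __attribute__((cold, noinline)) long overflow_to_bignum(long a, long b);

    long add_fixnum(long a, long b)
    {
        long sum;
        if (__builtin_expect(__builtin_add_overflow(a, b, &sum), 0))
            return overflow_to_bignum(a, b);  /* emitted far from the hot code */
        return sum;
    }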
5:30:23
beach
Ah, yes, I see. So in the case of a branch, the exceptional code would still be too close to the more likely one.
5:31:07
hayley
Another option would be to have two instructions, one to set some handler register, and another to perform the add. But that's awfully similar to having a normal add instruction, and then a jump-on-overflow-bit instruction.
5:34:11
beach
Now, with a "modern" operating system, would the entire process be interrupted in case of a trap, or just the thread?
5:35:43
beach
And, I guess either way, a trap could have serious consequences for response times, like if one wants to use the system for processing sound.
5:37:45
hayley
I measured a trap handler and mremap to take about two microseconds on my Linux desktop. Such latency would be excusable if traps were infrequent enough, but it would definitely slow down lots of bignum operations.
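For reference, a crude way to reproduce that kind of number on Linux; this only times a trap round-trip plus an mprotect rather than the mremap hayley mentions, and everything about the setup is an assumption:

    #define _GNU_SOURCE
    #include <signal.h>
    #include <stdio.h>
    #include <sys/mman.h>
    #include <time.h>

    static void *page;

    static void handler(int sig, siginfo_t *info, void *ctx)
    {
        (void)sig; (void)info; (void)ctx;
        /* Make the page writable again so the faulting store can be retried. */
        mprotect(page, 4096, PROT_READ | PROT_WRITE);
    }

    int main(void)
    {
        struct sigaction sa = { .sa_sigaction = handler, .sa_flags = SA_SIGINFO };
        sigaction(SIGSEGV, &sa, NULL);
        page = mmap(NULL, 4096, PROT_READ | PROT_WRITE,
                    MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

        long total_ns = 0;
        int rounds = 10000;
        for (int i = 0; i < rounds; i++) {
            struct timespec t0, t1;
            mprotect(page, 4096, PROT_NONE);       /* arm the trap */
            clock_gettime(CLOCK_MONOTONIC, &t0);
            *(volatile char *)page = 1;            /* fault, handler, retry */
            clock_gettime(CLOCK_MONOTONIC, &t1);
            total_ns += (t1.tv_sec - t0.tv_sec) * 1000000000L
                      + (t1.tv_nsec - t0.tv_nsec);
        }
        printf("average trap round-trip: %ld ns\n", total_ns / rounds);
    }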
5:44:00
beach
And we can tell people who write applications for processing sound to avoid overflow of fixnum arithmetic.
5:45:25
hayley
What sort of latency would you consider acceptable? The less latency, the better, of course, but still.
5:46:12
beach
For sound, a few milliseconds. So if the entire thing takes only microseconds, then that's no problem.
5:47:25
moon-child
beach: well, end-to-end, so including hardware. But it occurs to me that is not a very useful measure here!
5:47:34
hayley
Okay then. Just checking, as I've had a wide range of latency limits depending on hardware.
5:50:18
hayley
That said, I'm now starting to think latency is sort of irrelevant in this situation, as a larger buffer size (i.e. higher latency) requires more samples to be computed at a time.
5:57:56
hayley
Well, no, a larger buffer size allows the system to tolerate larger "hiccups" in processing time, including those caused by traps.
6:06:17
beach
Computing samples is fast. The problem is that the ear is very sensitive to latency, so buffers that are too big are out of the question.
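To put rough numbers on that, assuming a 48 kHz sample rate: a 128-sample buffer is about 2.7 ms and a 1024-sample buffer about 21 ms, so an occasional trap costing a few microseconds disappears into either budget.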
6:15:42
hayley
Though the read barrier in my concurrent compacting GC design uses MMU traps, I've tried to keep the trap frequency to a minimum to make it work acceptably with stock hardware and kernels.
6:16:38
hayley
Do any sound processing algorithms produce bignums? I admit I really don't know any algorithms other than some oscillators and one resonant filter.
6:25:50
hayley
Speaking of, I am trying to figure out something: Gil Tene told me that a read barrier is preferable to a write barrier, as the mutation rate tends to increase when the allocation rate increases.
6:26:31
hayley
But, surely the read rate increases too, as you tend to use objects that you allocate?
6:27:52
hayley
And, if one makes use of persistent data structures, or functions which copy (parts of) sequences rather than modifying them, then they would allocate more with no mutation?
6:28:41
hayley
Yes, I think he is considering Java code, which I suspect is more mutation-heavy than Common Lisp code, but it doesn't make sense even then.
6:29:02
beach
Probably. So it depends on the programming language, on the programming paradigm used, on the data structures, etc.
6:31:54
hayley
But still, you don't write for no reason, and so I would expect a higher mutation rate to occur with a higher read rate (but not necessarily the other way around).
6:37:13
hayley
I couldn't figure out how to write that sentence, as it didn't seem right to suggest that a high read rate is caused by a high mutation rate.
6:37:24
beach
hayley: I could very well imagine the read rate to be fairly independent of the mutation rate. But then, I am often unable to think through things like that.
6:38:32
hayley
It's not correlation, because you can have a low (or zero) mutation rate and a high read rate when using functional programming.
6:38:40
moon-child
beach: a write which is never read is 'useless'. Now, the alternative may be more expensive, but I think most writes are not dead
6:42:25
hayley
So, let's say that both read and mutation rates are either "low" or "high". I can imagine a program with a high read rate and low write rate, but not a (useful) program with a low read rate and high write rate.
6:44:11
moon-child
I wonder how easy it is to collect statistics on such things. Valgrind has virtual cache hit statistics, and tracks reads from uninitialized memory, so I think it might be convinced to track the rate of un-read writes
6:45:43
hayley
Our favourite sufficiently smart compiler would remove dead writes; I don't know how well eliminating dead writes works in real compilers though.
6:47:07
hayley
The first search result for "compiler remove dead writes" is a Usenix presentation on how removing dead writes allows secret data to remain in memory, rather than zeroing it out. Fun.
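The textbook instance of that problem, sketched in C (read_password and check_password are hypothetical helpers):

    #include <string.h>

    void read_password(char *buf, size_t n);   /* hypothetical */
    int  check_password(const char *buf);      /* hypothetical */

    void handle_login(void)
    {
        char password[64];
        read_password(password, sizeof password);
        check_password(password);
        /* Dead store: the buffer is never read again, so the compiler may
           delete this memset and the secret stays in memory.  This is why
           explicit_bzero and memset_s exist. */
        memset(password, 0, sizeof password);
    }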
6:47:50
moon-child
imagine some shared state, where thread 1 writes state A, then writes state B, and not until the transition to state B does thread 2 happen to read. I think that's going to be where the majority of spurious writes happen, and that may not be deterministic
6:51:11
hayley
It is possible for a compiler to eliminate the write of state A, depending on how simple our states are. http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2015/n4455.html covers such optimisations.
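A small C version of moon-child's scenario, along the lines of what the paper discusses (STATE_A and STATE_B are illustrative):

    #include <stdatomic.h>

    enum { STATE_A = 1, STATE_B = 2 };
    atomic_int state;

    void publish(void)
    {
        /* With no intervening synchronisation or I/O, a conforming compiler
           may coalesce these two stores into a single store of STATE_B, so
           another thread might never observe STATE_A. */
        atomic_store_explicit(&state, STATE_A, memory_order_relaxed);
        atomic_store_explicit(&state, STATE_B, memory_order_relaxed);
    }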
10:51:43
hayley
I made an attempt at metering SICL bootstrapping: the HIR evaluator was modified so that RPLACA and RPLACD instructions increment a counter; bootstrapping was modified so that the imported RPLACA and RPLACD functions increment the same counter, and so that CONS, LIST and LIST* increment another counter.
10:53:10
hayley
There were 2,778,200 cons cells allocated by those functions, and 74,383 updates were performed. But these numbers don't mean very much.
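(For what it's worth, taken at face value those figures work out to roughly 37 cons cells allocated per RPLACA/RPLACD call in this particular workload.)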