libera/#sicl - IRC Chatlog
5:03:31
beach
For my physical exercise today, I was watching a video related to RISC-V, and that made me think that, for detecting overflow of fixnum addition and subtraction, it can become important for type inference to determine whether fixnums are negative or not.
5:06:10
beach
The general overflow check takes two more instructions in RISC-V (in addition to the arithmetic operation and the branch), but no more instructions are required if the sign of one operand is known.
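A minimal C sketch of the two checks beach contrasts, assuming 64-bit two's-complement words and ignoring fixnum tagging (the function names are illustrative, not from SICL):

    #include <stdbool.h>
    #include <stdint.h>

    /* General case: mirrors the usual RISC-V idiom add; slti; slt; bne,
       i.e. two extra instructions besides the add and the branch.  The sum
       is computed with unsigned wrap-around to keep the C well defined. */
    static bool add_overflows(int64_t a, int64_t b, int64_t *sum)
    {
        int64_t s = (int64_t)((uint64_t)a + (uint64_t)b);
        *sum = s;
        return (s < a) != (b < 0);
    }

    /* If type inference proves b >= 0, the sign test on b disappears and
       the branch can test the sum directly: just add; blt. */
    static bool add_overflows_b_nonnegative(int64_t a, int64_t b, int64_t *sum)
    {
        int64_t s = (int64_t)((uint64_t)a + (uint64_t)b);
        *sum = s;
        return s < a;
    }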
5:10:40
moon-child
on the mill, there is an addition instruction that will fault if it overflows, so you don't need _any_ extra code to check for overflow
5:19:20
beach
I always hesitate when the operating system gets involved. Same thing for read/write barriers and such.
5:22:52
hayley
Would there be room in the instruction encoding of your favourite instruction set for an instruction like "add and branch on overflow"?
5:23:34
hayley
Come to think of it, we need an instruction type and ideally three register numbers already, so it would be tight.
5:23:54
beach
That would be my preferred way, but, yes, I don't think there is room for such an instruction.
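For a rough sense of scale: a three-register "add and branch on overflow" needs an opcode, three 5-bit register fields, and a branch offset; in RISC-V's 32-bit formats, the B-type branch encoding already spends its bits on two register fields and a 12-bit offset, so squeezing in a destination register would leave only a very short offset.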
5:24:03
moon-child
the codespace improvements are definitely worth it though. Even with a single 'add and branch on overflow' instruction, you need to take up space in the code for the branch target in the hot path
5:24:43
moon-child
whereas if you do it as fault recovery, you look up the recovery point in a table, which is all in the cold path
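A rough sketch of that recovery-table idea on Linux/x86-64, assuming the fault is delivered as a signal; the table layout and all names here are purely illustrative:

    #define _GNU_SOURCE
    #include <signal.h>
    #include <stddef.h>
    #include <stdint.h>
    #include <ucontext.h>

    struct recovery_entry { uintptr_t fault_pc; uintptr_t recovery_pc; };

    /* Filled in by the compiler/runtime: one entry per arithmetic site,
       pointing at its out-of-line (cold) recovery code, e.g. the bignum path. */
    extern struct recovery_entry recovery_table[];
    extern size_t recovery_table_size;

    static void trap_handler(int sig, siginfo_t *info, void *ctx)
    {
        (void)info;
        ucontext_t *uc = ctx;
        uintptr_t pc = (uintptr_t)uc->uc_mcontext.gregs[REG_RIP];
        for (size_t i = 0; i < recovery_table_size; i++) {
            if (recovery_table[i].fault_pc == pc) {
                /* Resume at the recovery code; the hot path itself contains
                   no branch and no reference to the recovery address. */
                uc->uc_mcontext.gregs[REG_RIP] =
                    (greg_t)recovery_table[i].recovery_pc;
                return;
            }
        }
        signal(sig, SIG_DFL);   /* not ours: fall back to default handling */
        raise(sig);
    }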
5:24:50
hayley
From memory, the Azul machines had a read barrier which called a userspace function without invoking the OS, but a read barrier has only one job.
5:25:40
hayley
(Particularly, a read barrier just fixes up a pointer, whereas the overflow handler might signal a condition, or create a bignum and continue, and so on.)
5:28:40
moon-child
let's say, the path we would like to be fast vs the path we can afford to be slow. For instance, in the overflow case, the overhead of creating a bignum and doing bignum arithmetic is great enough that such optimizations are no longer useful
5:29:27
moon-child
relatedly, gcc and clang will place rarely executed code paths far from frequently executed ones, to improve i$ locality. (To great effect, so I hear)
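For illustration, the usual way to ask GCC or Clang for that placement, again ignoring fixnum tagging (overflow_to_bignum is a made-up slow-path function):

    /* The cold attribute and __builtin_expect both push the overflow path
       out of the hot instruction stream. */
    __attribute__((cold, noinline)) long overflow_to_bignum(long a, long b);

    long add_fixnum(long a, long b)
    {
        long sum;
        if (__builtin_expect(__builtin_add_overflow(a, b, &sum), 0))
            return overflow_to_bignum(a, b);  /* emitted far from the hot code */
        return sum;
    }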
5:30:23
beach
Ah, yes, I see. So in the case of a branch, the exceptional code would still be too close to the more likely one.
5:31:07
hayley
Another option would be to have two instructions, one to set some handler register, and another to perform the add. But that's awfully similar to having a normal add instruction, and then a jump-on-overflow-bit instruction.
5:34:11
beach
Now, with a "modern" operating system, would the entire process be interrupted in case of a trap, or just the thread?
5:35:43
beach
And, I guess either way, a trap could have serious consequences for response times, like if one wants to use the system for processing sound.
5:37:45
hayley
I measured a trap handler and mremap to take about two microseconds on my Linux desktop. Such latency would be excusable if traps were infrequent enough, but it would definitely slow down lots of bignum operations.
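For reference, a crude way to reproduce that kind of number on Linux; this only times a trap round-trip plus an mprotect rather than the mremap hayley mentions, and everything about the setup is an assumption:

    #define _GNU_SOURCE
    #include <signal.h>
    #include <stdio.h>
    #include <sys/mman.h>
    #include <time.h>

    static void *page;

    static void handler(int sig, siginfo_t *info, void *ctx)
    {
        (void)sig; (void)info; (void)ctx;
        /* Make the page writable again so the faulting store can be retried. */
        mprotect(page, 4096, PROT_READ | PROT_WRITE);
    }

    int main(void)
    {
        struct sigaction sa = { .sa_sigaction = handler, .sa_flags = SA_SIGINFO };
        sigaction(SIGSEGV, &sa, NULL);
        page = mmap(NULL, 4096, PROT_READ | PROT_WRITE,
                    MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

        long total_ns = 0;
        int rounds = 10000;
        for (int i = 0; i < rounds; i++) {
            struct timespec t0, t1;
            mprotect(page, 4096, PROT_NONE);       /* arm the trap */
            clock_gettime(CLOCK_MONOTONIC, &t0);
            *(volatile char *)page = 1;            /* fault, handler, retry */
            clock_gettime(CLOCK_MONOTONIC, &t1);
            total_ns += (t1.tv_sec - t0.tv_sec) * 1000000000L
                      + (t1.tv_nsec - t0.tv_nsec);
        }
        printf("average trap round-trip: %ld ns\n", total_ns / rounds);
    }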
5:44:00
beach
And we can tell people who write applications for processing sound to avoid overflow of fixnum arithmetic.
5:45:25
hayley
What sort of latency would you consider acceptable? The less latency, the better, of course, but still.
5:46:12
beach
For sound, a few milliseconds. So if the entire thing takes only microseconds, then that's no problem.
5:47:25
moon-child
beach: well, end-to-end, so including hardware. But it occurs to me that is not a very useful measure here!
5:47:34
hayley
Okay then. Just checking, as I've had a wide range of latency limits depending on hardware.
5:50:18
hayley
That said, I'm now starting to think latency is sort of irrelevant in this situation, as a larger buffer size (i.e. higher latency) requires more samples to be computed at a time.
5:57:56
hayley
Well, no, a larger buffer size allows the system to tolerate larger "hiccups" in processing time, including those caused by traps.
6:06:17
beach
Computing samples is fast. The problem is that the ear is very sensitive to latency, so buffers that are too big are out of the question.
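To put rough numbers on that, assuming a 48 kHz sample rate: a 128-sample buffer is about 2.7 ms and a 1024-sample buffer about 21 ms, so an occasional trap costing a few microseconds disappears into either budget.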
6:15:42
hayley
Though the read barrier in my concurrent compacting GC design uses MMU traps, I've tried to keep the trap frequency to a minimum to make it work acceptably with stock hardware and kernels.
6:16:38
hayley
Do any sound processing algorithms produce bignums? I admit I really don't know any algorithms other than some oscillators and one resonant filter.
6:25:50
hayley
Speaking of, I am trying to figure out something: Gil Tene told me that a read barrier is preferable to a write barrier, as the mutation rate tends to increase when the allocation rate increases.
6:26:31
hayley
But, surely the read rate increases too, as you tend to use objects that you allocate?
6:27:52
hayley
And, if one makes use of persistent data structures, or functions which copy (parts of) sequences rather than modifying them, then they would allocate more with no mutation?
6:28:41
hayley
Yes, I think he is considering Java code, which I suspect is more mutation-heavy than Common Lisp code, but it doesn't make sense even then.
6:29:02
beach
Probably. So it depends on the programming language, on the programming paradigm used, on the data structures, etc.
6:31:54
hayley
But still, you don't write for no reason, and so I would expect a higher mutation rate to occur with a higher read rate (but not necessarily the other way around).
6:37:13
hayley
I couldn't figure out how to write that sentence, as it didn't seem right to suggest that a high read rate is caused by a high mutation rate.
6:37:24
beach
hayley: I could very well imagine the read rate to be fairly independent of the mutation rate. But then, I am often unable to think through things like that.
6:38:32
hayley
It's not correlation, because you can have a low (or zero) mutation rate and a high read rate when using functional programming.
6:38:40
moon-child
beach: a write which is never read is 'useless'. Now, the alternative may be more expensive, but I think most writes are not dead
6:42:25
hayley
So, let's say that both read and mutation rates are either "low" or "high". I can imagine a program with a high read rate and low write rate, but not a (useful) program with a low read rate and high write rate.
6:44:11
moon-child
I wonder how easy it is to collect statistics on such things. Valgrind has virtual cache hit statistics, and tracks reads from uninitialized memory, so I think it might be convinced to track the rate of un-read writes
6:45:43
hayley
Our favourite sufficiently smart compiler would remove dead writes; I don't know how well eliminating dead writes works in real compilers though.
6:47:07
hayley
The first search result for "compiler remove dead writes" is a Usenix presentation on how removing dead writes allows secret data to remain in memory, rather than zeroing it out. Fun.
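The textbook instance of that problem, sketched in C (read_password and check_password are hypothetical helpers):

    #include <string.h>

    void read_password(char *buf, size_t n);   /* hypothetical */
    int  check_password(const char *buf);      /* hypothetical */

    void handle_login(void)
    {
        char password[64];
        read_password(password, sizeof password);
        check_password(password);
        /* Dead store: the buffer is never read again, so the compiler may
           delete this memset and the secret stays in memory.  This is why
           explicit_bzero and memset_s exist. */
        memset(password, 0, sizeof password);
    }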
6:47:50
moon-child
imagine some shared state, where thread 1 writes state A, then writes state B, and not until the transition to state B does thread 2 happen to read. I think that's going to be where the majority of spurious writes happen, and that may not be deterministic
6:51:11
hayley
It is possible for a compiler to eliminate the write of state A, depending on how simple our states are. http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2015/n4455.html covers such optimisations.
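A small C version of moon-child's scenario, along the lines of what the paper discusses (STATE_A and STATE_B are illustrative):

    #include <stdatomic.h>

    enum { STATE_A = 1, STATE_B = 2 };
    atomic_int state;

    void publish(void)
    {
        /* With no intervening synchronisation or I/O, a conforming compiler
           may coalesce these two stores into a single store of STATE_B, so
           another thread might never observe STATE_A. */
        atomic_store_explicit(&state, STATE_A, memory_order_relaxed);
        atomic_store_explicit(&state, STATE_B, memory_order_relaxed);
    }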
10:51:43
hayley
I made an attempt at metering SICL bootstrapping: the HIR evaluator was modified so that RPLACA and RPLACD instructions increment a counter; bootstrapping was modified so that the imported RPLACA and RPLACD functions increment the same counter, and so that CONS, LIST and LIST* increment another counter.
10:53:10
hayley
There were 2,778,200 cons cells allocated by those functions, and 74,383 updates were performed. But these numbers don't mean very much.
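(For what it's worth, taken at face value those figures work out to roughly 37 cons cells allocated per RPLACA/RPLACD call in this particular workload.)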