freenode/#sicl - IRC Chatlog
7:12:47
lukego
Yeah. And LuaJIT doesn't have a good answer for this. Actually *doing* late binding stuff really messes with performance in practice.
7:12:47
moon-child
in a jit, you would generate a 'unoptimized' caller which performs an indirect jump to the callee and then an 'optimized' version which inlines the callee. When the callee changes, the optimized version of the caller is invalidated and you fall back to the unoptimized version
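A minimal sketch of the scheme moon-child describes, written in Python rather than machine code. The names `FunctionCell` and `Caller` are hypothetical, and capturing the callee in a closure only stands in for real inlining; it is an illustration under those assumptions, not how any particular JIT does it.

```python
class FunctionCell:
    """Hypothetical cell holding the current definition of a named function."""

    def __init__(self, fn):
        self.fn = fn
        self.dependents = []  # callers whose optimized code assumed self.fn

    def redefine(self, fn):
        self.fn = fn
        # Every optimized caller that baked in the old definition is now stale.
        for caller in self.dependents:
            caller.invalidate()
        self.dependents.clear()


class Caller:
    """A caller computing callee(x) + 1, with an unoptimized and an optimized path."""

    def __init__(self, cell):
        self.cell = cell
        self.optimized = None

    def optimize(self):
        fn = self.cell.fn  # snapshot of the callee; stand-in for inlining its body
        self.optimized = lambda x: fn(x) + 1
        self.cell.dependents.append(self)

    def invalidate(self):
        self.optimized = None  # fall back to the unoptimized path

    def __call__(self, x):
        if self.optimized is not None:
            return self.optimized(x)
        # Unoptimized path: indirect call through the cell on every invocation.
        return self.cell.fn(x) + 1
```

Redefining the callee through `redefine` drops the optimized version, so the next call goes through the cell again and sees the new definition.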
7:14:59
lukego
Hard to think about this stuff abstractly though. In a given application there will be /something/ limiting performance at the CPU level. Instruction fetching? Data fetching? ALU resources? Branch mispredictions? Instruction window space? In every case some optimizations will help and some will hinder.
7:15:27
lukego
best solution is probably to have well-defined optimizations that the application programmer can take into account. So kudos :)
7:16:15
moon-child
in the linux kernel, every compiled function begins with a NOP sequence, which can be runtime-patched into a direct jump, allowing the kernel to be upgraded without rebooting the system. I expect you could do something similar
7:16:23
beach
In this case, aside from cache effects as moon-child points out, my technique does strictly less work than the default case. So it is hard for me to see how I can lose.
7:17:54
no-defun-allowed
However, one should note that they don't use optimisations which make it look like your program isn't being run on a bytecode machine. So I suppose that is always possible for them.
7:18:21
lukego
beach: I dunno. I feel like "cache effects" and "speculative execution" are the first-order problems and e.g. number of instructions executed is second order. have to worry about what hazards can occur in transferring execution from the caller to the callee. any branch via memory load is a bit scary surely.
7:18:49
no-defun-allowed
Rewording: they don't optimize in ways which prevent the creation of an equivalent virtual machine state.
7:19:06
beach
lukego: Branch via memory load is what is traditionally done. My technique avoids that.
7:19:36
lukego
I thought you have an extra branch? from caller to trampoline, from trampoline to callee?
7:20:50
lukego
oh right both branches are like that, right? okay that does start to sound quite nice :)
7:21:13
beach
moon-child: That NOP trick would not be useful for named calls, because the callee can be redefined arbitrarily.
7:22:10
no-defun-allowed
May I suggest taking a gander through "The design and implementation of the Self Compiler" by Craig Chambers, particularly section 13.2? That section covers how they handle redefining inlined functions.
7:22:38
beach
lukego: And when the "number of instructions" includes multiple loops over the list of arguments in order to parse keyword arguments, then I think the number of instructions becomes quite relevant indeed.
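To make the keyword-parsing cost concrete, here is a rough Python sketch. It assumes a flat argument vector `[x, key0, val0, key1, val1, ...]` and assumes the keywords at a call site are literal constants known when the snippet is built. The names `generic_entry`, `body` and `make_snippet` are hypothetical and not taken from SICL.

```python
def body(seq, start, end):
    """The callee's actual work, taking already-parsed arguments."""
    return seq[start:end]


def generic_entry(args):
    """General entry point: scans the argument list once per keyword parameter."""
    x, rest = args[0], args[1:]

    def lookup(key, default):
        for i in range(0, len(rest), 2):  # one loop over the arguments per keyword
            if rest[i] == key:
                return rest[i + 1]  # leftmost occurrence wins
        return default

    return body(x, lookup("start", 0), lookup("end", None))


def make_snippet(keys_at_site):
    """Built once per call site: resolves keyword positions ahead of time."""
    pos = {}
    for i, key in enumerate(keys_at_site):
        pos.setdefault(key, 2 * i + 2)  # index of the value for the i-th keyword
    start_i, end_i = pos.get("start"), pos.get("end")

    def snippet(args):
        # No scanning at call time: values are picked from fixed positions.
        start = args[start_i] if start_i is not None else 0
        end = args[end_i] if end_i is not None else None
        return body(args[0], start, end)

    return snippet
```

The per-call loops in `generic_entry` are what the call-site snippet avoids; the snippet pays the parsing cost once, when it is created.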
7:23:01
lukego
beach: okay yeah this technique makes a lot of sense to me now :). one reasonable question is whether the work saved by the trampoline is worth the additional branch - icache locality argument - but if I were a betting man I'd reckon so.
7:23:46
no-defun-allowed
i.e. from page 168 (as the PDF viewer thinks it is, or page 154 in the paper) of <http://www.wolczko.com/tmp/ChambersThesis.pdf>
7:23:48
moon-child
it occurs to me that if you allocate all the trampolines in the same arena, you would get fairly good locality
7:23:59
moon-child
particularly if many of them are the standard snippet, which will have uniform size
7:24:15
lukego
and if it means people can stop writing hand-optimized compiler macros for the sake of &key processing etc then that's a massive win for the psychological wellbeing of the application programmer.
7:24:24
no-defun-allowed
And a friend and I think it is a very well written thesis, for what it's worth.
7:27:46
beach
moon-child: There is real connection. It is just handy that code won't move for things like instruction cache.
7:28:11
beach
But the important part here is that threads don't have to be patched when the global GC is running.
7:31:43
lukego
Thank you for indulging these shoot-from-the-hip questions. It's very interesting work that you are doing.
7:32:03
beach
Another thing that I told drmeister about this morning was that I can now trace CAR. ...
7:32:37
beach
If I don't inline CAR, and instead put it in the snippet, when someone wants to trace it, the snippets could be altered to do a normal (traced) call.
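A small Python sketch of the idea of keeping CAR in the per-call-site snippet so that tracing can be switched on later. The registry `SITES` and the functions `make_call_site`, `trace_car` and `untrace_car` are hypothetical names used only for illustration, and a tuple stands in for a cons cell.

```python
SITES = []      # hypothetical registry of every call site that calls CAR
trace_log = []  # records traced calls


def fast_car(cell):
    """Default snippet: does the work directly, no full call."""
    return cell[0]


def traced_car(cell):
    """Replacement snippet: behaves like a normal, traced call."""
    trace_log.append(("car", cell))
    return cell[0]


class CallSite:
    def __init__(self):
        self.snippet = fast_car  # each call site owns its snippet


def make_call_site():
    site = CallSite()
    SITES.append(site)
    return site


def trace_car():
    for site in SITES:
        site.snippet = traced_car  # alter the snippets, not the callers


def untrace_car():
    for site in SITES:
        site.snippet = fast_car
```

Because the caller only ever goes through `site.snippet`, the caller's own code is untouched when tracing is turned on or off.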
7:33:57
beach
I was using CAR as an example, because you can't really take advantage of any knowledge of the return value, at least not in most cases.
7:40:48
Colleen
heisig: Bike said 10 hours, 16 minutes ago: i think an actual define-declaration analog would be out of scope for trucler. but trucler could have a function to read implementation-defined info for a user defined declaration, and maybe one to augment
7:40:48
Colleen
heisig: Bike said 5 hours, 43 minutes ago: the other trucler thing i probably need for cleavir to use it is being able to store arbitrary optimize info, like for client dependent qualities... i'll write a PR for that too i guess
7:56:31
heisig
::notify Bike The question is, should Trucler include functions for reading implementation-defined optimize info? Since they are implementation-defined, that particular implementation can simply subclass optimize-description and provide custom accessors.
9:38:26
heisig
Is 32.5 (error handling in standard functions) still up to date? It says that in SICL, standard functions shouldn't call other standard functions for the sake of precise error reporting.
9:44:04
heisig
Heh, also 32.7 (compiler macros) has been superseded by the recent advances in optimizing call sites :)
9:44:39
heisig
I am just pointing this out because it might confuse newcomers. They won't be able to tell which rule of the style guide is still relevant.
9:45:33
heisig
I am thinking of either rewriting them, or deleting them, or marking them as obsolete/work-in-progress.
9:45:42
beach
Now that no-defun-allowed is working on register allocation, I am trying to update the specification.
9:46:34
heisig
That would be wonderful. The specification is a great starting point for new developers.
9:47:22
heisig
So I will just write a new paragraph on designing protocols, and leave the cleanup to you.
16:11:03
Colleen
Bike: heisig said 8 hours, 14 minutes ago: The question is, should Trucler include functions for reading implementation-defined optimize info? Since they are implementation-defined, that particular implementation can simply subclass optimize-description and provide custom accessors.
16:11:42
Bike
i could have cleavir mandate that describe-optimize needs to return a special cleavir subclass of optimize-description, i guess
17:40:16
Bike
::notify heisig brief description of what i'm thinking wrt trucler optimize info https://gist.github.com/Bike/daa1bf795c8b718022856ac4cb15175a