freenode/#sicl - IRC Chatlog
19:21:36
Bike
I don't know. Whatever the user wants. In this trivial example, I guess it would be the actual optimizer function rather than its name.
19:39:37
heisig
The first case is accessing the current optimize settings. Trucler already has a better way of handling that.
19:40:34
heisig
The second case is that of accessing the set of user-defined declarations that are currently permitted.
19:41:05
heisig
The third case is that of accessing a particular use of one of these user-defined declarations.
19:41:33
heisig
My hunch would be to introduce separate API functions for each of these three cases.
19:43:36
heisig
The first case stays as it is, with the generic function describe-optimize that returns an optimize-description.
19:44:24
heisig
The second case should be handled by a describe-declarations function that returns a declarations-description.
19:45:00
heisig
We could then define some generic functions for querying such a declarations description.
19:48:32
heisig
The third case could be handled by a generic function called describe-declaration, that returns the declaration specifier whose first entry is that symbol, or NIL if no such declaration specifier exists.
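The three lookups described above could be sketched roughly as follows. This is Python for illustration only (the actual proposal is CLOS generic functions in Trucler), and every name here is a hypothetical stand-in, not Trucler's API:

```python
# Illustrative sketch of the three environment queries heisig describes.
# All names are hypothetical; Trucler's real API uses CLOS generic functions.
class Environment:
    def __init__(self, optimize, declarations):
        self._optimize = optimize            # e.g. {"speed": 3, "safety": 1}
        self._declarations = declarations    # name -> declaration specifier

    def describe_optimize(self):
        # Case 1: the current OPTIMIZE settings.
        return dict(self._optimize)

    def describe_declarations(self):
        # Case 2: the set of permitted user-defined declaration names.
        return set(self._declarations)

    def describe_declaration(self, name):
        # Case 3: the declaration specifier whose first entry is NAME,
        # or None (NIL in the Lisp version) if there is no such specifier.
        return self._declarations.get(name)
```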
19:49:12
heisig
(Please tell me if it doesn't make sense. I am still very tired from chairing ELS and likely to make mistakes)
19:50:59
Bike
the first two cases make sense. The third... I'm not sure, users might want some processing like cltl2 has, but that's probably out of band for trucler
19:58:17
Bike
define-declaration defines a function that the environment is supposed to call when an environment is augmented
20:01:13
heisig
Oh, right, I had missed define-declaration. Give me some time to re-think what I just said.
20:01:41
Bike
sure. i mean, i can work on the first two cases to begin with, anyway, that's what i need to use trucler in cleavir
21:24:45
Bike
::notify heisig i think an actual define-declaration analog would be out of scope for trucler. but trucler could have a function to read implementation-defined info for a user defined declaration, and maybe one to augment
1:57:28
Bike
::notify heisig the other trucler thing i probably need for cleavir to use it is being able to store arbitrary optimize info, like for client dependent qualities... i'll write a PR for that too i guess
5:23:59
no-defun-allowed
I can't remember exactly; are floating point addition and multiplication commutative? I know they are not associative, but I can't recall commutativity.
5:26:31
no-defun-allowed
With the existence of NaN, I understand that e.g. 1 + NaN ≠ NaN + 1, but then that reduces to NaN ≠ NaN, so it's not as if anything changed.
5:27:39
jackdaniel
heisig: I once started writing a test suite for cltl2 env accessors and to my surprise sbcl's implementation also had plenty of issues (afair it was mostly related to querying);
5:35:02
beach
This is what I watched this morning for my daily exercise, and I found it interesting: https://www.youtube.com/watch?v=9-IWMbJXoLM because the speaker was essentially telling the participants of a Linux conference that Unix and C are not so great, and that we should strive for better things.
5:38:49
beach
Yeah, though I attended the Australian Linux conference when it was held in Dunedin. :)
5:39:41
no-defun-allowed
Holy crap, that's a lot of files to do...something with USB devices which I forgot.
5:42:50
lukego
I've been meaning to look at LLVM one of these days. I also have no desire to interface with it via C++ but could potentially be interested in using its textual IR representation as a target for something.
5:47:37
no-defun-allowed
Also, is there a problem with defining the binary floating-point instructions to be subclasses of BINARY-OPERATION-MIXIN?
5:48:37
beach
That mixin was meant to encode the restriction of the x86 that the destination and the first operand are the same.
5:49:26
no-defun-allowed
That is also the case for SSE floating point instructions - I thought we agreed that the three-address AVX instructions would be too new.
5:50:45
no-defun-allowed
AVX was 2011, AVX2 was 2013. Now I need to double check if the three-address instructions were AVX or AVX2.
5:57:07
no-defun-allowed
I don't have a machine that doesn't have AVX instructions (though, again, some don't have AVX2), and the Steam hardware survey states that 94.77% of computers surveyed support AVX. So I suppose it should be fine.
5:59:58
beach
Also, remember that we want to do something simple, just to get an executable system soon-ish. We absolutely have to count on others implementing more things, once that first step is done.
6:04:41
no-defun-allowed
Right. However, I've now noticed that we at least have to perform the COMMUTATIVE-MIXIN preprocessing for addition and multiplication, as only the second input can be an immediate* with three-address instructions.
6:05:29
no-defun-allowed
*And by "immediate" I mean that it will have to be encoded as a memory input (whatever "m64" is called) which loads some constant value.
6:06:20
no-defun-allowed
A similar transform to what I wrote for integer multiplication and division would be done for floating-point subtraction and division.
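The operand-order preprocessing for commutative operations can be sketched as below. This is an illustrative Python model, not SICL/Cluster code; the operation names and the IR representation are hypothetical. The point is just that for fadd/fmul a constant first input can be swapped into the second slot (the one that a three-address instruction like vaddsd can take from memory), while fsub/fdiv need the separate transform mentioned above:

```python
# Hypothetical IR normalization: for commutative float ops, move a constant
# first input into the second slot, since only the second input of a
# three-address SSE/AVX instruction can be a memory operand (m64).
COMMUTATIVE = {"fadd", "fmul"}

def normalize(op, in1, in2):
    """Return (op, in1, in2) with any constant moved to the second slot."""
    if op in COMMUTATIVE and isinstance(in1, float) and not isinstance(in2, float):
        return op, in2, in1    # safe to swap: the operation is commutative
    return op, in1, in2        # non-commutative ops need a different transform
```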
6:55:11
lukego
I'm reading some SICL papers. Really fun! One is so used to reading papers from the 1980s about this stuff in a historical "what they were thinking at the time" context but not current work :)
6:56:42
lukego
I've only had a quick read through the Call-site optimization paper but when it talks about eliminating a memory data load (for accessing the symbol-function) is this at the expense of adding a memory code load (for the heap-allocated snippet object)? and if so might that be a net loss because an OoO CPU can better mitigate data latency than control latency (or can this be predicted in practice?)
6:56:54
beach
What I find "amusing" is how much current Common Lisp implementations are based on technology that is no longer the norm.
6:57:01
lukego
I'm not sure I've understood though, have to take another read. is this implemented btw?
6:57:53
lukego
Sorry maybe it's control latency in both cases i.e. you are loading the symbol-function in order to branch to it so you can't branch until it's loaded.
6:58:35
lukego
I was thinking in LuaJIT terms where the compiler emits a hard-coded branch to the function definition it expects but guards that with a test-and-branch on the symbol-function (so to speak) to detect when this is invalid.
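That guard pattern can be modeled in a few lines. A rough Python sketch (illustrative names only; in LuaJIT this is emitted machine code, not a lookup in a dict):

```python
# LuaJIT-style guarded direct call: branch straight to the definition the
# compiler expected, protected by a cheap identity test against the binding.
functions = {}   # stands in for the symbol-function table

def guarded_call(name, expected, *args):
    current = functions[name]
    if current is expected:        # guard: is the binding still the one we compiled for?
        return expected(*args)     # fast path: "hard-coded" direct call
    return current(*args)          # guard failed: fall back to late binding

def double(x):
    return 2 * x

functions["double"] = double
assert guarded_call("double", double, 21) == 42
```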
7:00:00
lukego
though LuaJIT has its own whole bag of tricks so it might not be an awful idea to compare notes a bit anyway.
7:03:11
lukego
When you redefine a function then couldn't you just recompile all of its callers at the same time? (I guess this doesn't need to be transitive if their own definitions haven't changed - you might have to patch callers-of-callers to the address of the new definition but it should be compatible)
7:03:54
beach
lukego: And, it is not implemented. But I am pretty sure it's a win, because the only additional work being done is with the two jumps. And we save at least 1 (in SICL, more like 4) memory loads.
7:05:01
beach
You can't recompile a caller from source. It would have to be from a minimally-compiled version of it. But that would take a lot of time because of compiler optimization. And all you would win would be two "free" jumps.
7:07:01
lukego
but again that's me being a LuaJIT hat wearer and wanting to inline everything everywhere.
7:07:33
beach
lukego: Function calls in a normal setting must have an indirection so as to allow for late binding.
7:07:55
beach
lukego: And you need to load the entry point and the static environment from the function object.
7:08:17
lukego
beach: Sort-of, right? I mean you can also do late binding by patching the early-bound code. "become:" in Smalltalk parlance.
7:08:58
beach
That is kind of what I am doing. The snippet is technically part of the caller, and it is patched when the callee changes.
7:09:09
lukego
Can't help but think that CPU capacity and memory bandwidth have been increasing exponentially while the amount of code allocated on the heap has not. So compared with 20 years ago it must be quite cheap now to say "let's visit every FUNCTION object on the heap and ..."
7:10:13
beach
Because depending on the callee, you would need to modify the operations of the call sequence.
7:11:02
lukego
but can be mitigated, no? LuaJIT inlines literally every function call but with no loss of late binding nor debug information
7:12:20
beach
So the question then, what if the new callee requires more code to be called than the previous one? Do you move the remaining code? That would end up being very close to recompiling the caller.
7:12:47
lukego
Yeah. And LuaJIT doesn't have a good answer for this. Actually *doing* late binding stuff really messes with performance in practice.
7:12:47
moon-child
in a jit, you would generate an 'unoptimized' caller which performs an indirect jump to the callee and then an 'optimized' version which inlines the callee. When the callee changes, the optimized version of the caller is invalidated and you fall back to the unoptimized version
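The two-tier scheme can be sketched as a small state machine. This is a toy Python model with hypothetical names, not how any real JIT represents call sites:

```python
# Two-tier call site: an optimized (inlined) version is used until the
# callee is redefined; invalidation falls back to the indirect call.
class CallSite:
    def __init__(self, table, name):
        self.table = table           # the late-binding function table
        self.name = name
        self.optimized = None        # inlined/specialized version, if valid

    def invalidate(self):
        self.optimized = None        # callee changed: drop the fast version

    def call(self, *args):
        if self.optimized is not None:
            return self.optimized(*args)       # optimized, inlined path
        return self.table[self.name](*args)    # unoptimized indirect jump
```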
7:14:59
lukego
Hard to think about this stuff abstractly though. In a given application there will be /something/ limiting performance at the CPU level. Instruction fetching? Data fetching? ALU resources? Branch mispredictions? Instruction window space? In every case some optimizations will help and some will hinder.
7:15:27
lukego
best solution is probably to have well-defined optimizations that the application programmer can take into account. So kudos :)
7:16:15
moon-child
in the linux kernel, every compiled function begins with a NOP sequence, which can be runtime-patched into a direct jump, allowing the kernel to be upgraded without rebooting the system. I expect you could do something similar
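A toy model of that NOP-pad trick, purely for illustration (in the kernel this is literal machine-code patching, not an attribute check):

```python
# NOP-pad sketch: each function begins with a patchable slot that acts as a
# NOP until it is rewritten into a direct jump to replacement code.
class Patchable:
    def __init__(self, impl):
        self.redirect = None     # the patchable slot; None behaves like a NOP
        self.impl = impl

    def patch(self, new_impl):
        self.redirect = new_impl     # runtime-patch the NOP into a jump

    def __call__(self, *args):
        if self.redirect is not None:
            return self.redirect(*args)   # patched: jump to the new code
        return self.impl(*args)           # unpatched: fall through to the body
```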
7:16:23
beach
In this case, aside from cache effects as moon-child points out, my technique does strictly less work than the default case. So it is hard for me to see how I can lose.
7:17:54
no-defun-allowed
However, one should note that they don't use optimisations which make it look like your program isn't being run on a bytecode machine. So I suppose that is always possible for them.
7:18:21
lukego
beach: I dunno. I feel like "cache effects" and "speculative execution" are the first-order problems and e.g. number of instructions executed is second order. have to worry about what hazards can occur in transferring execution from the caller to the callee. any branch via memory load is a bit scary surely.
7:18:49
no-defun-allowed
Rewording: they don't optimize in ways which prevent the creation of an equivalent virtual machine state.
7:19:06
beach
lukego: Branch via memory load is what is traditionally done. My technique avoids that.
7:19:36
lukego
I thought you have an extra branch? from caller to trampoline, from trampoline to callee?
7:20:50
lukego
oh right both branches are like that, right? okay that does start to sound quite nice :)
7:21:13
beach
moon-child: That NOP trick would not be useful for named calls, because the callee can be redefined arbitrarily.