freenode/#sicl - IRC Chatlog
19:21:36
Bike
I don't know. Whatever the user wants. In this trivial example, I guess it would be the actual optimizer function rather than its name.
19:39:37
heisig
The first case is accessing the current optimize settings. Trucler already has a better way of handling that.
19:40:34
heisig
The second case is that of accessing the set of user-defined declarations that are currently permitted.
19:41:05
heisig
The third case is that of accessing a particular use of one of these user-defined declarations.
19:41:33
heisig
My hunch would be to introduce separate API functions for each of these three cases.
19:43:36
heisig
The first case stays as it is, with the generic function describe-optimize that returns an optimize-description.
19:44:24
heisig
The second case should be handled by a describe-declarations function that returns a declarations-description.
19:45:00
heisig
We could then define some generic functions for querying such a declarations description.
19:48:32
heisig
The third case could be handled by a generic function called describe-declaration, that returns the declaration specifier whose first entry is that symbol, or NIL if no such declaration specifier exists.
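The three lookups described above could be sketched roughly as follows. This is Python for illustration only (the actual proposal is CLOS generic functions in Trucler), and every name here is a hypothetical stand-in, not Trucler's API:

```python
# Illustrative sketch of the three environment queries heisig describes.
# All names are hypothetical; Trucler's real API uses CLOS generic functions.
class Environment:
    def __init__(self, optimize, declarations):
        self._optimize = optimize            # e.g. {"speed": 3, "safety": 1}
        self._declarations = declarations    # name -> declaration specifier

    def describe_optimize(self):
        # Case 1: the current OPTIMIZE settings.
        return dict(self._optimize)

    def describe_declarations(self):
        # Case 2: the set of permitted user-defined declaration names.
        return set(self._declarations)

    def describe_declaration(self, name):
        # Case 3: the declaration specifier whose first entry is NAME,
        # or None (NIL in the Lisp version) if there is no such specifier.
        return self._declarations.get(name)
```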
19:49:12
heisig
(Please tell me if it doesn't make sense. I am still very tired from chairing ELS and likely to make mistakes)
19:50:59
Bike
the first two cases make sense. The third... I'm not sure, users might want some processing like cltl2 has, but that's probably out of band for trucler
19:58:17
Bike
define-declaration defines a function that the environment is supposed to call when an environment is augmented
20:01:13
heisig
Oh, right, I had missed define-declaration. Give me some time to re-think what I just said.
20:01:41
Bike
sure. i mean, i can work on the first two cases to begin with, anyway, that's what i need to use trucler in cleavir
21:24:45
Bike
::notify heisig i think an actual define-declaration analog would be out of scope for trucler. but trucler could have a function to read implementation-defined info for a user defined declaration, and maybe one to augment
1:57:28
Bike
::notify heisig the other trucler thing i probably need for cleavir to use it is being able to store arbitrary optimize info, like for client dependent qualities... i'll write a PR for that too i guess
5:23:59
no-defun-allowed
I can't remember exactly; are floating point addition and multiplication commutative? I know they are not associative, but I can't recall commutativity.
5:26:31
no-defun-allowed
With the existence of NaN, I understand that e.g. 1 + NaN ≠ NaN + 1, but then that reduces to NaN ≠ NaN, so it's not as if anything changed.
5:27:39
jackdaniel
heisig: I once started writing a test suite for cltl2 env accessors and to my surprise sbcl's implementation also had plenty of issues (afair it was mostly related to querying);
5:35:02
beach
This is what I watched this morning for my daily exercise, and I found it interesting: https://www.youtube.com/watch?v=9-IWMbJXoLM because the speaker was essentially telling the participants of a Linux conference that Unix and C are not so great, and that we should strive for better things.
5:38:49
beach
Yeah, though I attended the Australian Linux conference when it was held in Dunedin. :)
5:39:41
no-defun-allowed
Holy crap, that's a lot of files to do...something with USB devices which I forgot.
5:42:50
lukego
I've been meaning to look at LLVM one of these days. I also have no desire to interface with it via C++ but could potentially be interested in using its textual IR representation as a target for something.
5:47:37
no-defun-allowed
Also, is there a problem with defining the binary floating-point instructions to be subclasses of BINARY-OPERATION-MIXIN?
5:48:37
beach
That mixin was meant to encode the restriction of the x86 that the destination and the first operand are the same.
5:49:26
no-defun-allowed
That is also the case for SSE floating point instructions - I thought we agreed that the three-address AVX instructions would be too new.
5:50:45
no-defun-allowed
AVX was 2011, AVX2 was 2013. Now I need to double check if the three-address instructions were AVX or AVX2.
5:57:07
no-defun-allowed
I don't have a machine that doesn't have AVX instructions (though, again, some don't have AVX2), and the Steam hardware survey states that 94.77% of computers surveyed support AVX. So I suppose it should be fine.
5:59:58
beach
Also, remember that we want to do something simple, just to get an executable system soon-ish. We absolutely have to count on others implementing more things, once that first step is done.
6:04:41
no-defun-allowed
Right. However, I've now noticed that we at least have to perform the COMMUTATIVE-MIXIN preprocessing for addition and multiplication, as only the second input can be an immediate* with three-address instructions.
6:05:29
no-defun-allowed
*And by "immediate" I mean that it will have to be encoded as a memory input (whatever "m64" is called) which loads some constant value.
6:06:20
no-defun-allowed
A similar transform to what I wrote for integer multiplication and division would be done for floating-point subtraction and division.
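The operand-order preprocessing for commutative operations can be sketched as below. This is an illustrative Python model, not SICL/Cluster code; the operation names and the IR representation are hypothetical. The point is just that for fadd/fmul a constant first input can be swapped into the second slot (the one that a three-address instruction like vaddsd can take from memory), while fsub/fdiv need the separate transform mentioned above:

```python
# Hypothetical IR normalization: for commutative float ops, move a constant
# first input into the second slot, since only the second input of a
# three-address SSE/AVX instruction can be a memory operand (m64).
COMMUTATIVE = {"fadd", "fmul"}

def normalize(op, in1, in2):
    """Return (op, in1, in2) with any constant moved to the second slot."""
    if op in COMMUTATIVE and isinstance(in1, float) and not isinstance(in2, float):
        return op, in2, in1    # safe to swap: the operation is commutative
    return op, in1, in2        # non-commutative ops need a different transform
```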
6:55:11
lukego
I'm reading some SICL papers. Really fun! One is so used to reading papers from the 1980s about this stuff in a historical "what they were thinking at the time" context but not current work :)
6:56:42
lukego
I've only had a quick read through the Call-site optimization paper but when it talks about eliminating a memory data load (for accessing the symbol-function) is this at the expense of adding a memory code load (for the heap-allocated snippet object)? and if so might that be a net loss because an OoO CPU can better mitigate data latency than control latency (or can this be predicted in practice?)
6:56:54
beach
What I find "amusing" is how much current Common Lisp implementations are based on technology that is no longer the norm.
6:57:01
lukego
I'm not sure I've understood though, have to take another read. is this implemented btw?
6:57:53
lukego
Sorry maybe it's control latency in both cases i.e. you are loading the symbol-function in order to branch to it so you can't branch until it's loaded.
6:58:35
lukego
I was thinking in LuaJIT terms where the compiler emits a hard-coded branch to the function definition it expects but guards that with a test-and-branch on the symbol-function (so to speak) to detect when this is invalid.
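That guard pattern can be modeled in a few lines. A rough Python sketch (illustrative names only; in LuaJIT this is emitted machine code, not a lookup in a dict):

```python
# LuaJIT-style guarded direct call: branch straight to the definition the
# compiler expected, protected by a cheap identity test against the binding.
functions = {}   # stands in for the symbol-function table

def guarded_call(name, expected, *args):
    current = functions[name]
    if current is expected:        # guard: is the binding still the one we compiled for?
        return expected(*args)     # fast path: "hard-coded" direct call
    return current(*args)          # guard failed: fall back to late binding

def double(x):
    return 2 * x

functions["double"] = double
assert guarded_call("double", double, 21) == 42
```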
7:00:00
lukego
though LuaJIT has its own whole bag of tricks so it might not be an awful idea to compare notes a bit anyway.
7:03:11
lukego
When you redefine a function then couldn't you just recompile all of its callers at the same time? (I guess this doesn't need to be transitive if their own definitions haven't changed - you might have to patch callers-of-callers to the address of the new definition but it should be compatible)
7:03:54
beach
lukego: And, it is not implemented. But I am pretty sure it's a win, because the only additional work being done is with the two jumps. And we save at least 1 (in SICL, more like 4) memory loads.
7:05:01
beach
You can't recompile a caller from source. It would have to be from a minimally-compiled version of it. But that would take a lot of time because of compiler optimization. And all you would win would be two "free" jumps.
7:07:01
lukego
but again that's me being a LuaJIT hat wearer and wanting to inline everything everywhere.
7:07:33
beach
lukego: Function calls in a normal setting must have an indirection so as to allow for late binding.
7:07:55
beach
lukego: And you need to load the entry point and the static environment from the function object.
7:08:17
lukego
beach: Sort-of, right? I mean you can also do late binding by patching the early-bound code. "become:" in Smalltalk parlance.
7:08:58
beach
That is kind of what I am doing. The snippet is technically part of the caller, and it is patched when the callee changes.
7:09:09
lukego
Can't help but think that CPU capacity and memory bandwidth have been increasing exponentially while the amount of code allocated on the heap has not. So compared with 20 years ago it must be quite cheap now to say "let's visit every FUNCTION object on the heap and ..."
7:10:13
beach
Because depending on the callee, you would need to modify the operations of the call sequence.
7:11:02
lukego
but can be mitigated, no? LuaJIT inlines literally every function call but with no loss of late binding nor debug information
7:12:20
beach
So the question then, what if the new callee requires more code to be called than the previous one? Do you move the remaining code? That would end up being very close to recompiling the caller.
7:12:47
lukego
Yeah. And LuaJIT doesn't have a good answer for this. Actually *doing* late binding stuff really messes with performance in practice.
7:12:47
moon-child
in a jit, you would generate an 'unoptimized' caller which performs an indirect jump to the callee and then an 'optimized' version which inlines the callee. When the callee changes, the optimized version of the caller is invalidated and you fall back to the unoptimized version
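The two-tier scheme can be sketched as a small state machine. This is a toy Python model with hypothetical names, not how any real JIT represents call sites:

```python
# Two-tier call site: an optimized (inlined) version is used until the
# callee is redefined; invalidation falls back to the indirect call.
class CallSite:
    def __init__(self, table, name):
        self.table = table           # the late-binding function table
        self.name = name
        self.optimized = None        # inlined/specialized version, if valid

    def invalidate(self):
        self.optimized = None        # callee changed: drop the fast version

    def call(self, *args):
        if self.optimized is not None:
            return self.optimized(*args)       # optimized, inlined path
        return self.table[self.name](*args)    # unoptimized indirect jump
```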
7:14:59
lukego
Hard to think about this stuff abstractly though. In a given application there will be /something/ limiting performance at the CPU level. Instruction fetching? Data fetching? ALU resources? Branch mispredictions? Instruction window space? In every case some optimizations will help and some will hinder.
7:15:27
lukego
best solution is probably to have well-defined optimizations that the application programmer can take into account. So kudos :)
7:16:15
moon-child
in the linux kernel, every compiled function begins with a NOP sequence, which can be runtime-patched into a direct jump, allowing the kernel to be upgraded without rebooting the system. I expect you could do something similar
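A toy model of that NOP-pad trick, purely for illustration (in the kernel this is literal machine-code patching, not an attribute check):

```python
# NOP-pad sketch: each function begins with a patchable slot that acts as a
# NOP until it is rewritten into a direct jump to replacement code.
class Patchable:
    def __init__(self, impl):
        self.redirect = None     # the patchable slot; None behaves like a NOP
        self.impl = impl

    def patch(self, new_impl):
        self.redirect = new_impl     # runtime-patch the NOP into a jump

    def __call__(self, *args):
        if self.redirect is not None:
            return self.redirect(*args)   # patched: jump to the new code
        return self.impl(*args)           # unpatched: fall through to the body
```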
7:16:23
beach
In this case, aside from cache effects as moon-child points out, my technique does strictly less work than the default case. So it is hard for me to see how I can lose.
7:17:54
no-defun-allowed
However, one should note that they don't use optimisations which make it look like your program isn't being run on a bytecode machine. So I suppose that is always possible for them.
7:18:21
lukego
beach: I dunno. I feel like "cache effects" and "speculative execution" are the first-order problems and e.g. number of instructions executed is second order. have to worry about what hazards can occur in transferring execution from the caller to the callee. any branch via memory load is a bit scary surely.
7:18:49
no-defun-allowed
Rewording: they don't optimize in ways which prevent the creation of an equivalent virtual machine state.
7:19:06
beach
lukego: Branch via memory load is what is traditionally done. My technique avoids that.
7:19:36
lukego
I thought you have an extra branch? from caller to trampoline, from trampoline to callee?
7:20:50
lukego
oh right both branches are like that, right? okay that does start to sound quite nice :)
7:21:13
beach
moon-child: That NOP trick would not be useful for named calls, because the callee can be redefined arbitrarily.