libera/#sbcl - IRC Chatlog

3:08:20 pf3 hello, i'm engaged in a casual optimization golf. i've gotten the following (non-portable, and otherwise bad) code to not cons, except for the last multiply, which insists on allocating a bignum: http://okturing.com/src/14959/body . is there any way to trick python [SBCL's compiler] into doing a uint64*uint64->uint64 multiply here?

3:09:54 hayley I think (ldb (byte 64 0) (* ...)) should hint that you only care about the low 64 bits. But you will likely need to inline XORSHIFT1024, so that the return value may be passed as an unboxed value back to the caller.
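A rough model of hayley's suggestion, using Python integers as a stand-in for Common Lisp integers (both are arbitrary precision, so an unmasked product simply grows into a bignum). The hypothetical `ldb` helper mimics `(ldb (byte size position) n)`, and `mul_u64` keeps only the low 64 bits of the exact product, which is the uint64*uint64->uint64 multiply pf3 is after. This only shows the arithmetic; whether SBCL actually avoids consing also depends on the result staying unboxed, as the following lines discuss.

```python
def ldb(size: int, position: int, n: int) -> int:
    """Model of Common Lisp (ldb (byte size position) n): extract `size` bits
    of n starting at bit `position`. Python's >> and & treat negative ints as
    infinite two's complement, matching CL's semantics for LDB."""
    return (n >> position) & ((1 << size) - 1)


def mul_u64(a: int, b: int) -> int:
    """uint64 * uint64 -> uint64: the exact product can need up to 128 bits,
    but only its low 64 bits are kept, i.e. the product modulo 2**64."""
    return ldb(64, 0, a * b)


UINT64_MAX = (1 << 64) - 1

exact = UINT64_MAX * 2            # 65-bit result: would be a bignum in Lisp
wrapped = mul_u64(UINT64_MAX, 2)  # low 64 bits only
print(exact.bit_length(), hex(wrapped))  # → 65 0xfffffffffffffffe
```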

3:12:20 pf3 aah, that's what it is: the return value. now i understand, and the note i saw at some point while mucking with it also makes sense. thank you

3:27:59 pf3 oh, is this because a 64-bit value doesn't leave room for a tag on a 64-bit system, so it has to be boxed?

3:30:24 hayley Right. If the return value exceeds the range of a fixnum, SBCL needs to allocate a bignum to store the value.

3:32:10 pf3 right, dropping the number of bits makes it work as is. now it makes sense; in my mind (unsigned-byte 64) was immediate, but that's obviously wrong.
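A sketch of why (unsigned-byte 64) cannot be immediate, assuming the layout SBCL uses on x86-64: a fixnum is stored shifted left by one tag bit inside a 64-bit word, which leaves 63 signed bits for the value. The names `N_FIXNUM_TAG_BITS`, `tag_fixnum` and `untag_fixnum` are hypothetical and only model the idea. Under this assumption every (unsigned-byte 62) value fits, which is consistent with pf3's observation that narrowing the value makes the allocation go away.

```python
WORD_BITS = 64
N_FIXNUM_TAG_BITS = 1  # assumption: one low tag bit, as on x86-64 SBCL

FIXNUM_BITS = WORD_BITS - N_FIXNUM_TAG_BITS            # 63 signed bits
MOST_POSITIVE_FIXNUM = (1 << (FIXNUM_BITS - 1)) - 1    # 2**62 - 1
MOST_NEGATIVE_FIXNUM = -(1 << (FIXNUM_BITS - 1))       # -2**62
UINT64_MAX = (1 << WORD_BITS) - 1


def fits_in_fixnum(n: int) -> bool:
    """Can n be stored immediately, without boxing it as a bignum?"""
    return MOST_NEGATIVE_FIXNUM <= n <= MOST_POSITIVE_FIXNUM


def tag_fixnum(n: int) -> int:
    """Encode n as a machine word: value shifted left, low tag bit 0."""
    if not fits_in_fixnum(n):
        raise OverflowError(f"{n} needs a boxed bignum")
    return (n << N_FIXNUM_TAG_BITS) & UINT64_MAX


def untag_fixnum(word: int) -> int:
    """Decode a tagged word: reinterpret as signed, then shift the tag out."""
    if word >= 1 << (WORD_BITS - 1):
        word -= 1 << WORD_BITS
    return word >> N_FIXNUM_TAG_BITS


print(MOST_POSITIVE_FIXNUM == 2**62 - 1, fits_in_fixnum(UINT64_MAX))  # → True False
```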

5:49:46 moon-child does sbcl have any bounds-check elision capabilities?

5:50:25 moon-child such that something to the effect of (progn (assert (< 0 i (+ i 10) (length array))) a bunch of stuff with (aref array (+ i something between 0 and 10)))

5:50:30 moon-child would work
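A sketch of the transformation moon-child describes, written in Python for illustration only (Python still checks every list index internally, so this shows which checks become redundant, not a real speedup). It uses `0 <= i` rather than the strict `(< 0 i ...)` from the chat so that index 0 is allowed; the function names are hypothetical.

```python
def checked_aref(array, index):
    """One explicit bounds check per access, like a safe AREF."""
    if not 0 <= index < len(array):
        raise IndexError(index)
    return array[index]


def window_sum_checked(array, i, width=10):
    # width + 1 separate checks: one each for i, i+1, ..., i+width
    return sum(checked_aref(array, i + k) for k in range(width + 1))


def window_sum_hoisted(array, i, width=10):
    # A single up-front assertion covers the whole window. Given
    # 0 <= i and i + width < len(array), every i + k with 0 <= k <= width
    # is in bounds, so a compiler that tracked this fact could drop the
    # per-access checks in the sum below.
    if not (0 <= i and i + width < len(array)):
        raise IndexError((i, width))
    return sum(array[i + k] for k in range(width + 1))


data = list(range(20))
print(window_sum_checked(data, 3), window_sum_hoisted(data, 3))  # → 88 88
```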

5:51:18 hayley I don't recall that being elided, as something similar is done manually in one-more-re-nightmare.

5:56:29 mfiano moon-child: You can disable bounds checking for a particular body of code if that is what you are asking

5:58:04 hayley The question is whether the compiler can automatically prove that bounds checks aren't necessary and remove them itself.

6:00:26 mfiano I wouldn't want that. I went through hell in Julia because of that: my CPU's FPU resources were bottlenecked because of some bounds checks the compiler thought would be faster to remove.

6:01:17 mfiano an order-of-magnitude difference with bounds checking on for one particular array access :)

6:02:13 moon-child what?

6:02:47 mfiano I used llvm-mca to figure out why forcing an array to bounds check was MUCH faster than the compiler eliding the check for me.

6:02:58 moon-child it was faster to have a bounds check than to not have it? Sounds like a separate issue that this simply revealed

6:02:59 mfiano Nice tool, if only an approximation

6:03:34 mfiano Yes. The reasoning was explained by llvm-mca's results

6:04:14 mfiano Nah, this is from the raw assembly being measured using my CPU as a model

6:04:23 mfiano using llvm-mca

6:04:52 moon-child it is strictly less work to skip a bounds check than to perform it. At _most_, I would expect a few % difference from scheduling if you get unlucky, and I would not expect llvm-mca to catch that

6:04:57 mfiano It was very short code, and on several occasions it was faster to undo the elision of one or two bounds checks

6:05:22 moon-child https://mastodon.social/@pervognsen/109618222504547584 cf

6:05:26 mfiano Thing is, if one resource is topped out, you suffer, and my CPU has a ton of resources just for FPU alone

6:05:59 moon-child if you are unlucky, you can get screwed over by scheduling, yes. That is not at all, not even a little bit, an argument that we should not elide bounds checks

6:06:06 mfiano Good code will try to balance a particular piece of hardware's resources, not just assume it will run optimally because the code itself is optimal

11:32:32 scymtym i'm pretty sure SBCL's constraint system can propagate /some/ information regarding indices always being within array bounds. see ARRAY-IN-BOUNDS-P in compiler/constraint.lisp
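A toy model, in Python, of the kind of fact such a constraint system has to carry. It is not SBCL's compiler/constraint.lisp or ARRAY-IN-BOUNDS-P; it assumes we know a lower bound on `i` plus a relational fact `i + max_offset < (length array)` (moon-child's assertion gives max_offset = 10), and asks whether `(aref array (+ i k))` is provably in bounds for every k in a range. `IndexFacts` and `offset_in_bounds` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IndexFacts:
    """Known facts about an index i: i >= min_i and i + max_offset < length."""
    min_i: int
    max_offset: int


def offset_in_bounds(facts: IndexFacts, k_lo: int, k_hi: int) -> bool:
    """True if 0 <= i + k < length is implied for every k in [k_lo, k_hi]."""
    # Lower bound: the smallest possible index is min_i + k_lo.
    lower_ok = facts.min_i + k_lo >= 0
    # Upper bound: i + k <= i + max_offset < length whenever k <= max_offset.
    upper_ok = k_hi <= facts.max_offset
    return lower_ok and upper_ok


# Facts after (assert (and (<= 0 i) (< (+ i 10) (length array)))):
facts = IndexFacts(min_i=0, max_offset=10)
print(offset_in_bounds(facts, 0, 10), offset_in_bounds(facts, 0, 11))  # → True False
```

A real compiler has to derive facts like these from assertions and type declarations, and keep them valid across assignments to `i`, which is where most of the difficulty lies.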