libera/#commonlisp - IRC Chatlog
Search
18:00:38
mfiano
It's pretty amazing how that feature caught like wildfire in the world of opinionated Lispers.
18:32:07
Bike
phantomics: sbcl optimizes division by small constants into shifts+etc already https://github.com/sbcl/sbcl/blob/master/src/compiler/srctran.lisp#L3093-L3119. if you want to do something like libdivide you could call compile? like use (compile nil `(lambda (x) (truncate x ,d))) where libdivide has libdivide::divider<int>;
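A minimal sketch of what Bike describes, assuming SBCL: compile a lambda with the divisor baked in as a constant so the compiler's shift/multiply transform applies, amortizing the compile cost over many divisions (the `make-divider` name is made up for illustration).

```lisp
;; Sketch of the COMPILE suggestion: build a specialized division
;; function once, then reuse it, so SBCL's constant-divisor
;; optimization kicks in (analogous to libdivide's divider<int>).
(defun make-divider (d)
  "Return a function that truncates its argument by the constant D."
  (compile nil `(lambda (x)
                  (declare (type fixnum x)
                           (optimize (speed 3)))
                  (truncate x ,d))))

;; Usage: pay the compile cost once, then call many times.
(let ((div3 (make-divider 3)))
  (funcall div3 10)) ; => 3, 1
```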
18:32:29
Bike
if the function call overhead is too much, you could probably leverage sbcl internals to produce appropriate forms yourself and rely on the modular arithmetic optimizations
19:13:31
phantomics
By small constants does it have to be expressed as a constant in the code, like (floor x 3)? Will it work with for example (floor x (aref v y)), where v is a vector of positive integers?
0:43:55
Bike
phantomics: it will have to be a constant, which is why i suggested the compile thing for a case like libdivide's, where it's not a compile time constant but you're using it for the same division a lot where you want to amortize
0:44:44
Bike
converting to shifts and multiplications and all is probably slower than using hardware division for cases where the divisor varies each time, of course
0:46:09
Bike
"probably" shoudl be definitely actually, because generating new machine instructions and running them is absolutely going to be slower than one ALU operation
0:46:46
phantomics
I'm dividing by dimensional factors, so I may consistently have a need for numbers 0-10 but the factors for high dimensions could be anything
0:47:25
phantomics
I guess I could make a set of functions for low numbers and reference them from a LUT but I couldn't create a function in memory for just any number
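The LUT idea could be sketched like this, assuming SBCL: precompile dividers for small divisors at load time and fall back to plain TRUNCATE (and hardware division) for everything else. The names `*dividers*` and `fast-truncate` are hypothetical.

```lisp
;; Precompiled division functions for divisors 1-10, indexed by divisor.
(defvar *dividers*
  (let ((table (make-array 11 :initial-element nil)))
    (loop for d from 1 to 10
          do (setf (aref table d)
                   (compile nil `(lambda (x)
                                   (declare (type fixnum x)
                                            (optimize speed))
                                   (truncate x ,d)))))
    table))

(defun fast-truncate (x d)
  "Use a precompiled divider when D is in the table, else plain TRUNCATE."
  (if (<= 1 d 10)
      (funcall (aref *dividers* d) x)
      (truncate x d)))
```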
0:54:26
Bike
you should also profile to see if this is actually a problem, of course. i just did a quick test to compare truncate with a constant divisor versus a variable one, where in either case they're declared fixnum and there's optimize speed. with a million iterations the constant version is 3 ms or ~22% faster, which given that it only took 16 ms in the variable version isn't too amazing
0:54:50
Bike
with a hundred million iterations it's ~48%. that's the kind of size you're gonna need for this to matter at all
0:56:27
phantomics
My use cases could get pretty big, I'm converting row-major indices between arrays
0:57:10
phantomics
So that means that I have to do a number of integer divisions equal to the rank of the array to convert its row-major index to integer coordinates, and then usually some kind of multiplication of each of those
0:58:11
phantomics
So for large arrays with many steps of index conversion, millions of iterations are not unlikely
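The conversion phantomics describes can be sketched as one truncate per dimension, peeling coordinates off a row-major index from the last axis inward (the function name is illustrative, not from any library):

```lisp
(defun index-to-coords (index dims)
  "Convert a row-major INDEX into a coordinate list for an array
whose dimensions are DIMS. Does one integer division per dimension."
  (let ((coords '()))
    ;; Work from the fastest-varying (last) dimension inward:
    ;; the remainder is that axis's coordinate, the quotient carries on.
    (dolist (d (reverse dims))
      (multiple-value-bind (q r) (truncate index d)
        (push r coords)
        (setf index q)))
    coords))

(index-to-coords 5 '(2 3 4)) ; => (0 1 1), since 0*12 + 1*4 + 1 = 5
```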
0:58:33
Bike
oh, and in my example there's function call overhead in both cases, but in reality you wouldn't need it for the variable truncate
0:58:52
phantomics
The benefit of doing row-major index conversion for array processing is that it allows the transformations to be multithreaded with nothing shared, so that can give a significant speedup
1:03:33
phantomics
Appreciate the examples Bike, I'll try running something like that and see how it profiles
1:04:57
Bike
oh, and also i think sbcl can use range information about the dividend in its optimization, so if you can provide that information for your indices it could help
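The range hint might look like this: a narrowed integer type declaration on the dividend, which SBCL's derive-type machinery can use when transforming the constant division (the bound here is an arbitrary example, not a recommendation):

```lisp
;; Sketch: declaring a narrow range for the dividend gives the
;; compiler more room when it rewrites the constant division.
(defun indexed-truncate (x)
  (declare (type (integer 0 1048575) x) ; assumed index range
           (optimize speed))
  (truncate x 10))
```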
1:10:08
phantomics
Right, the CPU will make a difference here, M1s apparently have very fast integer division, so I may want to try with Stas's SBCL port
1:11:25
Bike
you could even try to do something fancy where your thing tests which implementation is faster on the actual machine and then uses that
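A hypothetical sketch of that test-on-the-actual-machine idea: time a compiled constant-divisor function against plain TRUNCATE and return whichever wins. Everything here (names, trial count) is made up for illustration.

```lisp
(defun pick-divider (d &key (trials 100000))
  "Benchmark a compiled divider for D against generic TRUNCATE on this
machine and return whichever function is faster."
  (let ((compiled (compile nil `(lambda (x)
                                  (declare (type fixnum x)
                                           (optimize speed))
                                  (truncate x ,d))))
        (generic (lambda (x) (truncate x d))))
    (flet ((time-fn (fn)
             (let ((start (get-internal-real-time)))
               (dotimes (i trials) (funcall fn i))
               (- (get-internal-real-time) start))))
      (if (< (time-fn compiled) (time-fn generic))
          compiled
          generic))))
```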
1:13:19
phantomics
That's something I've been considering, an autotuning system, it would be helpful to determine the cutoff point for when to add a new thread to a given transformation on an array of a given size
1:15:33
Bike
cl-jpeg tries to do this, but it's a little janky https://github.com/sharplispers/cl-jpeg/blob/master/jpeg.lisp#L312-L361
1:18:38
Bike
yeah, which is just a boolean. then the quantize-block macro decides what to expand into based on that variable.
1:26:13
phantomics
Pretty simple, I was envisioning a big table of values for each operation. Though with my loop-fusion system such a technique may only matter for single-stage transformations of big arrays
1:46:03
lagash
I'm looking into coding some sort of parsing / generating library for LDraw (standard file format for LEGO designs) and I'm not entirely sure where to start. I've done stuff with PEGs and other parsing grammars before, but in other languages. What would be something I should look into for bidirectional LDraw<->Sexp?
1:53:31
phantomics
lagash: You could take a look at https://github.com/eugeneia/maxpc for building parsers, be advised it's AGPL-licensed though
1:59:19
lagash
I'm not too enthused about it not implementing packrat parsing as well as being AGPL'd
2:05:53
phantomics
As far as writing back to LDraw it looks like any option would involve significant work, I see the format is unique and not based on XML or anything like that
2:07:07
phantomics
I see there's https://github.com/JoshTheDerf/LDRParser that can convert LDR -to- JSON, but converting from JSON to LDR is still a todo; if it were implemented you could generate JSON from CL to give to it
2:12:13
lagash
And I meant more that the parser and the writer would be integrated together, versus having two libraries, one for parsing and the other for writing / generating
2:47:50
phantomics
That would be a good design choice for your library, making it integrated like that, but I don't know of any tool that makes that particularly easy; I expect you'd need to build it from scratch with a bunch of (format) calls
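A rough guess at what those (format) calls could look like, based on the LDraw spec's line types (0 = comment, 1 = sub-part reference with colour, position, and a 3x3 rotation matrix); the function names are invented for this sketch and cover only those two line types:

```lisp
(defun write-ldraw-comment (stream text)
  "Write an LDraw line type 0 (comment) line."
  (format stream "0 ~a~%" text))

(defun write-ldraw-part (stream color x y z matrix file)
  "Write an LDraw line type 1 (sub-part reference).
MATRIX is a list of the nine rotation entries a b c d e f g h i."
  (format stream "1 ~d ~f ~f ~f~{ ~f~} ~a~%" color x y z matrix file))

;; Usage: a red 2x4 brick at the origin with an identity rotation.
(with-output-to-string (s)
  (write-ldraw-comment s "A single brick")
  (write-ldraw-part s 4 0 0 0 '(1 0 0 0 1 0 0 0 1) "3001.dat"))
```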