libera/#commonlisp - IRC Chatlog

11:41:39 Lord_of_Life ** NICK Lord_of_L

20:43:39 tam_ ** NICK tam

21:49:12 paulapatience aeth: In your vector DSL, do you walk the forms to find *, +, etc., or are the arithmetic operators bound via macrolet or something?

21:50:51 aeth paulapatience: any *, +, etc., found at compile time is undone in a step, so yes, it walks

21:52:23 aeth rather than optimizing some reduce/fold on &rest, it just explicitly hardcodes a removal of 0, 1, and 2 length into a %low-level, with 2+ reduced to 2 at compilation time. (This is because e.g. - at 0 length is NIL, i.e. an error, while - at 1 length is %negate, i.e. completely different than the binary/dyadic form)

21:52:50 aeth (similarly, + at 0 length is the constant-returning-function %zero, different than the binary/dyadic form)

21:53:10 aeth sort of like how CL implementations often have %false and %true functions as optimizations of (constantly nil) and (constantly t)
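
A minimal sketch of the arity lowering aeth describes, assuming plain scalar stand-ins for the low-level primitives. The names %zero, %negate, %add and %sub echo the ones mentioned in the log, but their definitions here, and the helpers reduce-to-binary and lower-arith, are hypothetical. The walker is deliberately naive: it does not treat QUOTE, LET bindings, or other special forms specially, which a real one would need to.

```lisp
;;; Hypothetical low-level primitives (assumed scalar stand-ins).
(defun %zero () 0)
(defun %negate (a) (cl:- a))
(defun %add (a b) (cl:+ a b))
(defun %sub (a b) (cl:- a b))

(defun reduce-to-binary (op args)
  "Left-fold ARGS (two or more forms) into nested two-argument calls to OP."
  (reduce (lambda (acc arg) (list op acc arg)) args))

(defun lower-arith (form)
  "Rewrite every + and - call in FORM into fixed-arity % primitives."
  (if (atom form)
      form
      (destructuring-bind (head &rest raw-args) form
        (let ((args (mapcar #'lower-arith raw-args)))
          (case head
            (+ (case (length args)
                 (0 '(%zero))                 ; (+) is the constant zero
                 (1 (first args))             ; (+ a) is just a
                 (t (reduce-to-binary '%add args))))
            (- (case (length args)
                 (0 (error "- requires at least one argument"))
                 (1 `(%negate ,(first args))) ; unary - is negation
                 (t (reduce-to-binary '%sub args))))
            (t (cons head args)))))))
```

A macro could call lower-arith on its body at macroexpansion time, e.g. (lower-arith '(+ a b c)) yields (%add (%add a b) c).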

21:53:51 aeth paulapatience: these are technically looking at zrvl:+, not cl:+, but since the API is identical, I can make the walker also work on CL:+ for lightly embedded vector stuff that seems more integrated into CL, in a separate sort of macro

21:54:11 aeth But people might not want CL:+ to have extended semantics within a tree-walker

21:55:35 aeth Note that this can't always be done, considering e.g. (f #'+) which then calls #'+ inside of F. So I will have to implement an alternative, "proper" implementation in such cases, which I will do once I add higher order functions back in.

21:56:33 aeth Well, I aggressively, aggressively, _aggressively_ inline and I will do so on higher order functions, considering the goal is efficient vector math. So (f #'+) might be handled, but you can just obfuscate that a few levels to get a place where a properly generic + is needed.

21:59:40 aeth The three goals, and why it's a proper DSL, are (1) I want to just write things like (zrvl:+ a b) instead of having to think about the 5 different, more efficient ways of doing that (not heap-allocating a and b if possible) with a bunch of fallbacks. And (2) I want to be able to use higher order functions complete with currying, etc., where convenient without the typical performance losses. And (3) it

21:59:46 aeth also needs to compile to SPIR-V. So the whole thing _has_ to tree walk and only permit a subset of CL inside of the tree-walker for simplicity of implementation. Especially for #3.

22:08:41 paulapatience aeth: I see. I'm defining a mini-DSL for something, and I wanted to shadow cl:* and cl:+ because I found the package prefix to be too long, but then I might need to use cl:* and cl:+ in the subforms in the macro.

22:09:09 paulapatience I was wondering whether to use :* and :+ instead, or to give an escape hatch to restore * and + to the usual meanings.

22:09:30 aeth well, that's why I picked the letter combination "ZR" for my game engine > 10 years ago. So I can say, e.g. "ZRVL" (Zombie Raptor Vector Language) and know that nobody in the world is going to name clash it

22:09:36 aeth so e.g. zrvl:+

22:09:55 aeth But I can also, separately, turn CL:+ into ZRVL:+ in a separate macro if I choose to. One downside being not everything is in CL

22:10:15 aeth For instance, I'm taking a few things from common libraries, with near-identical or identical APIs.

22:10:27 aeth e.g. FLOAT-FEATURES:FLOAT-NAN-P

22:11:29 aeth One advantage with this is that for the scalar version, I may be able to implement it with those functions. Because I am going to have three backends, scalar, non-portable (in implementation _or_ architecture) SB-SIMD, and SPIR-V

22:27:33 paulapatience aeth: Are you going to implement operator fusing?

22:30:01 aeth like fma?

22:31:01 aeth You can't do that... with floats, you have to assume that people know what they're doing and changing things around with identities assuming floats are reals (they're not) can greatly change results, especially on single floats.

22:31:24 paulapatience A higher level than that. I guess the question is, are you mapping the DSL operations directly to the underlying architecture, or are you doing some higher level optimization of the forms provided to the macro?

22:33:48 aeth What I _could_ do is do arithmetic identities on _two_ provided forms.

22:34:32 aeth It might be able to prove that f and g are equivalent mathematical functions, and thus the straightforward f and the fast g are the same (when you're modeling reals) despite g being better for floating point error while f is better for readability

22:34:54 aeth the kinds of things you can consider when you have to tree-walk a subset, anyway

22:38:13 aeth e.g. the LERP of (+ a (* t (- b a))) vs (+ (* (- 1.0 t) a) (* t b))

22:39:20 aeth there's one intermediate step (+ a (* t (- a)) (* t b)) to see that it's equivalent (and, yes, you can't use t in CL because of CL:T for true... e.g. alexandria:lerp uses v)
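
For reference, a small sketch of the two LERP forms being compared, written with (- b a) as the difference, which is the form consistent with the intermediate step above. The function names are hypothetical, and V is used instead of T for the reason aeth gives. Over the reals the two are identical; in floating point they can round differently.

```lisp
(defun lerp-short (a b v)
  "a + v*(b - a): the shorter, more readable form."
  (+ a (* v (- b a))))

(defun lerp-weighted (a b v)
  "(1 - v)*a + v*b: a weighted sum of the two endpoints."
  (+ (* (- 1.0 v) a) (* v b)))
```

With finite inputs the weighted form returns exactly B at V = 1.0, since the A term is multiplied by zero, while the short form computes a + (b - a), which can lose low-order bits when A and B differ greatly in magnitude.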

23:04:06 paulapatience aeth: I was thinking more of like omitting intermediate vectors in some operations, to reduce consing

23:06:17 paulapatience I guess "loop fusion" would be a better term

23:06:45 aeth paulapatience: for simd-pack in SB-SIMD, any intermediate heap consing for simd-pack vectors that stay within the scope of the function _should_ be reported to #sbcl as bugs (and I even noticed one once)

23:07:20 aeth For the CL scalar fallback, declare dynamic-extent can be used, but that probably only needs to be used for the longer stuff like matrices.
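
A rough illustration of the DYNAMIC-EXTENT declaration mentioned here, assuming a 4x4 single-float matrix stored as a flat 16-element array. The function name is hypothetical. The declaration is only a promise that the array does not escape the function; whether the compiler actually stack-allocates it is implementation-dependent.

```lisp
(defun trace-of-scaled-identity-4x4 (s)
  "Build S times the 4x4 identity in a scratch array and return its trace."
  (let ((m (make-array 16 :element-type 'single-float
                          :initial-element 0.0)))
    ;; M never leaves this function, so the compiler may stack-allocate it.
    (declare (dynamic-extent m))
    ;; Diagonal entries of a flat row-major 4x4 sit at indices 0, 5, 10, 15.
    (loop for i below 4 do (setf (aref m (* i 5)) s))
    (loop for i below 4 sum (aref m (* i 5)))))
```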

23:08:12 aeth Vectors don't have to actually exist at all because they can be converted to 2-4 (values) and handled with multiple-value-bind and multiple-value-call, which is one huge reason why a tree-walking transformer is easier than using it directly (m-v-c really makes things incredibly verbose, while m-v-b doesn't allow more than one binding, unlike let and let*, so it ruins the indentation levels)
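
A hand-written sketch of what "vectors as multiple values" looks like without a tree-walker, assuming 3-component vectors. The function names are hypothetical; the point is how verbose the explicit MULTIPLE-VALUE-CALL nesting gets for even a small expression like (a + b) . c.

```lisp
(defun vec3-add (ax ay az bx by bz)
  "Componentwise sum, returned as three values instead of an array."
  (values (+ ax bx) (+ ay by) (+ az bz)))

(defun vec3-dot (ax ay az bx by bz)
  "Dot product of two vectors passed as six scalars."
  (+ (* ax bx) (* ay by) (* az bz)))

(defun sum-then-dot-example ()
  "Compute (a + b) . c for a = (1 2 3), b = (4 5 6), c = (1 0 2)."
  (multiple-value-call #'vec3-dot
    ;; Each argument form contributes all of its values to the call.
    (multiple-value-call #'vec3-add
      (values 1.0 2.0 3.0)
      (values 4.0 5.0 6.0))
    (values 1.0 0.0 2.0)))
```

No vector object is allocated anywhere in this example: the components travel as multiple values from one call to the next.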

23:09:40 aeth For looping, I will have to use my own loop macro that _only_ becomes for loops because CL:LOOP is frequently while loops and is quite advanced, CL:DO is kind of a while loop, and CL:DOTIMES isn't powerful enough

23:10:19 aeth Not just for optimizations, also for easier targeting of SIMD without having to generate GOtos and then turn those GOtos into structured programming, which seems to be a common IR request these days, whether SPIR-V or WASM

23:10:27 aeth s/targeting of SIMD/targeting of SPIR-V/
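
One way a counted-only loop macro could look on the portable CL side, as a sketch. The name FOR-RANGE and its syntax are hypothetical and not taken from aeth's code. Because the macro only ever expresses "VAR from START below END", a tree-walker targeting a structured IR could recognise it directly rather than recovering a loop from GO tags.

```lisp
(defmacro for-range ((var start end) &body body)
  "Run BODY with VAR bound to START, START+1, ..., END-1."
  (let ((limit (gensym "LIMIT")))
    ;; END is evaluated once, so the trip count is fixed on entry.
    `(let ((,limit ,end))
       (do ((,var ,start (1+ ,var)))
           ((>= ,var ,limit))
         ,@body))))
```

For example, (for-range (i 0 4) (push i acc)) pushes 0, 1, 2, 3 onto ACC in that order.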

23:12:27 paulapatience Ah, of course, your vectors will be small, since you're writing this for your game engine.

23:12:52 paulapatience I was thinking about arbitrarily sized vectors.

23:13:18 aeth Right, it's graphics vectors, which become SPIR-V or SIMD vectors (or not; portable fallback is key). Any higher level vectors have to be constructed from lower level vectors, or from "arrays"

23:13:35 aeth Technically speaking, they're vecs (common to avoid the name collision), while vectors (1D arrays) are a separate data type

23:14:03 aeth Or I suppose in this sense they're %VECs

23:14:47 aeth It's possible that the higher level interface will permit things like complex vectors (a real and a complex %VEC paired together?) or longer vectors

23:15:13 aeth which would, again, for clarity be called VECs (but not %VECs)

23:16:11 aeth Similarly, matrices are strictly 2x2 through 4x4 in size and anything larger has to be made up of block matrices with blocks only up to 4x4. At least for %MATRIX, not necessarily for MATRIX

23:16:52 aeth Permitting smaller (e.g. 1x4 and 4x1) is a separate issue, especially since they'd overlap with the (column) vectors

23:18:31 aeth paulapatience: Another reason for tree-walking is creating types that don't exist at the CL level, but can be used for correctness. Since with DEFTYPE, a type defined for (simple-array single-float (3)) and another type defined for (simple-array single-float (3)) are the same, and if you wanted metadata, you'd have to put them in separate structs and have that extra cost at runtime. Similarly, the matrix

23:18:37 aeth representation could vary between 1D row-major, 1D column-major, 2D row-major, 2D column-major, etc., without having different user-exposed code and just do that at the tree-walking transformation level
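
A small demonstration of the DEFTYPE point, assuming two hypothetical type names, position3 and color3, that expand to the same array type. Since DEFTYPE only introduces an abbreviation, CL cannot tell the two apart, which is the gap a tree-walker with its own notion of types could fill.

```lisp
;; Two hypothetical "distinct" types with the same expansion.
(deftype position3 () '(simple-array single-float (3)))
(deftype color3 () '(simple-array single-float (3)))

(defun make-position3 (x y z)
  "Build a 3-element single-float array intended as a position."
  (make-array 3 :element-type 'single-float
                :initial-contents (list x y z)))

(defun position-is-also-color-p ()
  "True, because both type names denote exactly the same set of objects."
  (typep (make-position3 1.0 2.0 3.0) 'color3))
```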