libera/#sbcl - IRC Chatlog
20:25:25
Guest2161
I'm at (speed 3) (safety 0) using CFFI, and I'm getting notes that say: doing SAP to pointer coercion (cost 20)
20:52:19
Shinmera
afaiu it needs to allocate it on the heap because pointers don't have a tagged variant
20:53:40
|3b|
64bit values can't in general be passed directly as normal values, since they don't leave any bits for tags. same applies to 64bit ints and double-floats (or replace with 32 for 32-bit lisps)
20:54:04
Shinmera
Guest2161: ? what does copying a struct have to do with the pointer boxing or being slow
20:54:07
|3b|
to use them as values, it has to store them on the heap with some metadata, then pass around a reference to that
20:55:13
|3b|
but within a function, it can try to store them directly in registers, and avoid allocation
20:55:56
Guest2161
or am i always in trouble because just constructing the 64b pointer requires an allocation
20:56:32
|3b|
if you can avoid any (full) function calls passing or returning the value, you might be able to, for example with inlining
20:56:44
Shinmera
making a pointer itself does not require allocation, but storing it or passing it across function boundaries does.
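A minimal SBCL-only sketch of the point above: the SAP is created and consumed inside one function, so it never has to cross a full call boundary and the compiler is free to keep it unboxed in a register. SUM-FLOATS is a hypothetical name used for illustration, and the 4-byte stride assumes single-floats; none of this is from the code under discussion.

```lisp
;; Hypothetical helper: sums a specialized float vector through a raw pointer.
(defun sum-floats (vector)
  (declare (type (simple-array single-float (*)) vector)
           (optimize speed))
  ;; Pin the vector so the GC cannot move it while we hold a raw address.
  (sb-sys:with-pinned-objects (vector)
    (let ((sap (sb-sys:vector-sap vector)) ; SAP stays local to this function
          (sum 0f0))
      (declare (type single-float sum))
      (dotimes (i (length vector) sum)
        ;; Byte offset: each single-float occupies 4 bytes.
        (incf sum (sb-sys:sap-ref-single sap (* i 4)))))))
```

If the SAP were instead returned from, or passed to, a function that is not inlined, it would have to be boxed on the heap first, which is what the "SAP to pointer coercion" note is reporting.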
20:57:22
Guest2161
Gotcha, cffi was a mistake i think, i'll see if I can get around it with sb-alien - thanks
20:57:24
|3b|
do you have some code we can look at? (don't paste code here, use some paste site or link to a repo or something)
20:57:50
|3b|
ACTION doesn't remember the details of that stuff in cffi, but might notice something if i look at code
20:58:18
Shinmera
I use CFFI in some very performance critical stuff and it has not been an issue yet
20:58:32
Shinmera
do you *actually know* that this is a problem, and one that is caused by cffi and not something you're doing?
21:00:13
Guest2161
i don't know if it's caused by cffi, but figuring out what cffi is doing is certainly a pain
21:00:34
Guest2161
So i think this might be a problem: i'm storing a common lisp array of foreign structs
21:01:41
Guest2161
do these pointers all need extra 'tags' or whatever, so they end up double indirected & that's where the problem is?
21:02:08
Shinmera
there is no array specialised on SAPs, so yes, the array is upgraded to T, meaning full boxed pointers get stuffed into the array.
21:02:50
Guest2161
& then that's where the note comes from? because it's got to convert from these boxed SAPs to real pointers?
21:03:41
|3b|
might be able to cast them to/from ub64 and store those in a typed array faster, if you really need a bunch of independent pointers
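The ub64 suggestion can be sketched roughly as follows, assuming a 64-bit SBCL where CFFI pointers are SAPs. MAKE-ADDRESS-VECTOR, STORE-POINTER and LOAD-POINTER are hypothetical names; in portable CFFI code the conversions would be cffi:pointer-address and cffi:make-pointer rather than the sb-sys functions used here.

```lisp
(deftype address-vector () '(simple-array (unsigned-byte 64) (*)))

;; Hypothetical constructor: a specialized vector holding raw addresses.
(defun make-address-vector (n)
  (make-array n :element-type '(unsigned-byte 64) :initial-element 0))

;; Inlined so the SAP does not cross a full call boundary at the call site.
(declaim (inline store-pointer load-pointer))

(defun store-pointer (vector i sap)
  (declare (type address-vector vector) (type fixnum i)
           (type sb-sys:system-area-pointer sap))
  ;; Store the address as an integer; no boxed SAP object is kept.
  (setf (aref vector i) (sb-sys:sap-int sap)))

(defun load-pointer (vector i)
  (declare (type address-vector vector) (type fixnum i))
  ;; Rebuild a SAP from the stored address when it is needed.
  (sb-sys:int-sap (aref vector i)))
```

The vector then stores addresses inline instead of references to heap-allocated SAP objects, at the cost of losing the pointer type in the array.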
21:04:22
Shinmera
or at that point could also just allocate a foreign array to stuff the pointers into.
21:06:09
Guest2161
It's a higher-level opengl interface, builds up a vertex buffer in lisp, this is the function which stuffs them into foreign memory for opengl to consume
21:06:43
Guest2161
I used a lisp array because :adjustable t is nice, but i guess i'm better off scrapping the lisp array & just allocating it all into foreign memory to start with?
21:07:50
|3b|
either store a big buffer of single-floats if that's all you have, or use nibbles to store floats and whatever else into a ub8 buffer
21:08:25
Shinmera
or usually a single-float buffer honestly since I don't have any non-float types in my GL data.
21:08:36
|3b|
no struct, just some abstraction that pretends a simple-array single-float (*) is a vector of verts
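One way such an abstraction might look, as a sketch only: a flat (simple-array single-float (*)) with inlined accessors that index it as if it were a vector of vertices. The two-floats-per-vertex layout (x, y) and every name below are assumptions made for illustration, not taken from anyone's code in this log.

```lisp
(defconstant +floats-per-vertex+ 2)     ; assumed layout: x, y

(deftype vertex-buffer () '(simple-array single-float (*)))

(defun make-vertex-buffer (vertex-count)
  (make-array (* vertex-count +floats-per-vertex+)
              :element-type 'single-float
              :initial-element 0f0))

;; Inlined so accesses compile down to plain specialized AREFs.
(declaim (inline vertex-x vertex-y set-vertex))

(defun vertex-x (buffer i)
  (declare (type vertex-buffer buffer) (type fixnum i))
  (aref buffer (* i +floats-per-vertex+)))

(defun vertex-y (buffer i)
  (declare (type vertex-buffer buffer) (type fixnum i))
  (aref buffer (+ 1 (* i +floats-per-vertex+))))

(defun set-vertex (buffer i x y)
  (declare (type vertex-buffer buffer) (type fixnum i)
           (type single-float x y))
  (setf (aref buffer (* i +floats-per-vertex+)) x
        (aref buffer (+ 1 (* i +floats-per-vertex+))) y)
  buffer)
```

Because the storage is one specialized float vector, it can be handed to foreign code as a single contiguous block rather than converted element by element.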
21:09:12
Shinmera
Also, nothing against you, but man the amount of people writing their own low-level GL crap is making me sweat.
21:10:55
Shinmera
https://github.com/shirakumo/trial has two approaches, one being where the high-level data is all in Lisp stuff and then compiled to a single array when needed (like for vertex data), and another that presents a lisp view onto a ub8 array (like for UBOs)
21:12:59
Guest2161
|3b|: How does that perform? i'm seeing a lotta funcalls which I'd assume would be slow but idk with sbcl
21:14:54
|3b|
for AAA games, 99% of verts should be coming from blobs that go straight from disk to gpu anyway :)
21:15:34
Shinmera
I'm gonna be working on an animation engine and gltf importer for trial Very Soon.
21:15:38
Guest2161
nah just 2d shit, but i have a crappy laptop & i hate having hot knees when i play badly optimised games so it's a matter of principle now
21:16:19
Shinmera
That still doesn't mean you shouldn't make shit work first, then measure, and then optimise based on what's worth optimising.
21:16:41
Shinmera
I also develop a lot on a shitty laptop that boils my legs, and so far I've been fine
21:17:03
|3b|
also, these days i'd expect more hot knees from a well optimized game, since those are the ones that will be running all cores
21:17:06
Guest2161
yeah i've got some rectangles bouncing around on screen, it felt surprisingly slow compared to c
21:20:15
|3b|
part of my code that would heat knees is that i regenerate entire geometry for every frame (since it is an "immediate mode" api), and adding some simple caching would probably get rid of that
21:23:00
Shinmera
Alloy is retained but it's still just so much easier to not cache anything in the end
21:23:42
Shinmera
I should *really* cache the data for text draws, but I have yet to figure out an interface that can deal with foreign data without making everything a pain in the ass
21:26:20
Shinmera
Had a nice nvidia driver bug where it would crash everything if you repeatedly resize a large enough buffer
21:26:40
|3b|
yeah, initial version of caching will probably be manual. more like display lists, where i just move a chunk of code into a "build buffer" function, then replace it with a call to draw that buffer
21:27:33
|3b|
beyond that would probably just rely on gpu->gpu copies being really fast to make a simple GC
21:28:32
|3b|
your UI is probably 90% static (or simple enough to not matter), tile grids are mostly static, etc
21:29:38
Guest2161
i remember trying that once but it wound up slower, i think there was some implicit synchronisation stuff going on, might've just done it badly though
21:29:52
|3b|
well, considering both of us have been too lazy to actually implement it, i don't think we are talking about anything that specific :)
21:36:49
Guest2161
ok i benched it as recommended - turns out it is pretty slow, i can only move about 5k rectangles per frame in lisp vs 200k in c using 'raylib', which provides a similar 2d abstraction
21:37:14
Guest2161
must be this double indirection i fumbled my way into, let me try dumping it all into foreign mem
21:39:47
Guest2161
idk why, i think transform feedback is no good on this hunk o' junk, maybe another method works better though
22:18:25
Guest2161
Ok jamming everything in as floats & ignoring the structs is the right call, thanks - much smoother now :)