libera/#sbcl - IRC Chatlog
6:51:54
flip214
sb-vm::show-fragmentation isn't called anywhere, and seems to get dropped by GC in stage 2 or so... with a 2.3.1++ it doesn't exist in the live image
7:07:39
flip214
is there something like a high-water mark that I can set to 0 initially and later on query to find out the maximum amount of memory that was used in between?
7:39:46
lukego_
Relatedly perhaps, yesterday I really wanted TIME to be able to report detailed memory allocation like ROOM does. That is, it's telling me how many bytes were consed, but I wanted a more detailed breakdown e.g. how many _objects_ were consed (because I know that I'm allocating some large specialized vectors, and I don't care about those bytes so much, but I do really care about small objects like boxes for double-floats.)
7:58:31
lukego_
Thanks. Yeah, I was hoping to get a "delta" with the (room t) level of detail e.g. to find out how many double-float objects were allocated. Counting bytes doesn't help much because I'm allocating some huge arrays that I don't consider problematic.
7:59:39
lukego_
I mean a delta in the sense of (time ...) telling me about allocations during execution of that bit of code.
8:00:34
lukego_
I could probably recycle those big arrays on freelists to get them out of the way but then I'm joining the Church of No Consing which isn't my traditional denomination.
8:35:12
lukego_
the output of ROOM? I don't think that works because it will miss objects that were GC'd during execution. I guess I could inhibit GC though.
8:35:50
lukego_
or maybe this could be implemented with a hook that counts the deltas between each GC and sums them.
8:37:49
lukego_
I'm not spending much time in GC. However my experience from other compilers is that when it comes to boxed arithmetic most of the computational overhead comes outside of GC e.g. allocating and accessing the boxes.
8:40:46
lukego_
*versus. I only learned this month that I have been mixing up verses/versus my whole life. better late than never..
8:40:59
|3b|
maybe try allocation profiler, seems like just knowing you are allocating a lot of boxes doesn't help much if you still don't know where
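[Editor's sketch] The allocation profiler mentioned here is presumably sb-sprof's `:alloc` mode, which attributes allocation samples to call sites rather than reporting one byte total. A minimal sketch follows; `box-heavy` and `profile-allocations` are hypothetical names invented for illustration, not code from the discussion, and `:alloc` mode may not be supported on every platform or SBCL version.

```lisp
(eval-when (:compile-toplevel :load-toplevel :execute)
  (require :sb-sprof))

;; Hypothetical workload: storing double-floats into a general
;; (non-specialized) vector forces each value to be boxed.
(defun box-heavy (n)
  (let ((v (make-array n)))
    (dotimes (i n v)
      (setf (aref v i) (* 1.5d0 i)))))

;; Hypothetical driver: samples allocation sites while the workload
;; runs and prints a flat report when it finishes.
(defun profile-allocations ()
  (sb-sprof:with-profiling (:mode :alloc :report :flat)
    (dotimes (i 100)
      (box-heavy 100000))))
```

Calling `(profile-allocations)` at the REPL should print a report listing which functions the allocation samples landed in, which is the "where" that a byte total cannot give.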
8:45:41
lukego_
The thing is that I can't see from the totals whether I am allocating a lot of boxes or not. If so I should do a bunch of (speed 3) detailed work. If not I shouldn't bother.
8:45:53
|3b|
ACTION would also be curious to see sprof output, i'm guessing you are spending lots of time in generic math and maybe array bounds checks
8:46:25
lukego_
I'm not actually actively optimizing the code today, just thoughts rattling around from yesterday, but will come back to it soonish
8:48:32
|3b|
then there are things sbcl could theoretically optimize out but i think it doesn't, like not recalculating (/ 2.2222222222222223d0 (expt n-particles 1/3)) and similar more than once per call to line, or that * jitter
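[Editor's sketch] The code being discussed is not shown in the log, so the following is only a hypothetical stand-in for the idea of hoisting a loop-invariant expression by hand. The function `scaled-sum` and its arguments are invented; only the `(/ 2.2222222222222223d0 (expt n-particles 1/3))` expression comes from the message above.

```lisp
;; Hypothetical example: the scale factor depends only on N-PARTICLES,
;; so it is computed once in a LET instead of on every iteration.
(defun scaled-sum (xs n-particles)
  (let ((scale (/ 2.2222222222222223d0 (expt n-particles 1/3)))
        (sum 0d0))
    (declare (type double-float sum))
    (dotimes (i (length xs) sum)
      ;; Only a multiply and an add remain inside the loop.
      (incf sum (* scale (aref xs i))))))
```

Whether this matters in practice depends on how often the loop runs, as noted in the next message.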
8:49:05
|3b|
no idea if that loop is called enough times to matter or not, but it at least looks slow :)
8:50:19
|3b|
declaring types could also help, especially if you can set an upper bound on array size (or recompile things for specific sizes) to allow skipping bounds checks
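[Editor's sketch] As a rough illustration of the type-declaration suggestion, the hypothetical function below declares its argument as a specialized double-float vector so that the compiler can use unboxed float arithmetic in the loop. `sum-doubles` is an invented name, and whether bounds checks are actually elided depends on the SBCL version and on what the compiler can prove; `disassemble` is the way to check.

```lisp
;; Hypothetical example: V is declared as a specialized vector of
;; double-floats, so AREF and + can be compiled without generic dispatch.
(defun sum-doubles (v)
  (declare (type (simple-array double-float (*)) v)
           (optimize (speed 3)))
  (let ((sum 0d0))
    (declare (type double-float sum))
    (dotimes (i (length v) sum)
      (incf sum (aref v i)))))
```

Replacing `(*)` with a fixed dimension such as `(1024)` is one way to give the compiler the upper bound on array size mentioned above, at the cost of accepting only arrays of exactly that length.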
8:54:20
flip214
getting a high-level overview got easy with perf and a flamegraph (exporting symbols via the sb-perf contrib)
9:07:10
|3b|
would a flame graph tell you all your functions are spending a bunch of time on for example array bounds checks?
9:29:50
lukego_
My mental model is that a boxed allocation for arithmetic will cost ~100x more than an array bounds check. I don't think I'd bother looking into ABC until after I was sure the arithmetic in inner loops was unboxed. Could be off, drawing on LuaJIT experience, but it does seem like modern wide superscalar CPUs aren't too fussed about ABCs that can be speculatively executed through
9:30:19
lukego_
I will be prepared to eat those words on my next optimization session in a week or two though :)
9:37:11
lukego_
aside: I wrote profiler tooling for LuaJIT that is indeed able to tell you if array bounds checks are a hotspot, i.e. is able to associate profiler samples with specific IR instructions, of which ABC is one. it's nice :)
9:37:45
lukego_
I guess you can get a similar effect in SBCL with #'disassemble annotating profiler samples on machine instructions
9:42:46
|3b|
possibly we are just thinking of the same things in different ways and would end up with the same thing... i'd remove generic / untyped math and get minimal boxing, while you remove the boxing and get specialized math :)
9:43:36
|3b|
and yeah, bounds checks are probably small relative to generic math. harder to say compared to specialized math with boxes
9:45:00
|3b|
ACTION was probably also thinking more of generic array access as being a likely problem, now that i think about it more. bounds checks would be a final details thing