freenode/#shirakumo - IRC Chatlog
20:42:56
mfiano
Currently the decoding process takes about 75% of the time. Only 25% for decompression now.
20:44:51
mfiano
I'd actually like to discuss it with someone smarter than me if you ever get time :)
20:49:09
Shinmera
I'm typically not an optimisation guru, so I'm not sure how much help I'd be in that.
20:50:18
mfiano
The way decoding works is that it looks at each row of pixels in order. For each pixel, depending on the filter method for that row, it looks at either the left pixel, the top pixel, or both, in order to change the current pixel (this is called unfiltering in the spec). So we have to iterate over each row in order, which is slow. I'd like to parallelize this process with a thread pool. But that involves batching
20:50:21
mfiano
scanlines in groups of two (so it has the previous row available for when it has to look at the top pixel).
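
For reference, the unfiltering step described above can be sketched as follows. This is a minimal Python illustration of the five filter types in the PNG specification (None, Sub, Up, Average, Paeth), not pngload's actual code; the names `paeth`, `unfilter_row`, and `unfilter_image` are made up for this sketch, and `bpp` is assumed to be the number of bytes per complete pixel.

```python
def paeth(a, b, c):
    """Paeth predictor: a = left, b = above, c = upper-left."""
    p = a + b - c
    pa, pb, pc = abs(p - a), abs(p - b), abs(p - c)
    if pa <= pb and pa <= pc:
        return a
    if pb <= pc:
        return b
    return c


def unfilter_row(ftype, row, prev, bpp):
    """Reverse the filter on one scanline.

    row  -- filtered bytes of the current scanline (without the filter byte)
    prev -- already unfiltered bytes of the scanline above (zeros for row 0)
    bpp  -- bytes per pixel (assumed >= 1)
    """
    out = bytearray(len(row))
    for i, x in enumerate(row):
        a = out[i - bpp] if i >= bpp else 0   # left neighbour
        b = prev[i]                           # neighbour above
        c = prev[i - bpp] if i >= bpp else 0  # upper-left neighbour
        if ftype == 0:      # None
            v = x
        elif ftype == 1:    # Sub
            v = x + a
        elif ftype == 2:    # Up
            v = x + b
        elif ftype == 3:    # Average
            v = x + (a + b) // 2
        elif ftype == 4:    # Paeth
            v = x + paeth(a, b, c)
        else:
            raise ValueError(f"unknown filter type {ftype}")
        out[i] = v & 0xFF   # arithmetic is modulo 256
    return out


def unfilter_image(rows, bpp):
    """rows is a list of (filter_type, filtered_bytes), top to bottom."""
    prev = bytearray(len(rows[0][1]))  # the row above row 0 counts as zeros
    result = []
    for ftype, data in rows:
        prev = unfilter_row(ftype, data, prev, bpp)
        result.append(bytes(prev))
    return result
```

The sequential loop in `unfilter_image` is the part under discussion: each row needs the finished row above it whenever its filter type is Up, Average, or Paeth.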
20:53:22
Shinmera
You could divide the image into square batches and kick off threads in a diagonal fashion as dependencies become fulfilled.
20:54:38
Shinmera
Well, as you describe it each pixel has a dependency on the one above and the one to the left, meaning as the dependencies unravel the "frontier" moves diagonally across the image.
20:55:10
Shinmera
Since using a task for each pixel is too expensive and fine grained, you'd instead divide the image into batches to compute, and have the same dependency scheme across the batches instead.
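
The diagonal batching scheme suggested here might look roughly like the sketch below. It assumes the image has already been cut into a grid of tiles and that a tile only depends on the tiles to its left, above, and upper-left; `wavefront_order`, `run_waves`, and `process_tile` are hypothetical names. Tiles on the same anti-diagonal share no dependencies, so each "wave" can be handed to a thread pool as a whole.

```python
from concurrent.futures import ThreadPoolExecutor


def wavefront_order(tile_rows, tile_cols):
    """Group tile coordinates (row, col) by anti-diagonal.

    Every tile in wave d has row + col == d, so its left, upper and
    upper-left neighbours all belong to earlier waves.
    """
    waves = []
    for d in range(tile_rows + tile_cols - 1):
        first_row = max(0, d - tile_cols + 1)
        last_row = min(tile_rows, d + 1)
        waves.append([(r, d - r) for r in range(first_row, last_row)])
    return waves


def run_waves(waves, process_tile, max_workers=4):
    """Run process_tile(row, col) for every tile, one wave at a time.

    A wave only starts after the previous wave has completely finished.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        for wave in waves:
            # list() forces completion of the wave and re-raises errors.
            list(pool.map(lambda rc: process_tile(*rc), wave))
```

Waiting for a full wave before starting the next one is the simplest form of the idea. A finer scheme would start a tile as soon as its own neighbours are done, at the cost of more bookkeeping.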
21:03:01
Shinmera
That's pretty good. I presume you already looked at sprof output and everything, yeah?
21:03:50
mfiano
It shows about 75% of the time spent doing this unfiltering, and that's with heavy compiler optimizations me and 3b added over the years
21:07:31
mfiano
Depends on how finely you subdivide. Parsing small 32x32 images in parallel gives significant gains, so I don't think this is too worrisome
22:29:03
mfiano
I should be able to look at the filter method of all rows first, and then process the ones with the None or Sub filters first in parallel (the ones where pixels are unmodified or depend on the left pixel). Then the sparse rows can be processed diagonally
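
A small sketch of that first pass, assuming the per-row filter types have already been read out of the decompressed data. The function names are invented for illustration. Rows using None or Sub never read the row above, so they can be unfiltered in any order; the remaining rows form consecutive chains that each hang off an independent row (or off the implicit all-zero row above row 0).

```python
NONE, SUB, UP, AVERAGE, PAETH = range(5)  # PNG filter type numbers


def split_rows_by_dependency(filter_types):
    """Return (independent, dependent) lists of row indices."""
    independent = [i for i, f in enumerate(filter_types) if f in (NONE, SUB)]
    dependent = [i for i, f in enumerate(filter_types) if f not in (NONE, SUB)]
    return independent, dependent


def dependent_chains(filter_types):
    """Group consecutive dependent rows into chains.

    Rows inside a chain must be processed top to bottom, but different
    chains do not depend on each other once the independent rows are done.
    """
    chains, current = [], []
    for i, f in enumerate(filter_types):
        if f in (NONE, SUB):
            if current:
                chains.append(current)
                current = []
        else:
            current.append(i)
    if current:
        chains.append(current)
    return chains
```

How much this helps depends on the encoder: if nearly every row uses Up, Average, or Paeth, there is a single long chain and little left to parallelize at row level.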
23:34:13
|3b|
Shinmera: yeah, map-glyphs was meant as a minimal "get something working" api, and real layout should probably be a separate lib. no idea what APIs would be useful for that beyond just the raw data without writing it though :)
23:34:55
|3b|
mfiano: threading pngload doesn't seem too useful to me, i'd probably just do SSE stuff and thread at whole-image level
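
Threading at whole-image level, as suggested here, needs no changes inside the decoder at all. A minimal sketch, where `decode_one` stands in for whatever single-image decode function is used (a hypothetical parameter, not a pngload API):

```python
from concurrent.futures import ThreadPoolExecutor


def decode_all(paths, decode_one, max_workers=4):
    """Decode many images concurrently, one task per image.

    Results come back in the same order as `paths`.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(decode_one, paths))
```

Each image is an independent unit of work, so there are no dependencies to track between tasks.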
6:09:43
Shinmera
|3b|: Pretty sure it'd just need fast access to glyph, kerning, and ligature lookup.
6:13:03
|3b|
yeah, i guess it sort of depends on whether there are going to be more font metadata libs or more text layout libs when trying to decide where things belong from perspective of avoiding code duplication :p
6:13:40
Shinmera
I'm assuming the former since a glyph renderer is likely going to need additional info as well.
6:15:21
|3b|
seems like at this point, mostly just "some simple bitmap font atlas" and "some variant of ttf"
6:16:37
Shinmera
My point is more that a glyph renderer is also gonna be parsing the metrics to do a demo, just like you did.
6:17:20
Shinmera
In our specific case we're even loading from a cached file we ultimately have no control over.
6:17:25
|3b|
unrelated, trying to figure out if things look right, when 'right' looks wrong, is hard :p
6:20:30
|3b|
so the lines are supposed to be rectangular in screen space, which doesn't match the world-space 3d transforms
6:20:40
Shinmera
Well, I suppose here's where one of those problems comes in, namely that now your lines are "3D" rather than always facing the camera and thus completely flat.