freenode/#shirakumo - IRC Chatlog

20:40:38 Shinmera Nice

20:40:44 Shinmera So his lib is in QL now?

20:41:25 mfiano Not yet, but will be by the time pngload changes are

20:41:35 mfiano (request for addition submitted)

20:41:59 Shinmera Okey, cool

20:42:23 mfiano I have some ideas for making it much faster that I'm experimenting with

20:42:56 mfiano Currently the decoding process is about 75% of the time. Only 25% for decompression now.

20:44:51 mfiano I'd actually like to discuss it with someone smarter than me if you ever get time :)

20:49:09 Shinmera I'm typically not an optimisation guru, so I'm not sure how much help I'd be in that.

20:49:23 mfiano Well it's not about optimization really

20:50:18 mfiano The way decoding works is that it looks at each row of pixels in order. For each pixel, depending on the filter method for that row, it looks at either the left pixel, the top pixel, or both, in order to change the current pixel (this is called unfiltering in the spec). So we have to iterate over each row in order, which is slow. I'd like to parallelize this process with a thread pool. But that involves batching

20:50:21 mfiano scanlines in groups of two (so it has the previous row available for when it has to look at the top pixel).
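
As a rough illustration of the unfiltering step described above, here is a minimal Python sketch of the five PNG filter types (None, Sub, Up, Average, Paeth). It is not pngload's code: the function names and the `(filter_type, bytes)` row representation are assumptions made for this example, and `bpp` means bytes per pixel. It also shows why rows are processed in order: every row except those filtered with None or Sub reads the already unfiltered previous row.

```python
def paeth_predictor(a, b, c):
    """Return whichever of left (a), above (b), upper-left (c) is closest to a + b - c."""
    p = a + b - c
    pa, pb, pc = abs(p - a), abs(p - b), abs(p - c)
    if pa <= pb and pa <= pc:
        return a
    return b if pb <= pc else c


def unfilter_row(filter_type, row, prev, bpp):
    """Undo one scanline's filter. `prev` is the unfiltered row above (zeros for row 0)."""
    out = bytearray(len(row))
    for i, x in enumerate(row):
        a = out[i - bpp] if i >= bpp else 0   # byte of the pixel to the left
        b = prev[i]                           # byte of the pixel above
        c = prev[i - bpp] if i >= bpp else 0  # byte of the pixel above-left
        if filter_type == 0:      # None
            pred = 0
        elif filter_type == 1:    # Sub
            pred = a
        elif filter_type == 2:    # Up
            pred = b
        elif filter_type == 3:    # Average
            pred = (a + b) // 2
        elif filter_type == 4:    # Paeth
            pred = paeth_predictor(a, b, c)
        else:
            raise ValueError(f"unknown filter type {filter_type}")
        out[i] = (x + pred) & 0xFF  # arithmetic is modulo 256
    return bytes(out)


def unfilter_image(rows, bpp):
    """Sequentially unfilter a list of (filter_type, filtered_bytes) scanlines."""
    prev = bytes(len(rows[0][1])) if rows else b""
    result = []
    for filter_type, row in rows:
        prev = unfilter_row(filter_type, row, prev, bpp)
        result.append(prev)
    return result
```

The inner loop carries a dependency on `out[i - bpp]` for Sub, Average and Paeth, and on `prev` for Up, Average and Paeth, which is the ordering constraint the rest of the discussion tries to work around.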

20:50:34 mfiano That's the hard part

20:53:22 Shinmera You could divide the image into quadratic batches and kick off threads in a diagonal fashion as dependencies become fulfilled.

20:53:51 mfiano That sounds...above my pay grade

20:54:00 mfiano I would have to understand what you mean :)

20:54:38 Shinmera Well, as you describe it each pixel has a dependency on the one above and the one to the left, meaning as the dependencies unravel the "frontier" moves diagonally across the image.

20:55:10 Shinmera Since using a task for each pixel is too expensive and fine grained, you'd instead divide the image into batches to compute, and have the same dependency scheme across the batches instead.
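
The batching scheme suggested here can be sketched as a wavefront over a grid of tiles: every tile on the same anti-diagonal (`tx + ty == d`) has its left, upper, and upper-left neighbours on earlier diagonals, so all tiles of one diagonal can run concurrently. This is only a sketch under assumptions: `wavefront_order`, `run_wavefront`, and the `work(tx, ty)` callback are hypothetical names, and the tile size and per-tile unfiltering are left out.

```python
from concurrent.futures import ThreadPoolExecutor


def wavefront_order(tiles_x, tiles_y):
    """Group tile coordinates into waves; tiles within a wave are mutually independent."""
    waves = []
    for d in range(tiles_x + tiles_y - 1):
        wave = [(tx, d - tx) for tx in range(tiles_x) if 0 <= d - tx < tiles_y]
        waves.append(wave)
    return waves


def run_wavefront(tiles_x, tiles_y, work, max_workers=4):
    """Run work(tx, ty) for every tile, one diagonal at a time."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        for wave in wavefront_order(tiles_x, tiles_y):
            # list() forces the whole wave to finish before the next one starts.
            list(pool.map(lambda tile: work(*tile), wave))
```

Waiting for a whole diagonal before starting the next is simpler than tracking per-tile dependencies, but it leaves workers idle on the short diagonals at the start and end. In CPython the global interpreter lock would also limit any real speedup for pure-Python work; the sketch only shows the scheduling order.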

20:55:31 mfiano Hmm

20:59:51 Shinmera How competitive is the single-thread performance compared to C stuff?

21:00:04 Shinmera Because I don't think libpng etc. are threaded.

21:00:43 mfiano ~0.08s vs ~0.25s for a 3840x2160 32-bit image

21:01:40 mfiano actually, here: https://gist.github.com/mfiano/8edc6ef24037a973a25ffb2f5597ef4d

21:01:41 Colleen gist.github.com/mfiano/8edc... Website (HTML), Title: pngload.lisp · GitHub

21:02:27 Shinmera So a factor 3.

21:03:01 Shinmera That's pretty good. I presume you already looked at sprof output and everything, yeah?

21:03:17 mfiano Yes.

21:03:50 mfiano It shows about 75% of the time spent doing this unfiltering, and that's with heavy compiler optimizations me and 3b added over the years

21:04:18 Shinmera Okey.

21:05:11 Shinmera Instead of threads, might also be possible to do SIMD stuff to get that speed.

21:05:27 Shinmera As in, SSE

21:05:44 mfiano Yeah, I don't know much about vectorization ops at all.

21:05:52 mfiano That'd have to be done by someone more qualified

21:06:11 Shinmera I'm a bit wary about threads because they're so heavy is all.

21:06:24 Shinmera You might very well just waste all the gains (or worse) to set things up.

21:07:31 mfiano Depends on how finely you subdivide. Parsing small 32x32 images in parallel gives significant gains, so I don't think this is too worrisome

22:29:03 mfiano I should be able to look at the filter method of all rows first, and then process the ones with the None or Sub filters first in parallel (the ones where pixels are unmodified or depend on the left pixel). Then the sparse rows can be processed diagonally
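
One possible reading of this idea, sketched in Python: scan the per-row filter types first, and treat every row filtered with None (0) or Sub (1) as the start of a new segment, since such a row never reads the row above it. Each segment could then be unfiltered top to bottom on its own worker. This is a variation on the plan above rather than the plan itself, and `independent_segments` is a hypothetical helper. In a non-interlaced image the filter type is the first byte of each scanline in the decompressed data, so the scan can only happen after decompression.

```python
# Filter types whose rows do not depend on the previous scanline.
ROW_INDEPENDENT_FILTERS = {0, 1}  # None, Sub


def independent_segments(filter_types):
    """Split row indices into half-open (start, end) ranges that can be decoded separately.

    A new segment starts at every row whose filter ignores the row above;
    rows inside a segment must still be decoded in order.
    """
    segments = []
    start = 0
    for y in range(1, len(filter_types)):
        if filter_types[y] in ROW_INDEPENDENT_FILTERS:
            segments.append((start, y))
            start = y
    if filter_types:
        segments.append((start, len(filter_types)))
    return segments
```

For example, filter types `[0, 2, 4, 1, 3, 2, 1]` split into `[(0, 3), (3, 6), (6, 7)]`. The gain depends entirely on the encoder: an image whose rows all use Up, Average or Paeth yields a single segment and no parallelism from this approach.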

23:34:13 |3b| Shinmera: yeah, map-glyphs was as a minimal "get something working" api, and real layout should probably be a separate lib. no idea what APIs would be useful for that beyond just the raw data without writing it though :)

23:34:55 |3b| mfiano: threading pngload doesn't seem too useful to me, i'd probably just do SSE stuff and thread at whole-image level

6:09:43 Shinmera |3b|: Pretty sure it'd just need fast access to glyph, kerning, and ligature lookup.

6:13:03 |3b| yeah, i guess it sort of depends on whether there are going to be more font metadata libs or more text layout libs when trying to decide where things belong from perspective of avoiding code duplication :p

6:13:40 Shinmera I'm assuming the former since a glyph renderer is likely going to need additional info as well.

6:13:54 Shinmera As in, a glyph renderer can't get around parsing font files.

6:14:21 |3b| hmm, could be

6:14:54 |3b| though how many font formats will people care about?

6:15:19 Shinmera What point are you trying to make?

6:15:21 |3b| seems like at this point, mostly just "some simple bitmap font atlas" and "some variant of ttf"

6:15:42 |3b| there might be lots of glyph renderers sharing 2 or 3 font parsing libs

6:15:57 |3b| ACTION doesn't really have that much of a point though :)

6:16:10 Shinmera Well, everyone's just gonna be using zpb-ttf anyway

6:16:37 Shinmera My point is more that a glyph renderer is also gonna be parsing the metrics to do a demo, just like you did.

6:17:20 Shinmera In our specific case we're even loading from a cached file we ultimately have no control over.

6:17:25 |3b| unrelated, trying to figure out if things look right, when 'right' looks wrong, is hard :p

6:18:54 |3b| http://3bb.cc/tmp/3b-glim2.png

6:18:55 Colleen 3bb.cc/tmp/3b-glim2.png Image (PNG)

6:19:34 Shinmera Hmmm, indeed

6:19:54 |3b| same cube geometry as before but with lines primitive, all moving around and spinning

6:20:30 |3b| so the lines are supposed to be rectangular in screen space, which doesn't match the world-space 3d transforms

6:20:40 Shinmera Well, I suppose here's where one of those problems comes in, namely that now your lines are "3D" rather than always facing the camera and thus completely flat.

6:21:02 Shinmera Right, you also get them clipping

6:21:28 |3b| they stay facing the camera

6:22:23 Shinmera Ah- I meant in the sense that they would be projected flat towards the camera

6:23:01 Shinmera I don't feel very eloquent at the moment :)

6:23:19 |3b| ACTION wonders if i have any working video recorder things set up

6:23:42 Shinmera OBS is pretty easy to get set up

6:24:28 |3b| yeah, that's one of the options, seems to be working

6:32:37 |3b| https://youtu.be/UAbLbnLSvNI

6:32:37 Colleen www.youtube.com/watch?v=UAb... Website (HTML), Title: 3b-glim testing - YouTube

6:35:21 |3b| stops animating at end because i tried to switch to tri strips and it crashed :p

6:35:29 |3b| needs more work i guess :)

6:37:08 |3b| triangle fan seems to work but is incredibly slow for some reason

6:37:23 Shinmera Huh.