freenode/#sicl - IRC Chatlog

5:10:11 |3b| card that does 10TFLOPS single does 350GFLOPS double, 180G half

5:11:02 beach That should be plenty to produce a symphony orchestra using additive synthesis.

5:11:38 |3b| yeah, if latency isn't an issue, you can probably do a lot of sound processing on a GPU :)

5:12:13 beach Latency is an issue for real-time sound production. Much more so than for video.

5:12:40 beach Our ears are extremely sensitive to delay and also to small errors in computation.

5:12:41 |3b| right, which is why i mention it :)

5:12:53 beach Yes, thanks. Good to know.

5:13:22 beach I am not planning to use this information any time soon, but it changes how I think about some of my very low-priority projects.

5:14:34 |3b| GPU are optimized for working on large chunks of data at once, so you tend to want to work on longer segments, and you also have latency for transferring to/from GPU memory, and for going through the API

5:14:42 beach Another thing that has changed the game is multi-core processors. Producing sound is a highly parallel procedure.

5:15:11 |3b| yeah, CPU are pretty decent at that sort of thing too, especially if you can use their SIMD features

5:15:27 beach Exactly.

5:18:22 |3b| (and just to be clear, when i mentioned latency i mean from starting processing to getting results, so delay when processing a live stream or playing live... once you start though, you should be able to produce a continuous stream without gaps assuming it runs in realtime to start with)

5:18:50 beach I think I understand.

5:20:36 |3b| so good for offline generation/processing of sound, or realtime generation/processing if an initial delay is OK

5:21:00 beach Got it.

5:21:16 |3b| for other cases, might be OK, but i'd suggest to do some tests before investing a bunch of effort into it :)

5:22:38 |3b| (graphics, particularly VR, is doing latencies in the 10s of ms range or less, but doesn't have to copy back to CPU to send to another API for output, or at worst goes through some optimiized path in drivers)

5:23:16 |3b| "10s" = "tens" not "ten seconds"

5:24:16 beach I see. Tens of milliseconds would be unacceptable for many sound applications.

5:24:44 |3b| yeah, probably could get into the few ms range, but i'm not sure exactly without actually trying it

5:25:33 beach Like I said, I am not going to do anything about this soon, but I'll keep these things in mind when contemplating future work.

5:25:37 |3b| and also depends on the amount of work, have to be doing a lot of work per sample for a few ms of sound to be a good fiit for GPU

5:26:40 beach With additive synthesis, that ought to be the case.

7:13:07 makomo morning

7:14:07 beach Hello makomo.

9:10:53 beach So from looking around a bit, I see that a context switch for a traditional kernel-based operating system takes around a μs (order of magnitude).

9:12:55 beach I am not sure why it would be problematic to do real-time sound synthesis then.

9:13:23 beach But recent Linux definitely can't do it without special kernel modules and other settings.

9:15:02 no-defun-allowed I don't think Linux will immediately switch from the sound emitter to the mixer to the driver though.

9:16:08 no-defun-allowed If a kernel/switching-based kernel could trace message passes like that and try to schedule the involved programs sequentially, it might be more lucky though.

9:16:33 beach Ah, so it's a problem with the scheduler rather than just context switch delay.

9:17:08 no-defun-allowed I think so, because the context switch might not immidiately run the message recipient.

9:17:38 beach Right.

9:18:20 no-defun-allowed So, prioritising messages sent to mixers (which PulseAudio does, it has a nice level of -11) and/or jumping immediately into running the recipient process could lower latency.

9:18:57 no-defun-allowed I imagine the second option might decrease throughput since the sender may have more to say, though.

9:19:56 beach ACTION is digesting information.

9:20:59 beach I am not sure that in a system like CLOSOS sender and receiver would have to be in different processor nor even threads.

9:21:33 beach Well, I guess for the purpose of mixing several sources, it would have to.

9:21:48 no-defun-allowed No, I don't think that would be needed either -- except for mixing sources though, yes.

9:22:11 beach OK. let's attack the issue differently...

9:22:39 no-defun-allowed But the mixer->driver path could be eliminated, and the source->mixer path could be handled specially by the system hypothetically.

9:23:16 beach Yeah.

9:23:25 beach ... the scheduler in a traditional OS like Linux, would need to give each process hundreds of μs in order to minimize overhead due to context switches.

9:23:34 beach Right?

9:24:53 no-defun-allowed Yes.

9:25:57 beach So that kind of time slice already has a negative impact on sound synthesis. If there are several ready processes, then we are talking milliseconds, and that is dangerously near the tolerance level for sound.

9:26:21 beach But in a system like CLOSOS, a context switch is way faster.

9:26:33 no-defun-allowed That also should be correct.

9:27:18 beach So each "process" (or rather "thread" in CLOSOS) could be given shorter time slices, thereby lowering response time.

9:27:56 no-defun-allowed Yes, you could easily get more switches in if only normal registers have to be replaced.

9:28:05 beach Exactly.

9:28:23 beach Therefore, even with a dumb scheduler, if there are few enough ready threads, there would be no problem with delays for sound synthesis.

9:29:23 |3b| i think there are also issues with drivers blocking things in kernel for longer than you might want

9:29:43 no-defun-allowed I remember there being a comparison of the speeds of various parts of context switches on osdev.org, but I can't find it now.

9:29:45 beach You mean with current OSes?

9:29:50 |3b| linux specifically

9:29:54 beach I see.

9:30:05 |3b| (though possibly others too)

9:30:24 beach And do we have reasons to believe that this problem would be less of one in a system like CLOSOS?

9:30:42 beach I am not quite sure what the reason would be for those drivers to behave like that.

9:31:10 no-defun-allowed Yeah, it shouldn't be too slow to my knowledge.

9:31:11 beach no-defun-allowed: Thanks for the link. Checking...

9:31:24 no-defun-allowed It's a very large wiki page.

9:31:38 beach Wow, there have been so many signs lately that CLOSOS is the right direction to go.

9:31:54 |3b| i think a some of it is being fixed already, so wouldn't have to apply to closos, and some more is due to supporting lots of old/flaky hardware

9:32:43 beach I see, yes.

9:34:53 beach In half an hour, my lunch guests will arrive and I have to cook for them. So unfortunately, I need to suspend this discussion for several hours. :( Interesting stuff though.

10:12:47 heisig Good morning!

10:14:02 no-defun-allowed Good morning, heisig.

10:58:41 luis beach: right, Cleavir might be too heavy handed at this point. I'm not looking to have all editing commands operate on the CST/AST or anything like that. But definitely something to consider in the future. Shinmera's Staple seems promising.

10:59:28 Shinmera Feel free to rip out the staple-parser component and adapt it how you see fit.

11:01:59 luis ACK.

11:03:03 luis Shinmera: how do you currently deal with custom readtables in staple-parser?

11:03:15 Shinmera I don't.

11:04:24 Shinmera Or rather: I leave it up to the user to install the apropriate extensions to Staple do work with it correctly.

11:04:38 Shinmera I haven't been able to see a way to deal with that otherwise, really.

11:05:41 Shinmera Using an Eclector readtable that delegates to CL's current one on unknown reader macros could maybe, perhaps, work somewhat but I can see lots of ways for that to go wrong too.

11:06:46 luis Shinmera: well, my first intuition was to copy user reader macros into the Eclector readable. That wouldn't very well once the reader macro calls CL:READ, for instance

11:06:55 Shinmera Right.

11:07:42 Shinmera You could work around that by just discarding the result or wrapping it in a way that marks it as ignored for the rest of the machinery.

11:08:04 Shinmera Won't have info on custom reader things then but that's a small price to pay

11:11:35 luis But, we might be able to instrument the cl:*readable* to delegate everything to Eclector.

11:12:06 Shinmera There's an idea.

11:14:52 luis OK, I've got a rough plan, now all I need is some quality computer time. (I'm trying this on my phone right now. *sigh*) Thanks Shinmera and beach for the discussion.

11:15:05 Shinmera Any time.

11:54:27 |3b| ACTION seems to need actual regalloc/lifetime/etc stuff in my android compiler now :( too easy to run out of 16 registers without any

12:04:43 Shinmera A register allocator is the second thing we had to write in my compiler class :)

12:05:40 |3b| yeah, probably did that at some point too :)

12:06:46 |3b| ACTION currently has a register allocator that just increments a counter when it sees a new variable, which worked for some trivial things, but starts to not be enough

12:07:24 |3b| i theoretically have 64k 'registers', but some instructions can only use the first 16 or 256 of them

12:12:37 |3b| maybe i can get away with just reserving some low registers for function calls and such that are limited to 16, and copy as needed

12:54:04 |3b| yeah, that seems to have worked well enough for now :p

13:52:26 beach luis: Sure, good luck!

13:53:52 beach The lunch guests just left, but I need to go rest for a while.