freenode/#clasp - IRC Chatlog
14:22:32
Bike
clasp translate believes really deeply that variables are all in memory locations, so i have to rewrite a lot
14:22:55
Bike
why do we have precalc-value-instruction handle immediates? does it actually get immediates as an argument? we shouldn't need an instruction...
14:26:06
Bike
i'm trying to move my discussion of cleavir things to #sicl, though the boundary is sometimes porous
14:50:15
drmeister
Bike: there is a very low barrier to improving things in clasp. Let’s talk about it tomorrow or Tuesday.
14:51:32
drmeister
My talk went extremely well. I had about a dozen people tell me it was the most entertaining and impressive talk.
15:26:52
drmeister
Bike: I had the backend and then tagged immediates - so they have a bolted on feel.
15:28:59
drmeister
Be wary that bclasp and cclasp share a lot of code and bclasp needs to keep working. But it can be updated as well.
15:32:55
Bike
i think maybe all we need to do is not generate precalc-value-instruction for immediates, and set up instructions to be able to take non-alloca inputs which i'm doing anyway
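The idea in the message above (skip the extra instruction for immediates, and let instructions accept inputs that are not stack slots) can be illustrated with a toy sketch. This is not clasp's translator: the `Immediate` and `Alloca` classes, the `emit_operand` and `translate_add` helpers, and the pseudo-IR text they emit are all hypothetical names made up for illustration.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass(frozen=True)
class Immediate:
    """Hypothetical: a value known at compile time (e.g. a fixnum)."""
    value: int


@dataclass(frozen=True)
class Alloca:
    """Hypothetical: a variable that lives in a stack slot."""
    name: str


Operand = Union[Immediate, Alloca]


def emit_operand(operand: Operand, out: List[str]) -> str:
    """Return text naming the operand's value.

    Only stack slots need a load; an immediate is used directly,
    so no separate instruction is emitted for it.
    """
    if isinstance(operand, Immediate):
        return str(operand.value)
    tmp = f"%t{len(out)}"
    out.append(f"{tmp} = load {operand.name}")
    return tmp


def translate_add(a: Operand, b: Operand, out: List[str]) -> str:
    """Emit a made-up 'add' that accepts either kind of input."""
    lhs = emit_operand(a, out)
    rhs = emit_operand(b, out)
    result = f"%t{len(out)}"
    out.append(f"{result} = add {lhs}, {rhs}")
    return result


if __name__ == "__main__":
    lines: List[str] = []
    translate_add(Alloca("%x"), Immediate(3), lines)
    print("\n".join(lines))
```

In this sketch the decision about loading lives in one helper, so each instruction translator stays unaware of where its inputs come from.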
15:53:08
drmeister
Bike. It shouldn’t be too hard to add methods on translate-xxx-instruction and generate PTX - right? Feel free to ask questions for more context.
15:54:51
drmeister
There are some new barrier and synchronization instructions that we would need to add.
15:59:52
drmeister
Then an api call says “run this kernel on this warp of 1024 threads with this memory setup”
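For reference, in CUDA terms a warp is 32 threads and 1024 is the usual upper limit on threads per block, so the "1024 threads" in the message above corresponds to one full thread block of 32 warps. A rough sketch of the launch arithmetic follows, assuming those two NVIDIA figures; the `launch_shape` helper is a hypothetical name and not part of any API mentioned in this log.

```python
WARP_SIZE = 32                 # threads per warp on NVIDIA GPUs
MAX_THREADS_PER_BLOCK = 1024   # per-block limit assumed here


def launch_shape(n_elements: int,
                 threads_per_block: int = MAX_THREADS_PER_BLOCK) -> dict:
    """Work out how many blocks and warps a 1-D launch would need."""
    if n_elements < 0:
        raise ValueError("n_elements must be non-negative")
    if not 1 <= threads_per_block <= MAX_THREADS_PER_BLOCK:
        raise ValueError("threads_per_block out of range")
    # Ceiling division without floating point.
    blocks = -(-n_elements // threads_per_block)
    warps_per_block = -(-threads_per_block // WARP_SIZE)
    return {
        "blocks": blocks,
        "threads_per_block": threads_per_block,
        "warps_per_block": warps_per_block,
    }


if __name__ == "__main__":
    print(launch_shape(1024))
    print(launch_shape(1025))
```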
16:02:13
drmeister
The kernels support a tiny subset of Common Lisp. Arithmetic with unboxed values, if, tagbody/go
16:05:10
drmeister
And the HIR gives us a way to optimize the sequence of operations with barriers and to keep the chip busy.
16:07:38
Bike
well, that one we might be able to lose, it's pretty much just one operation (discrimination) that needs special codegen
16:12:45
drmeister
That’s a very interesting thought - I’ll look at the ptx calling convention with that in mind.
16:14:20
drmeister
Functions taking lots of inputs and returning one output is stupid. It should be symmetric. I bet PTX can support that.
16:15:51
drmeister
There doesn’t seem to be much of a distinction between local memory and registers.
16:18:02
drmeister
I described this to a google engineer who helped me figure out exception handling and debug info. He got very agitated and said I needed to talk to a team of google engineers that he knew.
16:19:35
drmeister
The inlined code - but I’ve been toying with the idea of moving compiler optimizations to the gpu.