10:37:41 <stassats> optimizing tail (m-v-call x y): if there are no VOPs between multiple-call-* and tail-call-variable, there's no need to copy the register around, just adjust the stack arguments
10:38:14 <stassats> in the case of four return values, it's 5 times faster, but largely because i'm not using REP MOVS
10:38:28 <stassats> but even without stacks, it's a bit faster and much more compact
10:38:59 <stassats> can do the same when there's multiple-call-*+unbind+multiple-return, since unbind doesn't clobber any registers or flags
10:39:33 <stassats> and special bindings are a common way of stopping tail calls, this should help
10:41:08 <stassats> i think i can even inline the new tail-call-variable-simple in the same space it took to call tail-call-variable
10:41:24 <stassats> maybe a use case for SPACE 0, or SPACE 1
10:45:04 <stassats> could also use SIMD for copying, shouldn't be much of a problem if i copy one more stack place
16:55:10 <pkhuong> rep movsb can be pretty fast nowadays.
16:55:48 <pkhuong> it somewhat depends on alignment of the destination and of the count.
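[editor's note] For readers following along, here is a minimal Common Lisp sketch of the two source-level shapes the messages above seem to be about: a multiple-value-call in tail position, and a call whose values are returned through a special binding. The names two-values, sum-all, tail-mv-call, bound-call and *depth* are hypothetical and exist only for illustration. Which VOPs the compiler actually emits for each form is an assumption based on the conversation, not something this sketch verifies.

```lisp
;; Hypothetical helper: returns two values.
(defun two-values (x)
  (values x (* 2 x)))

;; Hypothetical helper: accepts any number of arguments.
(defun sum-all (&rest args)
  (reduce #'+ args))

;; Shape 1: multiple-value-call in tail position. All values produced by
;; the argument forms are passed on to SUM-ALL, so the argument count is
;; only known at run time (assumed to be the multiple-call-* followed by
;; tail-call-variable case from the log).
(defun tail-mv-call (x y)
  (multiple-value-call #'sum-all (two-values x) (two-values y)))

(defvar *depth* 0)

;; Shape 2: the call sits inside a special binding. The binding must be
;; undone after TWO-VALUES returns, so this is not a true tail call; the
;; callee's values are returned once the binding is undone (assumed to be
;; the multiple-call-* + unbind + multiple-return case from the log).
(defun bound-call (x)
  (let ((*depth* (1+ *depth*)))
    (two-values x)))
```

With these definitions, (tail-mv-call 1 2) sums the four values 1, 2, 2 and 4, and (bound-call 3) returns the two values 3 and 6 while leaving *depth* at 0 afterwards.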