freenode/#clasp - IRC Chatlog
Search
17:13:49
drmeister
We could limit these single dispatch generic functions so that you can't remove methods - that simplifies things because then you can use a list managed using CAS to add methods.
17:15:24
drmeister
I'm thinking we could leave them as a separate from standard generic functions - but use the discriminating function compiler to compile their discriminating functions to speed up dispatch.
17:17:05
drmeister
The discriminating function compiler only needs a call history and a specializer profile (for single dispatch generic functions that contains a single T in the first and sometimes the second position)
17:17:53
drmeister
We would have to add the ability to handle SingleDispatchMethod objects as the outcome - but that's a single function in every case.
17:19:55
drmeister
Yeah - of this: https://github.com/clasp-developers/clasp/blob/master/src/lisp/kernel/clos/hierarchy.lsp#L131
17:32:30
Bike
a-p-o-function is a function that permutes the parameters in order to satisfy the argument precedence order
17:32:48
Bike
since single dispatch functions only specialize on their first parameter (right?) it's probably irrelevant
17:33:35
drmeister
There are a few single dispatch functions that specialize on their second parameter. I'm thinking of replacing them with functions that do their own dispatch.
17:37:06
Bike
https://github.com/clasp-developers/clasp/blob/master/src/lisp/kernel/clos/kernel.lsp#L347-L360
17:38:57
Bike
https://github.com/clasp-developers/clasp/blob/master/src/lisp/kernel/clos/closfastgf.lsp#L499-L520 here's the code that adds a call history entry for a new call
17:40:41
drmeister
Hmm, a special method for generic function call histories won't work here because I'm not using that rack layout. I'm a bit more low level. But low_level_rackSet uses _Slots.store(i, value);
17:41:54
drmeister
And std::atomic::store - that's atomic. I only ask because the consequences of assuming the wrong thing are catastrophic.
17:46:45
drmeister
Oh - wait - I still need to use the CAS loop. Because an instanceRef to get the call-history and then an instanceSet to set it again gives time for some other thread to get in there and update the call-history. That would be bad.
17:47:20
drmeister
It's not just the atomic write - it's the atomic update of the absolutely guaranteed to be up to date call-history that is important.
17:50:42
drmeister
You efficiently add to the list in multithreaded code and read the list in a lock free way.
17:57:53
drmeister
I won't argue with you - but if you keep going with this you will find that there are certain things that are really simple about multithreaded programming. If you want to record something and you don't need to remove anything from that record - a simple linked list is ideal. It can be added to and read in multithreaded code using CAS by following a very simple update recipe where you use a loop containing a
17:59:23
drmeister
Bike: That being said - did you add a compare_and_swap operation for rack slots? I don't see one - say like low_level_rackCompareAndSwap
18:04:27
drmeister
I'll add a low_level_rack_compare_exchange_weak and low_level_rack_compare_exchange_strong as sugar - they will be inlined.
18:05:17
drmeister
Like - right after these... https://github.com/clasp-developers/clasp/blob/dispatch/include/clasp/core/instance.h#L64
18:10:07
Bike
you can probably use the weak version if you're in a loop, btw. might save you two nanoseconds
18:12:14
drmeister
Bike: Yes - I kind of understand the weak vs strong forms of the function - but do you know how to use the sync, success and failure arguments?
18:12:53
drmeister
My understanding of the weak vs strong forms is if the stuff I do in the loop is expensive and I don't want to pay the price of undoing it - I use the strong version.
18:13:39
drmeister
If it's cheap - like it is here, I'm just updating the rplacd of the new entry - so yes - I'll use the weak form.
18:15:18
Bike
er, i dunno what sync is though. i don't see that in std::compare_exchange_whatever, though
18:16:03
Bike
success and failiure are the memory orders for the success and failure cases. you should ignore those, the default is fine
18:16:42
Bike
the weak form just means spurious failure is allowed, so the cas will report failure even if the swap actually occurred
18:18:55
drmeister
Ah, here instead of sync it's called 'order' - it looks like a short form of success,failure: https://en.cppreference.com/w/cpp/atomic/atomic/compare_exchange
18:29:35
drmeister
I can probably get rid of 'current' and replace wherever I use it with with 'expected'.
18:32:03
drmeister
Sorry to torture you with all this - I'm finding talking about this helpful to get back in the game.
18:55:06
drmeister
Ah - so I have it return a reference to std::atomic<T>, it feels a bit goofy - but the kind of goofy that we run into with C++ all the time.
18:56:34
drmeister
Otherwise we are going to be using instance->_Rack->_Data[index].compare_exchange_weak(...). gah.
18:58:48
drmeister
Aside: I got tired of Hyrule Warriors - Age of Calamity in about 5 minutes - it's a boring button masher.
19:11:01
drmeister
In memory - it's just a word - returning a reference to an atomic should just be the compiler treating it as a std::atomic<T>
19:13:19
Bike
it makes sense that it would work, i just got kind of frustrated at the C++ atomics interface
19:14:24
Colleen
karlosz: Bike said 4 hours, 13 minutes ago: local call idea: 1) lose local-call class 2) add slot to abstract-call that's either NIL or the FUNCTION being called 3) inline etc use that slot but leave enclose in place 4) a later, possibly client specific pass eliminates the enclose instruction if the calls don't need it
19:14:38
drmeister
In my second mission in Hyrule Warriors I destroyed an Ancient Guardian - then I had an ironsmith level up a soup ladle to level 6 and with it - I am leveling armies. I won't be going much farther with this.
19:19:23
drmeister
So now it searches the call-history alist for the receiver-class and calls the associated method. If it doesn't find one it searches the class-precedence-list of the receiver-class looking for methods that handle one of those classes and updates the call history and then dispatches to the method.
19:21:53
drmeister
Hmmm, this does look kind of slow. aclasp building is getting bogged down on some of the files.
19:28:58
karlosz
Bike: what do we gain from doing that? the idea with deleting encloses as part of the analysis was so that we have an easy way of querying whether a function escapes or is only used in call position
19:29:59
karlosz
also i don't really understand why get rid of the class, since it seems like a good use of using a different class
19:30:22
Bike
i want to be able to do that for functions that have invovled lambda lists and/or are mv-called
19:31:38
Bike
the question of whether it's only called seems orthogonal to the question of whether a closure needs to be allocated
19:32:41
karlosz
i don't mean to say enclose necessarily means closure allocation - closurettes prove that
19:33:35
Bike
i mean right now if we have a function that's only called, we'll transform calls into local calls if only if its lambda list doesn't have &rest or &key, right?
19:35:35
karlosz
the interpolator can't handle hairier lambda lists, and clasp can't make direct calls out of local calls with hairier lambda lists, but that's fine
19:36:00
Bike
well yes, i mean i want to do it for any kind of lambda list, but clasp and possibly other clients might need a closure for that
19:36:04
karlosz
we can have the local call translator just fall back to a normal enclose + call style translation
19:36:27
karlosz
like, the local call translator just allocates the closure and calls it in the fallback case
19:36:53
karlosz
there's no issue with the local call translator allocating the closure because its not being used first class so you don't get EQ problems
19:38:04
karlosz
translate-instruction for local call in the fallback case just does whatever enclose would have done and then calls it
19:38:48
karlosz
i mean, local call analysis is already optional, but this gives clients a way to choose the granularity of what it can handle directly
20:04:28
karlosz
Bike: the other reason for this suggestion is that we have way more flexibility if we don't use the enclose instruction in BIR
20:04:52
karlosz
like, with the strategy you proposed of reusing the enclose instruction, i don't think it would be possible to separate out the action of closure parsing and arg parsing
20:05:06
karlosz
but no matter how hairy the lambda list is, we can still skip closure parsing with local calls
20:07:34
drmeister
I was wrong about using the call-history being too slow to be useful. I had a bug in my call-history update - I didn't use the argument class and so the call-histories were screwed up and it kept falling back to the slow path. It's as fast as our old dispatching method.
20:10:40
drmeister
So that's good - we can use this and not have to come up with some Frankenstein data structures to get aclasp and bclasp built in a reasonable amount of time. After bclasp is up we can compile discriminating functions - that should just require a bit of tweaking.
20:18:18
karlosz
i mean, the enclose instruction encloses the entry point which loads out of the closure vector and parses arguments right
20:18:44
karlosz
we can just direct call with the environment augmented as arugments in the beginning
20:19:43
Bike
i can just have it allocate a closure first and then worry about more entry points as an improvement
20:29:19
karlosz
asdf compiles faster and conses less, i'm guessing because of tighter code generation
20:30:31
karlosz
right now im struggling with pass management stuff, since meta evaluation is the first "real" lisp level optimization that can apply over and over again which we do in bir
20:30:57
karlosz
right now i just run it 3 times over the graph , but that doesn't catch everything and is sometimes unnecessary
20:31:17
karlosz
since eliminating an IF IF and then folding an unreachable branch can cause more optimizations
20:32:16
karlosz
since we "binary" merge currently (things like (if ... ... (if ...))) generates a "redundant" merge block) something like (if (cond .... ... ..) ...) takes as many passes as there are cond branches to IF-IF eliminate everything
20:32:42
karlosz
so rather than hardwiring the number of passes to grovel over the graph we need to find a smarter way to fixpoint to optimality
20:33:31
karlosz
Python does it with reoptimize slots on nodes blocks and components but introducing slots on BIR objects for a bir transformation optimization pass is of course meh
20:34:21
karlosz
i'm currently thinking about a work queue which carefully keeps things in forward flow so we don't need to examine more than we need to, but that requires some coordination with everything
20:34:59
karlosz
also we probably want to push types in conjuction with the other abstract domain types in the metaeval "phase" too, so gotta keep that in mind
20:37:54
drmeister
I mean good going all around - this is really an impressive team effort bringing BIR online and now using it to improve things.
20:46:20
drmeister
I pushed the single dispatch generic function changes to the 'master' branch. It will remove those annoying debug messages.
20:56:49
drmeister
::notify selwyn Sorry to bring this up but the new-rebuild-dist doesn't work like rebuild-dist for quickclasp. new-rebuild-dist fails and dumps me in the sbcl command line.
21:10:04
drmeister
Because I think I just improved quicklisp compilation by about 20% and I wanted to know if that was me or you.
21:11:23
karlosz
drmeister: it's sitting as this PR... https://github.com/clasp-developers/clasp/pull/1094
23:23:04
drmeister
Time real(148.430 secs) run(148.432 secs) consed(9568808064 bytes) interps(68) unwinds(1767)
23:23:33
drmeister
Time real(157.104 secs) run(157.105 secs) consed(9902079664 bytes) interps(5) unwinds(1767)
0:16:13
karlosz
drmeister: well, it should show in the disassembly, but an easy way to see if it's firing is to uncomment this https://github.com/s-expressionists/Cleavir/blob/ae7cd58d00f6ed586c5c0f752091d6809a864f03/BIR-transformations/meta-evaluate.lisp#L112
0:17:30
karlosz
i don't know if your new single dispatch changes is supposed to cause consing to go down significantly, but the if-if changes did cause reduced consing because less llvm instructions got consed up
3:25:37
drmeister
That is probably your stuff. The single dispatch changes shouldn't change how much consing is going on - not 250 mb
3:26:27
drmeister
::notify Bike We are getting a weird crash on the buildbot with the deploy script. Not anything else, not macOS, not my linux machine - just the buildbot. The error and some of the backtrace is here: https://gist.github.com/drmeister/8232a97e40b55f477bcbdef56fba3fa1
3:28:29
drmeister
It's very reproducible - it's happened five times in a row with different commits. Any ideas?
4:58:14
karlosz
wow, some incredibly simple changes now gets ironclad compiling like: Time real(125.112 secs) run(125.114 secs) consed(8553760464 bytes) interps(2) unwinds(1540)
5:00:53
karlosz
::notify Bike you're not going to believe this, but a 3 line change causes an entire gigabyte less of consing while compiling ironclad and also shaves 20 seconds off the whole kaboodle from 2 mins 20 seconds to 2 mins. the change looks like this... https://paste.gnome.org/pryehpgkv
5:01:55
karlosz
since (cl:car ...) and (cl:null ...) are so frequently inlined, making the expansions easier for the compiler to digest (not having to bring in the definition of EQ in, for example), makes the compiler's job a lot easier