freenode/#clasp - IRC Chatlog

17:13:49 drmeister We could limit these single dispatch generic functions so that you can't remove methods - that simplifies things because then you can use a list managed using CAS to add methods.

17:15:24 drmeister I'm thinking we could leave them as a separate from standard generic functions - but use the discriminating function compiler to compile their discriminating functions to speed up dispatch.

17:17:05 drmeister The discriminating function compiler only needs a call history and a specializer profile (for single dispatch generic functions that contains a single T in the first and sometimes the second position)

17:17:53 drmeister We would have to add the ability to handle SingleDispatchMethod objects as the outcome - but that's a single function in every case.

17:18:12 drmeister The method combination is the trivial case of a single function.

17:19:55 drmeister Yeah - of this: https://github.com/clasp-developers/clasp/blob/master/src/lisp/kernel/clos/hierarchy.lsp#L131

17:20:59 drmeister What is a-p-o-function ?

17:22:14 drmeister It's great that you made rack slots atomic - that makes things a lot easier.

17:32:30 Bike a-p-o-function is a function that permutes the parameters in order to satisfy the argument precedence order

17:32:48 Bike since single dispatch functions only specialize on their first parameter (right?) it's probably irrelevant

17:33:35 drmeister There are a few single dispatch functions that specialize on their second parameter. I'm thinking of replacing them with functions that do their own dispatch.

17:33:50 drmeister They feel like grit in the gears.

17:36:23 Bike the apo function is put in... uh... where the hell is it...

17:36:40 Bike maybe i should expand the readme more

17:37:06 Bike https://github.com/clasp-developers/clasp/blob/master/src/lisp/kernel/clos/kernel.lsp#L347-L360

17:37:12 drmeister Did you write anything to update slots using CAS?

17:37:34 drmeister Or should I do the CAS loop thing myself?

17:37:54 drmeister Oh - hang on - checking...

17:38:05 Bike i might have written a function to augment the call history specifically

17:38:15 luni Which is the meaning of CAS?

17:38:57 Bike https://github.com/clasp-developers/clasp/blob/master/src/lisp/kernel/clos/closfastgf.lsp#L499-L520 here's the code that adds a call history entry for a new call

17:39:06 Bike luni: https://en.wikipedia.org/wiki/Compare-and-swap

17:39:25 luni Thank you

17:40:41 drmeister Hmm, a special method for generic function call histories won't work here because I'm not using that rack layout. I'm a bit more low level. But low_level_rackSet uses _Slots.store(i, value);

17:41:54 drmeister And std::atomic::store - that's atomic. I only ask because the consequences of assuming the wrong thing are catastrophic.

17:42:16 drmeister But yeah - std::atomic::store - that's atomic.

17:42:31 drmeister ACTION is getting his multithreaded programming legs back under him.

17:46:45 drmeister Oh - wait - I still need to use the CAS loop. Because an instanceRef to get the call-history and then an instanceSet to set it again gives time for some other thread to get in there and update the call-history. That would be bad.

17:47:20 drmeister It's not just the atomic write - it's the atomic update of the absolutely guaranteed to be up to date call-history that is important.

17:47:25 drmeister Whew - dodged a bullet there.

17:47:48 drmeister luni: Multithreaded programming is dangerous and exciting.

17:48:51 drmeister The linked list becomes a very useful data structure with CAS.

17:50:42 drmeister You efficiently add to the list in multithreaded code and read the list in a lock free way.

17:52:28 luni Ok yes but this kind of applications are a little bit advanced for me at the moment

17:53:14 luni i mean this usage

17:57:53 drmeister I won't argue with you - but if you keep going with this you will find that there are certain things that are really simple about multithreaded programming. If you want to record something and you don't need to remove anything from that record - a simple linked list is ideal. It can be added to and read in multithreaded code using CAS by following a very simple update recipe where you use a loop containing a

17:57:53 drmeister compare_and_swap operation.

17:59:23 drmeister Bike: That being said - did you add a compare_and_swap operation for rack slots? I don't see one - say like low_level_rackCompareAndSwap

18:00:18 Bike in C++? I don't recall

18:00:34 Bike i think mp:cas uses the compiler

18:00:50 drmeister Ah - right - we have a compiler.

18:01:37 Bike the lisp way is uh, (mp:cas (funcallable-standard-instance-access ...) ...) i think

18:01:52 Bike i didn't hook it up to use accessor methods yet

18:03:49 drmeister I'm in C++ here - bootstrapping my way into that.

18:04:27 drmeister I'll add a low_level_rack_compare_exchange_weak and low_level_rack_compare_exchange_strong as sugar - they will be inlined.

18:04:45 drmeister Just so I don't have to crawl into the rack and access _Slots directly.

18:04:54 drmeister Does that sound ok?

18:05:17 drmeister Like - right after these... https://github.com/clasp-developers/clasp/blob/dispatch/include/clasp/core/instance.h#L64

18:09:25 Bike sure

18:10:07 Bike you can probably use the weak version if you're in a loop, btw. might save you two nanoseconds

18:12:14 drmeister Bike: Yes - I kind of understand the weak vs strong forms of the function - but do you know how to use the sync, success and failure arguments?

18:12:53 drmeister My understanding of the weak vs strong forms is if the stuff I do in the loop is expensive and I don't want to pay the price of undoing it - I use the strong version.

18:13:39 drmeister If it's cheap - like it is here, I'm just updating the rplacd of the new entry - so yes - I'll use the weak form.

18:14:18 Bike yeah it's uh involved

18:15:18 Bike er, i dunno what sync is though. i don't see that in std::compare_exchange_whatever, though

18:16:03 Bike success and failiure are the memory orders for the success and failure cases. you should ignore those, the default is fine

18:16:42 Bike the weak form just means spurious failure is allowed, so the cas will report failure even if the swap actually occurred

18:16:47 Bike sometimes

18:17:11 Bike using the strong form essentially means an implicit loop of the weak form

18:18:55 drmeister Ah, here instead of sync it's called 'order' - it looks like a short form of success,failure: https://en.cppreference.com/w/cpp/atomic/atomic/compare_exchange

18:19:53 drmeister Gah - I guess you can't expect something this low level to be easy.

18:27:04 drmeister I think this will do the job of updating the call-history...

18:27:06 drmeister https://www.irccloud.com/pastebin/hogvgtpU/

18:29:35 drmeister I can probably get rid of 'current' and replace wherever I use it with with 'expected'.

18:30:09 drmeister So...

18:30:11 drmeister https://www.irccloud.com/pastebin/3Me9CYAX/

18:30:40 drmeister Oh - wait - that won't compile.

18:31:20 drmeister https://www.irccloud.com/pastebin/8RJwX2ng/

18:31:59 Bike oh, yeah, "order" is just to specify success and failure together

18:32:03 drmeister Sorry to torture you with all this - I'm finding talking about this helpful to get back in the game.

18:34:12 drmeister Building now...

18:37:30 drmeister Hmm, you didn't add an operator[] to GCArray_atomic.

18:51:45 Bike it can't return a reference

18:51:52 Bike if i remember correctly

18:55:06 drmeister Ah - so I have it return a reference to std::atomic<T>, it feels a bit goofy - but the kind of goofy that we run into with C++ all the time.

18:55:32 drmeister We aren't going to use this a lot - we got this far without it.

18:56:34 drmeister Otherwise we are going to be using instance->_Rack->_Data[index].compare_exchange_weak(...). gah.

18:56:51 drmeister 'private' - we don't use no stinkin' 'private'.

18:58:48 drmeister Aside: I got tired of Hyrule Warriors - Age of Calamity in about 5 minutes - it's a boring button masher.

19:09:28 Bike references to atomics might work, i don't know. it seemed pretty weird to me

19:10:24 drmeister It compiles - I'm fixing other problems before I can say it works.

19:11:01 drmeister In memory - it's just a word - returning a reference to an atomic should just be the compiler treating it as a std::atomic<T>

19:13:19 Bike it makes sense that it would work, i just got kind of frustrated at the C++ atomics interface

19:14:24 karlosz .

19:14:24 Colleen karlosz: Bike said 4 hours, 13 minutes ago: local call idea: 1) lose local-call class 2) add slot to abstract-call that's either NIL or the FUNCTION being called 3) inline etc use that slot but leave enclose in place 4) a later, possibly client specific pass eliminates the enclose instruction if the calls don't need it

19:14:38 drmeister In my second mission in Hyrule Warriors I destroyed an Ancient Guardian - then I had an ironsmith level up a soup ladle to level 6 and with it - I am leveling armies. I won't be going much farther with this.

19:15:43 drmeister At least it gives me something to talk about with the kids.

19:17:15 drmeister aclasp is building now using call-history for dispatching.

19:19:23 drmeister So now it searches the call-history alist for the receiver-class and calls the associated method. If it doesn't find one it searches the class-precedence-list of the receiver-class looking for methods that handle one of those classes and updates the call history and then dispatches to the method.

19:21:53 drmeister Hmmm, this does look kind of slow. aclasp building is getting bogged down on some of the files.

19:25:18 drmeister Really bogged down. Wow - searching vectors is waaay faster than linked lists.

19:28:58 karlosz Bike: what do we gain from doing that? the idea with deleting encloses as part of the analysis was so that we have an easy way of querying whether a function escapes or is only used in call position

19:29:10 karlosz fulfilling the "escape" analysis

19:29:59 karlosz also i don't really understand why get rid of the class, since it seems like a good use of using a different class

19:30:20 karlosz doing a :kind slot seems a bit backwards?

19:30:22 Bike i want to be able to do that for functions that have invovled lambda lists and/or are mv-called

19:30:50 karlosz do what?

19:31:08 Bike check whether it's only used in call position. possibly inline/etc later

19:31:38 Bike the question of whether it's only called seems orthogonal to the question of whether a closure needs to be allocated

19:32:26 karlosz is it? if it's explicitly referenced it needs an enclose right?

19:32:41 karlosz i don't mean to say enclose necessarily means closure allocation - closurettes prove that

19:33:12 karlosz i was thinking enclose more as like "create explicit function object"

19:33:35 Bike i mean right now if we have a function that's only called, we'll transform calls into local calls if only if its lambda list doesn't have &rest or &key, right?

19:34:28 karlosz right, what was wrong with the idea of doing that for any kind of lambda list?

19:34:51 karlosz and divorcing that concept from inlining?

19:35:35 karlosz the interpolator can't handle hairier lambda lists, and clasp can't make direct calls out of local calls with hairier lambda lists, but that's fine

19:36:00 Bike well yes, i mean i want to do it for any kind of lambda list, but clasp and possibly other clients might need a closure for that

19:36:04 karlosz we can have the local call translator just fall back to a normal enclose + call style translation

19:36:27 karlosz like, the local call translator just allocates the closure and calls it in the fallback case

19:36:45 Bike so a different enclose for each local call?

19:36:53 karlosz there's no issue with the local call translator allocating the closure because its not being used first class so you don't get EQ problems

19:37:04 karlosz right, i mean it is the "fallback" case

19:37:27 karlosz i can't really think of a situation where it'd be an issue

19:38:04 karlosz translate-instruction for local call in the fallback case just does whatever enclose would have done and then calls it

19:38:04 Bike hmkay, i can do it that way then

19:38:48 karlosz i mean, local call analysis is already optional, but this gives clients a way to choose the granularity of what it can handle directly

19:39:06 karlosz while keeping the BIR side conceptually uniform

19:39:18 karlosz i.e. "a call to a function in the same module"

20:04:28 karlosz Bike: the other reason for this suggestion is that we have way more flexibility if we don't use the enclose instruction in BIR

20:04:52 karlosz like, with the strategy you proposed of reusing the enclose instruction, i don't think it would be possible to separate out the action of closure parsing and arg parsing

20:05:06 karlosz but no matter how hairy the lambda list is, we can still skip closure parsing with local calls

20:05:28 karlosz so we can choose to make the "fallback" case better by splitting those up, right?

20:05:46 karlosz either way, the client gets to make the decision this way

20:07:34 drmeister I was wrong about using the call-history being too slow to be useful. I had a bug in my call-history update - I didn't use the argument class and so the call-histories were screwed up and it kept falling back to the slow path. It's as fast as our old dispatching method.

20:10:40 drmeister So that's good - we can use this and not have to come up with some Frankenstein data structures to get aclasp and bclasp built in a reasonable amount of time. After bclasp is up we can compile discriminating functions - that should just require a bit of tweaking.

20:16:38 Bike karlosz: splitting what up?

20:18:18 karlosz i mean, the enclose instruction encloses the entry point which loads out of the closure vector and parses arguments right

20:18:32 karlosz for &rest and &key we don't need to allocate a closure

20:18:44 karlosz we can just direct call with the environment augmented as arugments in the beginning

20:19:06 karlosz so it doesn't make sense to use the full enclose anyway for those

20:19:31 Bike oh. yeah. but then another entry point is needed and stuff.

20:19:43 Bike i can just have it allocate a closure first and then worry about more entry points as an improvement

20:20:23 karlosz yeah, right, that's what i meant by client flexibility this way

20:27:49 karlosz by the way, i added IF-IF elimination and turned on the meta evaluation pass

20:28:16 karlosz apparently IF-IF elimination applied about 45287 during a self build

20:28:37 karlosz it seems to have culled out enough code to speed up some stuff measurably

20:29:19 karlosz asdf compiles faster and conses less, i'm guessing because of tighter code generation

20:29:32 karlosz ironclad compilation is down to 2 minutes and 10 seconds

20:30:31 karlosz right now im struggling with pass management stuff, since meta evaluation is the first "real" lisp level optimization that can apply over and over again which we do in bir

20:30:57 karlosz right now i just run it 3 times over the graph , but that doesn't catch everything and is sometimes unnecessary

20:31:17 karlosz since eliminating an IF IF and then folding an unreachable branch can cause more optimizations

20:32:16 karlosz since we "binary" merge currently (things like (if ... ... (if ...))) generates a "redundant" merge block) something like (if (cond .... ... ..) ...) takes as many passes as there are cond branches to IF-IF eliminate everything

20:32:18 karlosz unfortunately

20:32:42 karlosz so rather than hardwiring the number of passes to grovel over the graph we need to find a smarter way to fixpoint to optimality

20:33:31 karlosz Python does it with reoptimize slots on nodes blocks and components but introducing slots on BIR objects for a bir transformation optimization pass is of course meh

20:34:21 karlosz i'm currently thinking about a work queue which carefully keeps things in forward flow so we don't need to examine more than we need to, but that requires some coordination with everything

20:34:59 karlosz also we probably want to push types in conjuction with the other abstract domain types in the metaeval "phase" too, so gotta keep that in mind

20:36:21 drmeister Hooowheee!

20:36:35 drmeister Good going karlosz

20:37:54 drmeister I mean good going all around - this is really an impressive team effort bringing BIR online and now using it to improve things.

20:46:20 drmeister I pushed the single dispatch generic function changes to the 'master' branch. It will remove those annoying debug messages.

20:56:49 drmeister ::notify selwyn Sorry to bring this up but the new-rebuild-dist doesn't work like rebuild-dist for quickclasp. new-rebuild-dist fails and dumps me in the sbcl command line.

20:56:49 Colleen drmeister: Got it. I'll let selwyn know as soon as possible.

21:09:32 drmeister karlosz: Are your IF-IF elimination changes pushed yet?

21:10:04 drmeister Because I think I just improved quicklisp compilation by about 20% and I wanted to know if that was me or you.

21:11:23 karlosz drmeister: it's sitting as this PR... https://github.com/clasp-developers/clasp/pull/1094

21:12:23 drmeister Oh - can I go ahead and merge that?

21:12:37 karlosz sure thin

21:12:46 karlosz it builds for me fine

21:13:16 karlosz just fyi, the buildbot has been failing since your accidental push

21:13:41 karlosz so i don't know if it's more convenient or not to resolve that before merging

21:13:44 drmeister Oh - thanks - let's see what the most recent update does.

22:00:21 drmeister Still broken - pulling everything into hermes to try there.

22:05:40 drmeister deploy runs on the mac - that's what the buildbot runs.

23:15:05 drmeister karlosz: How do I see if your IF-IF change is in place?

23:22:54 drmeister I build ironclad with all the latest changes and I get...

23:23:04 drmeister Time real(148.430 secs) run(148.432 secs) consed(9568808064 bytes) interps(68) unwinds(1767)

23:23:24 drmeister On a clasp from before my single dispatch changes and your if-if changes...

23:23:33 drmeister Time real(157.104 secs) run(157.105 secs) consed(9902079664 bytes) interps(5) unwinds(1767)

0:16:13 karlosz drmeister: well, it should show in the disassembly, but an easy way to see if it's firing is to uncomment this https://github.com/s-expressionists/Cleavir/blob/ae7cd58d00f6ed586c5c0f752091d6809a864f03/BIR-transformations/meta-evaluate.lisp#L112

0:16:43 karlosz but if everything is merged it should be

0:17:30 karlosz i don't know if your new single dispatch changes is supposed to cause consing to go down significantly, but the if-if changes did cause reduced consing because less llvm instructions got consed up

0:17:52 karlosz you before and after seems to show around 250 mb less consing

3:25:37 drmeister That is probably your stuff. The single dispatch changes shouldn't change how much consing is going on - not 250 mb

3:26:27 drmeister ::notify Bike We are getting a weird crash on the buildbot with the deploy script. Not anything else, not macOS, not my linux machine - just the buildbot. The error and some of the backtrace is here: https://gist.github.com/drmeister/8232a97e40b55f477bcbdef56fba3fa1

3:26:28 Colleen drmeister: Got it. I'll let Bike know as soon as possible.

3:27:03 drmeister EXPRESSION kinda looks like a symbol to me.

3:28:29 drmeister It's very reproducible - it's happened five times in a row with different commits. Any ideas?

4:08:18 beach Good morning everyone!

4:58:14 karlosz wow, some incredibly simple changes now gets ironclad compiling like: Time real(125.112 secs) run(125.114 secs) consed(8553760464 bytes) interps(2) unwinds(1540)

5:00:01 drmeister What was the change?

5:00:53 karlosz ::notify Bike you're not going to believe this, but a 3 line change causes an entire gigabyte less of consing while compiling ironclad and also shaves 20 seconds off the whole kaboodle from 2 mins 20 seconds to 2 mins. the change looks like this... https://paste.gnome.org/pryehpgkv

5:00:53 Colleen karlosz: Got it. I'll let Bike know as soon as possible.

5:01:55 karlosz since (cl:car ...) and (cl:null ...) are so frequently inlined, making the expansions easier for the compiler to digest (not having to bring in the definition of EQ in, for example), makes the compiler's job a lot easier