freenode/#sicl - IRC Chatlog
8:38:35
beach
I am watching the presentation by Matt Kulukundis. I think a lot of the complications he talks about, like large keys and large objects, are problems that don't exist in a language with uniform reference semantics. Also, I think the tricks they invented are quite clever. Some of them may require some assembly code to be taken advantage of in a Common Lisp system.
8:48:18
no-defun-allowed
Yes, you can cheat the large object problem with references, when you have objects (like symbols and fixnums) which can be compared with EQ. But I kept fairly faithful to the design of the table (except that I keep hashes in the table, so that resizing doesn't have to recompute hashes), and the only unportable part I recall is probing using SIMD.
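The hash-caching trick mentioned here (keeping each key's hash in the table so resizing never recomputes it) can be sketched as follows. This is a Python illustration with invented names, not the actual implementation:

```python
class HashedTable:
    """Minimal open-addressing table that stores (hash, key, value) per
    slot, so a resize reuses the stored hashes instead of rehashing."""

    def __init__(self, capacity=8):
        self.slots = [None] * capacity  # each slot: (hash, key, value) or None
        self.count = 0

    def _probe(self, h, key):
        i = h % len(self.slots)
        while True:
            slot = self.slots[i]
            if slot is None or (slot[0] == h and slot[1] == key):
                return i
            i = (i + 1) % len(self.slots)  # linear probing

    def put(self, key, value):
        if (self.count + 1) * 2 > len(self.slots):  # keep load factor <= 1/2
            self._resize(len(self.slots) * 2)
        h = hash(key)
        i = self._probe(h, key)
        if self.slots[i] is None:
            self.count += 1
        self.slots[i] = (h, key, value)

    def get(self, key, default=None):
        slot = self.slots[self._probe(hash(key), key)]
        return default if slot is None else slot[2]

    def _resize(self, new_capacity):
        old = self.slots
        self.slots = [None] * new_capacity
        for slot in old:
            if slot is not None:
                h, key, value = slot
                # reuse the stored hash: no call to hash() during resize
                self.slots[self._probe(h, key)] = (h, key, value)
```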
8:56:02
no-defun-allowed
But I would imagine a Common Lisp implementation could include a safe SIMD library (and I'd like to say "portable" too, I think LLVM gets away with portable vector code somehow). It wouldn't be hard, as the only operations that touch memory are loads and stores, which would just require bounds checking.
8:57:57
no-defun-allowed
...when the elements are integers or floats. I don't think anyone has seriously considered putting references to objects in vector registers.
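For illustration, here is a scalar stand-in for the SIMD probe step (Swiss-table style): compare one group of control bytes against the key's short hash and get back a bitmask of candidate slots. A real implementation does all lanes at once with something like `_mm_cmpeq_epi8` plus a movemask; the names and constants below are assumptions for the sketch:

```python
GROUP = 16
EMPTY = 0x80  # high bit set marks an empty slot, as in the Swiss table design

def match_group(metadata, start, h2):
    """Scalar simulation of the SIMD probe: compare one group of 16
    control bytes against h2 (the short hash) and return a bitmask of
    candidate slots to check in full."""
    mask = 0
    for lane in range(GROUP):
        if metadata[start + lane] == h2:
            mask |= 1 << lane
    return mask

meta = bytearray([EMPTY] * GROUP)
meta[3] = meta[9] = 0x42   # two slots whose stored short hash is 0x42
hits = match_group(meta, 0, 0x42)
```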
10:21:47
jackdaniel
fun (subjective) observation: among all cl-related irc channels I'm in this is the most active
10:23:58
beach
no-defun-allowed: I didn't follow all the details in the talk. I'll probably watch it again. But, here is a question: would your implementation be particularly fast for the use case that is so common in Cleavir, namely 1. test (not in the table) 2. insert it?
10:24:32
beach
If so, we should suggest it to the Clasp people who seem to care about compilation time.
10:25:04
no-defun-allowed
I have not tested the performance of inserting yet, but GETHASH for keys which aren't in the table appears to be about as fast as for keys in the table.
10:27:56
no-defun-allowed
On average, a miss is about 3% faster than a hit, but that doesn't seem significant.
10:28:41
beach
I think this might be a common use case. For Cleavir, we traverse an instruction graph with relatively few back arcs. So most of the time, it is not going to be in the table.
10:28:48
no-defun-allowed
The list and bucket hash tables have slower misses than hits, but those are also insignificant results.
10:30:07
beach
Do you do some caching so that a gethash followed by a (setf gethash) with the same object does not do the search all over again?
10:32:20
beach
So here is an idea. Keep a bitmap indexed by hash values modulo its size, but say 10 times as many entries as in the table. It would act as a Bloom filter, so that if there is a 0, then the object is definitely not in the table.
10:33:06
beach
Then, instead of inserting immediately, accumulate objects sequentially until there is a hit in the table.
10:33:54
beach
Then, at the very least, it is possible to make a good guess of the size the table should be to accommodate the new objects.
10:37:42
no-defun-allowed
I have found the list-hash-table to be faster than both the bucket table and linear probing table, with fewer than 7 mappings. A miss would provoke worst-case behaviour, but it's still somehow faster.
10:43:26
no-defun-allowed
Now I'm thinking about how similar a Bloom filter and the metadata table are in purpose. (They are evidently different in construction, but they both are used to handle misses quickly.)
10:44:55
no-defun-allowed
Though, isn't there an often-repeated observation that a list is a faster set than a hash table for, say, fewer than 20 elements? There is a constant overhead due to using generic functions on SBCL, so the crossover point is lower.
10:46:28
beach
When the hash table is created, allocate a bitmap of some reasonable size that is significantly smaller than the overhead of the hash table itself, say 8 full words, so 512 bits.
10:47:23
beach
For gethash, first take the hash value modulo the size of the bitmap. If the bit is cleared, return NIL.
10:48:58
beach
For (setf gethash), either remember the latest hash value or do the test again. If it is a new object, accumulate it in a vector, I guess adjustable.
10:49:51
beach
When the bit test returns 1, check how many objects have accumulated, and resize the table accordingly. Then insert all the existing elements plus the accumulated ones.
10:50:32
beach
Also, adjust the size of the bitmap to reflect the current number of elements, and set the bits for the existing elements.
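The scheme outlined above might look roughly like this. A hedged Python sketch: the class and slot names are invented, a plain dict stands in for the real hash table, and the bitmap-resizing step is elided:

```python
class DeferredHashTable:
    """Sketch of the bitmap-guarded table described above: a small bitmap
    acts as a one-function Bloom filter, and new keys accumulate in a
    vector until the first bitmap hit, at which point the table can be
    resized once to fit everything before the pending keys go in."""

    BITMAP_BITS = 512  # 8 full words, as suggested above

    def __init__(self):
        self.bitmap = 0
        self.table = {}       # stands in for the real hash table
        self.pending = []     # accumulated (key, value) pairs

    def _bit(self, key):
        return hash(key) % self.BITMAP_BITS

    def get(self, key, default=None):
        if not (self.bitmap >> self._bit(key)) & 1:
            return default    # bit clear: definitely not in the table
        self._flush()         # bit set: resolve any pending inserts first
        return self.table.get(key, default)

    def put(self, key, value):
        bit = self._bit(key)
        if (self.bitmap >> bit) & 1:
            # possible duplicate (or bitmap collision): let the real
            # table resolve it, after flushing the accumulated keys
            self._flush()
            self.table[key] = value
        else:
            self.bitmap |= 1 << bit
            self.pending.append((key, value))

    def _flush(self):
        if self.pending:
            # a real table would resize once to len(table) + len(pending)
            self.table.update(self.pending)
            self.pending.clear()
```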
10:52:37
beach
It's the Boehm collector that doesn't handle the frequent reallocation of a bigger table.
10:56:25
no-defun-allowed
It sounds like the metadata table could be used for this, but I can't work out how yet.
10:57:53
beach
Yes, that's what inspired me. So let's put this idea on the back burner but keep it in mind.
11:00:53
no-defun-allowed
But I'm also not sure how the bitmap allows for estimating hash table sizes. Do we count the number of keys inserted between some given key being used and then reused (ignoring collisions in the bitmap)?
11:20:10
beach
No, the total number of existing elements plus the newly accumulated ones will give the estimate.
11:21:03
no-defun-allowed
Ah right. Don't we resize the adjustable vector with accumulated objects using a similar scheme to the hash table?
11:21:12
beach
I should put a meter in the Cleavir code so that we can see how many nodes are typically processed.
11:21:48
beach
Yes, that's what I meant by the adjustable vector being a problem. But at least no probing is required.
11:35:18
no-defun-allowed
Well, when you insert the accumulated elements at the end, you still probe as well.
12:36:41
heisig
Speaking of hash tables - I always wondered why hash tables with fewer than about 15 elements aren't implemented as an alist.
12:37:24
heisig
I occasionally write my own hash table wrapper that does exactly that, and it is measurably faster in many cases.
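A wrapper of the kind described here could look like the following Python sketch. The names are invented, and the threshold of 15 is taken from the discussion rather than from any measurement:

```python
THRESHOLD = 15  # from the discussion; the real crossover is implementation-dependent

class SmallMap:
    """Store mappings as an association list while small; promote to a
    hash table (here a dict) once past the threshold."""

    def __init__(self):
        self.alist = []    # list of (key, value) pairs
        self.table = None  # promoted table, once past the threshold

    def get(self, key, default=None):
        if self.table is not None:
            return self.table.get(key, default)
        for k, v in self.alist:   # linear scan: fast for tiny n
            if k == key:
                return v
        return default

    def put(self, key, value):
        if self.table is not None:
            self.table[key] = value
            return
        for i, (k, _) in enumerate(self.alist):
            if k == key:
                self.alist[i] = (key, value)  # update in place
                return
        self.alist.append((key, value))
        if len(self.alist) > THRESHOLD:
            self.table = dict(self.alist)     # promote
            self.alist = []
```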
13:37:12
pjb
heisig: note the actual threshold count is implementation- and platform-dependent (and of course, also dependent on the test and hash functions (not sxhash!)). You would need to benchmark it at the start of the program.
13:37:58
pjb
heisig: I've measured as low as 5 elements with sbcl and as high as 35 elements with clisp, on the platform where I tested it.
14:05:08
Bike
for the record the hash table growth thing isn't so much a problem in clasp any more. BIR has some higher structure that makes it traversable without consing.
14:07:14
Bike
and ASTs have an actual tree structure now, so you don't need to maintain a seen table.
14:07:45
Bike
(the only cases ASTs weren't trees had to do with lexical variables and blocks - nothing too fundamental)
14:48:13
beach
The average size for an invocation is just 73 nodes. But then, I also have many small source files.
14:48:19
Bike
there's not an exactly analogous operation to m-i-a-o. but the way traversals work now is just following "next" slots, so overhead is pretty low.
14:48:42
Bike
10% of execution time even with no inlining sounds like it could end up being problematic.
14:49:54
Bike
in more detail, how it works in BIR is that reverse postorder is computed and cached in the nodes, so iteration just needs to go through it. so traversal doesn't cons, and it also allows multiple traversals to go simultaneously (unlike if there was a mark slot in the node)
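The cached reverse-postorder traversal described here can be sketched as follows (Python, illustrative names; BIR itself caches the order in the nodes rather than returning a fresh list):

```python
def reverse_postorder(entry, successors):
    """Iterative DFS returning nodes in reverse postorder from entry.
    Computed once after graph modifications; every traversal is then a
    plain walk over the cached sequence, so traversing doesn't cons and
    several traversals can run at once (no mark slot in the nodes)."""
    order, visited = [], {entry}
    stack = [(entry, iter(successors(entry)))]
    while stack:
        node, succs = stack[-1]
        for succ in succs:
            if succ not in visited:
                visited.add(succ)
                stack.append((succ, iter(successors(succ))))
                break
        else:
            order.append(node)  # all successors done: postorder position
            stack.pop()
    order.reverse()
    return order

# diamond-shaped flow graph: A -> B, A -> C, B -> D, C -> D
edges = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": []}
rpo = reverse_postorder("A", lambda n: edges[n])
```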
14:51:04
beach
Sounds good. But that order has to be computed once, right? And then maintained for every graph modification?
14:54:35
Bike
it can also be fully recomputed if there's some extensive modification. since the graph is traversed more often than modified, even doing a consing recomputation on modifications instead of traverses is still helpful for performance.
14:59:26
Bike
haven't been using meters much since external profiling has been working pretty well for us, but i could add them back in