freenode/#sicl - IRC Chatlog
8:38:35
beach
I am watching the presentation by Matt Kulukundis. I think a lot of the complications he talks about, like large keys and large objects, are problems that don't exist in a language with uniform reference semantics. Also, I think the tricks they invented are quite clever. Some of them may require some assembly code to be taken advantage of in a Common Lisp system.
8:48:18
no-defun-allowed
Yes, you can cheat the large object problem with references, when you have objects (like symbols and fixnums) which can be compared with EQ. But I kept fairly faithful to the design of the table (except that I keep hashes in the table, so that resizing doesn't have to recompute hashes), and the only unportable part I recall is probing using SIMD.
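The hash-caching trick mentioned here (keeping each key's hash in the table so resizing never recomputes it) can be sketched as follows. This is a Python illustration with invented names, not the actual implementation:

```python
class HashedTable:
    """Minimal open-addressing table that stores (hash, key, value) per
    slot, so a resize reuses the stored hashes instead of rehashing."""

    def __init__(self, capacity=8):
        self.slots = [None] * capacity  # each slot: (hash, key, value) or None
        self.count = 0

    def _probe(self, h, key):
        i = h % len(self.slots)
        while True:
            slot = self.slots[i]
            if slot is None or (slot[0] == h and slot[1] == key):
                return i
            i = (i + 1) % len(self.slots)  # linear probing

    def put(self, key, value):
        if (self.count + 1) * 2 > len(self.slots):  # keep load factor <= 1/2
            self._resize(len(self.slots) * 2)
        h = hash(key)
        i = self._probe(h, key)
        if self.slots[i] is None:
            self.count += 1
        self.slots[i] = (h, key, value)

    def get(self, key, default=None):
        slot = self.slots[self._probe(hash(key), key)]
        return default if slot is None else slot[2]

    def _resize(self, new_capacity):
        old = self.slots
        self.slots = [None] * new_capacity
        for slot in old:
            if slot is not None:
                h, key, value = slot
                # reuse the stored hash: no call to hash() during resize
                self.slots[self._probe(h, key)] = (h, key, value)
```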
8:56:02
no-defun-allowed
But I would imagine a Common Lisp implementation could include a safe SIMD library (and I'd like to say "portable" too, I think LLVM gets away with portable vector code somehow). It wouldn't be hard, as the only operations that touch memory are loads and stores, which would just require bounds checking.
8:57:57
no-defun-allowed
...when the elements are integers or floats. I don't think anyone has seriously considered putting references to objects in vector registers.
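For illustration, here is a scalar stand-in for the SIMD probe step (Swiss-table style): compare one group of control bytes against the key's short hash and get back a bitmask of candidate slots. A real implementation does all lanes at once with something like `_mm_cmpeq_epi8` plus a movemask; the names and constants below are assumptions for the sketch:

```python
GROUP = 16
EMPTY = 0x80  # high bit set marks an empty slot, as in the Swiss table design

def match_group(metadata, start, h2):
    """Scalar simulation of the SIMD probe: compare one group of 16
    control bytes against h2 (the short hash) and return a bitmask of
    candidate slots to check in full."""
    mask = 0
    for lane in range(GROUP):
        if metadata[start + lane] == h2:
            mask |= 1 << lane
    return mask

meta = bytearray([EMPTY] * GROUP)
meta[3] = meta[9] = 0x42   # two slots whose stored short hash is 0x42
hits = match_group(meta, 0, 0x42)
```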
10:21:47
jackdaniel
fun (subjective) observation: among all cl-related irc channels I'm in this is the most active
10:23:58
beach
no-defun-allowed: I didn't follow all the details in the talk. I'll probably watch it again. But, here is a question: would your implementation be particularly fast for the use case that is so common in Cleavir, namely 1. test (not in the table) 2. insert it?
10:24:32
beach
If so, we should suggest it to the Clasp people who seem to care about compilation time.
10:25:04
no-defun-allowed
I have not tested the performance of inserting yet, but GETHASH for keys which aren't in the table appears to be about as fast as for keys in the table.
10:27:56
no-defun-allowed
On average, a miss is about 3% faster than a hit, but that doesn't seem significant.
10:28:41
beach
I think this might be a common use case. For Cleavir, we traverse an instruction graph with relatively few back arcs. So most of the time, it is not going to be in the table.
10:28:48
no-defun-allowed
The list and bucket hash tables have slower misses than hits, but those are also insignificant results.
10:30:07
beach
Do you do some caching so that a gethash followed by a (setf gethash) with the same object does not do the search all over again?
10:32:20
beach
So here is an idea. Keep a bitmap indexed by hash values modulo its size, but say 10 times as many entries as in the table. It would act as a Bloom filter, so that if there is a 0, then the object is definitely not in the table.
10:33:06
beach
Then, instead of inserting immediately, accumulate objects sequentially until there is a hit in the table.
10:33:54
beach
Then, at the very least, it is possible to make a good guess of the size the table should be to accommodate the new objects.
10:37:42
no-defun-allowed
I have found the list-hash-table to be faster than both the bucket table and linear probing table, with fewer than 7 mappings. A miss would provoke worst-case behaviour, but it's still somehow faster.
10:43:26
no-defun-allowed
Now I'm thinking about how similar a Bloom filter and the metadata table are in purpose. (They are evidently different in construction, but they both are used to handle misses quickly.)
10:44:55
no-defun-allowed
Though, isn't there an often-repeated observation that a list is a faster set than a hash table for, say, fewer than 20 elements? There is a constant overhead due to using generic functions on SBCL, so the crossover point is lower.
10:46:28
beach
When the hash table is created, allocate a bitmap of some reasonable size that is significantly smaller than the overhead of the hash table itself, say 8 full words, so 512 bits.
10:47:23
beach
For gethash, first take the hash value modulo the size of the bitmap. If the bit is cleared, return NIL.
10:48:58
beach
For (setf gethash), either remember the latest hash value or do the test again. If it is a new object, accumulate it in a vector, I guess adjustable.
10:49:51
beach
When the bit test returns 1, check how many objects have accumulated, and resize the table accordingly. Then insert all the existing elements plus the accumulated ones.
10:50:32
beach
Also, adjust the size of the bitmap to reflect the current number of elements, and set the bits for the existing elements.
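The scheme outlined above might look roughly like this. A hedged Python sketch: the class and slot names are invented, a plain dict stands in for the real hash table, and the bitmap-resizing step is elided:

```python
class DeferredHashTable:
    """Sketch of the bitmap-guarded table described above: a small bitmap
    acts as a one-function Bloom filter, and new keys accumulate in a
    vector until the first bitmap hit, at which point the table can be
    resized once to fit everything before the pending keys go in."""

    BITMAP_BITS = 512  # 8 full words, as suggested above

    def __init__(self):
        self.bitmap = 0
        self.table = {}       # stands in for the real hash table
        self.pending = []     # accumulated (key, value) pairs

    def _bit(self, key):
        return hash(key) % self.BITMAP_BITS

    def get(self, key, default=None):
        if not (self.bitmap >> self._bit(key)) & 1:
            return default    # bit clear: definitely not in the table
        self._flush()         # bit set: resolve any pending inserts first
        return self.table.get(key, default)

    def put(self, key, value):
        bit = self._bit(key)
        if (self.bitmap >> bit) & 1:
            # possible duplicate (or bitmap collision): let the real
            # table resolve it, after flushing the accumulated keys
            self._flush()
            self.table[key] = value
        else:
            self.bitmap |= 1 << bit
            self.pending.append((key, value))

    def _flush(self):
        if self.pending:
            # a real table would resize once to len(table) + len(pending)
            self.table.update(self.pending)
            self.pending.clear()
```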
10:52:37
beach
It's the Boehm collector that doesn't handle the frequent reallocation of a bigger table.
10:56:25
no-defun-allowed
It sounds like the metadata table could be used for this, but I can't work out how yet.
10:57:53
beach
Yes, that's what inspired me. So let's put this idea on the back burner but keep it in mind.
11:00:53
no-defun-allowed
But I'm also not sure how the bitmap allows for estimating hash table sizes. Do we count the number of keys inserted between some given key being used and then reused (ignoring collisions in the bitmap)?
11:20:10
beach
No, the total number of existing elements plus the newly accumulated ones will give the estimate.
11:21:03
no-defun-allowed
Ah right. Don't we resize the adjustable vector with accumulated objects using a similar scheme to the hash table?
11:21:12
beach
I should put a meter in the Cleavir code so that we can see how many nodes are typically processed.
11:21:48
beach
Yes, that's what I meant by the adjustable vector being a problem. But at least no probing is required.
11:35:18
no-defun-allowed
Well, when you insert the accumulated elements at the end, you still probe as well.
12:36:41
heisig
Speaking of hash tables - I always wondered why hash tables with fewer than about 15 elements aren't implemented as an alist.
12:37:24
heisig
I occasionally write my own hash table wrapper that does exactly that, and it is measurably faster in many cases.
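A wrapper of the kind described here could look like the following Python sketch. The names are invented, and the threshold of 15 is taken from the discussion rather than from any measurement:

```python
THRESHOLD = 15  # from the discussion; the real crossover is implementation-dependent

class SmallMap:
    """Store mappings as an association list while small; promote to a
    hash table (here a dict) once past the threshold."""

    def __init__(self):
        self.alist = []    # list of (key, value) pairs
        self.table = None  # promoted table, once past the threshold

    def get(self, key, default=None):
        if self.table is not None:
            return self.table.get(key, default)
        for k, v in self.alist:   # linear scan: fast for tiny n
            if k == key:
                return v
        return default

    def put(self, key, value):
        if self.table is not None:
            self.table[key] = value
            return
        for i, (k, _) in enumerate(self.alist):
            if k == key:
                self.alist[i] = (key, value)  # update in place
                return
        self.alist.append((key, value))
        if len(self.alist) > THRESHOLD:
            self.table = dict(self.alist)     # promote
            self.alist = []
```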
13:37:12
pjb
heisig: note the actual threshold count is implementation- and platform-dependent (and of course, also dependent on the test and hash functions (not sxhash!)). You would need to benchmark it at the start of the program.
13:37:58
pjb
heisig: I've measured as low as 5 elements with sbcl and as high as 35 elements with clisp, on the platform where I tested it.
14:05:08
Bike
for the record the hash table growth thing isn't so much a problem in clasp any more. BIR has some higher structure that makes it traversable without consing.
14:07:14
Bike
and ASTs have an actual tree structure now, so you don't need to maintain a seen table.
14:07:45
Bike
(the only cases ASTs weren't trees had to do with lexical variables and blocks - nothing too fundamental)
14:48:13
beach
The average size for an invocation is just 73 nodes. But then, I also have many small source files.
14:48:19
Bike
there's not an exactly analogous operation to m-i-a-o. but the way traversals work now is just following "next" slots, so overhead is pretty low.
14:48:42
Bike
10% of execution time even with no inlining sounds like it could end up being problematic.
14:49:54
Bike
in more detail, how it works in BIR is that reverse postorder is computed and cached in the nodes, so iteration just needs to go through it. so traversal doesn't cons, and it also allows multiple traversals to go simultaneously (unlike if there was a mark slot in the node)
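The cached reverse-postorder traversal described here can be sketched as follows (Python, illustrative names; BIR itself caches the order in the nodes rather than returning a fresh list):

```python
def reverse_postorder(entry, successors):
    """Iterative DFS returning nodes in reverse postorder from entry.
    Computed once after graph modifications; every traversal is then a
    plain walk over the cached sequence, so traversing doesn't cons and
    several traversals can run at once (no mark slot in the nodes)."""
    order, visited = [], {entry}
    stack = [(entry, iter(successors(entry)))]
    while stack:
        node, succs = stack[-1]
        for succ in succs:
            if succ not in visited:
                visited.add(succ)
                stack.append((succ, iter(successors(succ))))
                break
        else:
            order.append(node)  # all successors done: postorder position
            stack.pop()
    order.reverse()
    return order

# diamond-shaped flow graph: A -> B, A -> C, B -> D, C -> D
edges = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": []}
rpo = reverse_postorder("A", lambda n: edges[n])
```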
14:51:04
beach
Sounds good. But that order has to be computed once, right? And then maintained for every graph modification?
14:54:35
Bike
it can also be fully recomputed if there's some extensive modification. since the graph is traversed more often than modified, even doing a consing recomputation on modifications instead of traverses is still helpful for performance.
14:59:26
Bike
haven't been using meters much since external profiling has been working pretty well for us, but i could add them back in