freenode/#sicl - IRC Chatlog
4:17:44
no-defun-allowed
Is there much of a convention for example code in SICL? I ported the metadata probing code to use a made-up SIMD package, so that if one appears, implementing faster probing is just a matter of substituting function names; but it certainly won't work now, as that package doesn't exist.
4:19:46
no-defun-allowed
It is code that could be used, but won't work without some platform-specific modification. Maybe "example code" is the wrong term.
4:22:20
no-defun-allowed
I currently have a CL:ERROR form with a brief description at the start of the file, so that it won't load, and have named it "simd-metadata-table.example.lisp".
4:24:32
no-defun-allowed
I recall reading other codebases which renamed a file from .lisp to .example.lisp to denote that it was not meant to be used as is.
4:45:53
jeosol
no-defun-allowed: just saw your pull request to implement a linear-probing hash table in SICL. Good work
8:38:35
beach
I am watching the presentation by Matt Kulukundis. I think a lot of the complications he talks about, like large keys and large objects, are problems that don't exist in a language with uniform reference semantics. Also, I think the tricks they invented are quite clever. Some of them may require some assembly code to be taken advantage of in a Common Lisp system.
8:48:18
no-defun-allowed
Yes, you can cheat the large object problem with references, when you have objects (like symbols and fixnums) which can be compared with EQ. But I kept fairly faithful to the design of the table (except that I keep hashes in the table, so that resizing doesn't have to recompute hashes), and the only unportable part I recall is probing using SIMD.
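A minimal sketch (in Python, for illustration; none of this is SICL's or no-defun-allowed's actual code) of the probing design under discussion: each slot gets a metadata byte holding the low bits of the key's hash, so a probe can reject most slots without a full key comparison, and the full hash stored per entry lets resizing skip rehashing. A SIMD implementation would compare a whole group of metadata bytes at once; the scalar loop below stands in for that.

```python
EMPTY = 0x80  # high bit set marks an empty slot; real metadata
              # bytes are 0..127, so the two can never collide

class ProbingTable:
    """Linear-probing table with per-slot metadata bytes.
    Resizing is omitted to keep the sketch short."""

    def __init__(self, capacity=16):
        self.capacity = capacity
        self.metadata = bytearray([EMPTY] * capacity)
        self.hashes = [0] * capacity     # kept so resizing need not rehash
        self.keys = [None] * capacity
        self.values = [None] * capacity

    def _probe(self, key):
        h = hash(key)
        meta = h & 0x7F                  # low 7 bits of the hash
        i = h % self.capacity
        while True:
            if self.metadata[i] == EMPTY:
                return i, h, False       # definite miss: empty slot found
            # cheap metadata comparison first, full key comparison only
            # when the metadata byte matches
            if self.metadata[i] == meta and self.keys[i] == key:
                return i, h, True
            i = (i + 1) % self.capacity  # linear probing

    def put(self, key, value):
        i, h, _ = self._probe(key)
        self.metadata[i] = h & 0x7F
        self.hashes[i] = h
        self.keys[i] = key
        self.values[i] = value

    def get(self, key, default=None):
        i, _, found = self._probe(key)
        return self.values[i] if found else default
```

The point of the metadata byte is that a miss usually terminates on a byte comparison alone, which is exactly the part that vectorizes.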
8:56:02
no-defun-allowed
But I would imagine a Common Lisp implementation could include a safe SIMD library (and I'd like to say "portable" too, I think LLVM gets away with portable vector code somehow). It wouldn't be hard, as the only operations that touch memory are loads and stores, which would just require bounds checking.
8:57:57
no-defun-allowed
...when the elements are integers or floats. I don't think anyone has seriously considered putting references to objects in vector registers.
10:21:47
jackdaniel
fun (subjective) observation: among all cl-related irc channels I'm in this is the most active
10:23:58
beach
no-defun-allowed: I didn't follow all the details in the talk. I'll probably watch it again. But, here is a question: would your implementation be particularly fast for the use case that is so common in Cleavir, namely 1. test (not in the table) 2. insert it?
10:24:32
beach
If so, we should suggest it to the Clasp people who seem to care about compilation time.
10:25:04
no-defun-allowed
I have not tested the performance of inserting yet, but GETHASH for keys which aren't in the table appears to be about as fast as for keys in the table.
10:27:56
no-defun-allowed
On average, a miss is about 3% faster than a hit, but that doesn't seem significant.
10:28:41
beach
I think this might be a common use case. For Cleavir, we traverse an instruction graph with relatively few back arcs. So most of the time, it is not going to be in the table.
10:28:48
no-defun-allowed
The list and bucket hash tables have slower misses than hits, but those are also insignificant results.
10:30:07
beach
Do you do some caching so that a gethash followed by a (setf gethash) with the same object does not do the search all over again?
10:32:20
beach
So here is an idea. Keep a bitmap indexed by hash values modulo its size, but say 10 times as many entries as in the table. It would act as a Bloom filter, so that if there is a 0, then the object is definitely not in the table.
10:33:06
beach
Then, instead of inserting immediately, accumulate objects sequentially until there is a hit in the table.
10:33:54
beach
Then, at the very least, it is possible to make a good guess of the size the table should be to accommodate the new objects.
10:37:42
no-defun-allowed
I have found the list-hash-table to be faster than both the bucket table and linear probing table, with fewer than 7 mappings. A miss would provoke worst-case behaviour, but it's still somehow faster.
10:43:26
no-defun-allowed
Now I'm thinking about how similar a Bloom filter and the metadata table are in purpose. (They are evidently different in construction, but they both are used to handle misses quickly.)
10:44:55
no-defun-allowed
Though, isn't there an often repeated observation that a list is a faster set than a hash table for, say, fewer than 20 elements? There is a constant overhead due to using generic function on SBCL, so the crossover point is lower.
10:46:28
beach
When the hash table is created, allocate a bitmap of some reasonable size that is significantly smaller than the overhead of the hash table itself, say 8 full words, so 512 bits.
10:47:23
beach
For gethash, first take the hash value modulo the size of the bitmap. If the bit is cleared, return NIL.
10:48:58
beach
For (setf gethash), either remember the latest hash value or do the test again. If it is a new object, accumulate it in a vector, I guess an adjustable one.
10:49:51
beach
When the bit test returns 1, check how many objects have accumulated, and resize the table accordingly. Then insert all the existing elements plus the accumulated ones.
10:50:32
beach
Also, adjust the size of the bitmap to reflect the current number of elements, and set the bits for the existing elements.
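A hypothetical sketch of the bitmap scheme beach describes (in Python, for illustration; all names are invented): a clear bit means the key is definitely absent, so misses cost a single bit test; genuinely new keys are accumulated until the first bitmap hit, at which point the table can be resized once for the whole batch instead of growing incrementally.

```python
class FilteredTable:
    """One-hash-function Bloom filter in front of a hash table,
    with deferred bulk insertion of new keys."""

    def __init__(self, bitmap_bits=512):      # 8 full 64-bit words
        self.bitmap_bits = bitmap_bits
        self.bitmap = 0                       # integer used as a bit vector
        self.table = {}                       # stand-in for the real table
        self.pending = []                     # accumulated (key, value) pairs

    def _bit(self, key):
        return 1 << (hash(key) % self.bitmap_bits)

    def get(self, key, default=None):
        if not (self.bitmap & self._bit(key)):
            return default                    # definite miss: no probing
        self._flush()                         # key may still be pending
        return self.table.get(key, default)

    def put(self, key, value):
        if self.bitmap & self._bit(key):
            # Possible duplicate (or a bitmap collision): flush the
            # accumulated keys, then update the table directly.
            self._flush()
            self.table[key] = value
        else:
            self.bitmap |= self._bit(key)
            self.pending.append((key, value))

    def _flush(self):
        # In the real scheme, this is the moment to resize the table
        # once, sized from len(self.table) + len(self.pending), and to
        # grow the bitmap to match the new element count.
        for k, v in self.pending:
            self.table[k] = v
        self.pending.clear()
```

A collision in the bitmap only costs a wasted flush and probe; it never produces a wrong answer, which is the usual Bloom-filter property of false positives but no false negatives.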
10:52:37
beach
It's the Boehm collector that doesn't handle the frequent reallocation of a bigger table.
10:56:25
no-defun-allowed
It sounds like the metadata table could be used for this, but I can't work out how yet.
10:57:53
beach
Yes, that's what inspired me. So let's put this idea on the back burner but keep it in mind.
11:00:53
no-defun-allowed
But I'm also not sure how the bitmap allows for estimating hash table sizes. Do we count the number of keys used between some given key being used and then reused (ignoring collisions in the bitmap)?
11:20:10
beach
No, the total number of existing elements, plus the new accumulated ones will give the estimate.
11:21:03
no-defun-allowed
Ah right. Don't we resize the adjustable vector with accumulated objects using a similar scheme to the hash table?
11:21:12
beach
I should put a meter in the Cleavir code so that we can see how many nodes are typically processed.
11:21:48
beach
Yes, that's what I meant by the adjustable vector being a problem. But at least no probing is required.
11:35:18
no-defun-allowed
Well, when you insert the elements at the end of measurement, you still probe as well.
12:36:41
heisig
Speaking of hash tables - I always wondered why hash tables with less than about 15 elements aren't implemented as an alist.
12:37:24
heisig
I occasionally write my own hash table wrapper that does exactly that, and it is measurably faster in many cases.
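A rough sketch of the kind of hybrid wrapper heisig describes (in Python, for illustration; this is not his code): keep a small association list and promote it to a real hash table once it grows past a threshold. The threshold of 15 here is arbitrary; the right value is implementation- and platform-dependent.

```python
THRESHOLD = 15  # crossover point; would need benchmarking per platform

class HybridMap:
    """Association list for small maps, hash table after promotion."""

    def __init__(self):
        self.alist = []    # list of (key, value) pairs
        self.table = None  # created lazily on promotion

    def get(self, key, default=None):
        if self.table is not None:
            return self.table.get(key, default)
        for k, v in self.alist:        # linear scan, fast while short
            if k == key:
                return v
        return default

    def put(self, key, value):
        if self.table is not None:
            self.table[key] = value
            return
        for i, (k, _) in enumerate(self.alist):
            if k == key:
                self.alist[i] = (key, value)
                return
        self.alist.append((key, value))
        if len(self.alist) > THRESHOLD:
            self.table = dict(self.alist)  # promote to a hash table
            self.alist = []
```

The win for small maps is that a linear scan over a few pairs has no hashing overhead at all, which matters even more when the hash and test go through generic function dispatch.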
13:37:12
pjb
heisig: note the actual threshold count is implementation and platform dependent (and of course, also dependent on the test and hash functions (not sxhash!)). You would need to benchmark it at the start of the program.
13:37:58
pjb
heisig: I've measured as low as 5 elements with sbcl and as high as 35 elements with clisp, on the platform where I tested it.