freenode/#sicl - IRC Chatlog
1:31:13
no-defun-allowed
Another "quirk" of the linear probing hash table is that you can see when it has to probe multiple groups. With groups of 8 and a 0.8 load, there's a 16.7% probability of a group being full (so that GETHASH has to scan a second group), but with smaller groups, the probability decreases.
1:32:08
no-defun-allowed
Oh, smaller load, not groups. I was thinking about my next sentence, which is: This probability also decreases with larger groups, e.g. the 16-element groups which can be searched with 128-bit SIMD instructions.
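The probability quoted above follows from a simple model: if each slot is occupied independently with probability equal to the load factor, a group is full with probability load^group_size. A small Python sketch (an illustration of the arithmetic, not SICL code):

```python
# Simplifying assumption: each slot is occupied independently with
# probability equal to the load factor, so the chance that a whole
# group is full is load ** group_size.

def full_group_probability(load: float, group_size: int) -> float:
    """Probability that every slot in a group is occupied."""
    return load ** group_size

# At 0.8 load with groups of 8, about 16.7% of lookups scan a second group.
print(round(full_group_probability(0.8, 8), 3))   # 0.168
# With 16-element groups (searchable by 128-bit SIMD), far fewer do.
print(round(full_group_probability(0.8, 16), 3))  # 0.028
```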
4:17:44
no-defun-allowed
Is there much of a convention for example code in SICL? I ported the metadata probing code to use a made-up SIMD package, so that if one appears, implementing faster probing is just a matter of substituting function names; but it certainly won't work now, as that package doesn't exist.
4:19:46
no-defun-allowed
It is code that could be used, but won't work without some platform-specific modification. Maybe "example code" is the wrong term.
4:22:20
no-defun-allowed
I currently have a CL:ERROR form with a brief description at the start of the file, so that it won't load, and have named it "simd-metadata-table.example.lisp".
4:24:32
no-defun-allowed
I recall reading other codebases which turned a type .lisp into .example.lisp to denote that it was not meant to be used as is.
4:45:53
jeosol
no-defun-allowed: just saw your pull request implementing a linear-probing hash table in SICL. Good work
8:38:35
beach
I am watching the presentation by Matt Kulukundis. I think a lot of the complications he talks about, like large keys and large objects, are problems that don't exist in a language with uniform reference semantics. Also, I think the tricks they invented are quite clever. Some of them may require some assembly code to take advantage of in a Common Lisp system.
8:48:18
no-defun-allowed
Yes, you can cheat the large object problem with references, when you have objects (like symbols and fixnums) which can be compared with EQ. But I stayed fairly faithful to the design of the table (except that I keep hashes in the table, so that resizing doesn't have to recompute hashes), and the only unportable part I recall is probing using SIMD.
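The point about keeping hashes in the table can be sketched as follows: if each entry carries its hash, a resize only redistributes existing entries and never calls the hash function again. This is a hedged Python illustration of the idea, not the SICL implementation; all names are made up.

```python
# Sketch: store each key's hash alongside the entry, so resizing a
# linear-probing table reuses stored hashes instead of rehashing keys.

class Entry:
    __slots__ = ("h", "key", "value")
    def __init__(self, h, key, value):
        self.h, self.key, self.value = h, key, value

def resize(old_slots, new_capacity):
    """Rebuild the slot vector using only the stored hashes."""
    new_slots = [None] * new_capacity
    for entry in old_slots:
        if entry is None:
            continue
        i = entry.h % new_capacity        # reuse the stored hash
        while new_slots[i] is not None:   # linear probing
            i = (i + 1) % new_capacity
        new_slots[i] = entry
    return new_slots
```

This matters most when keys have expensive user-defined hash functions; with cheap hashes the saving is smaller.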
8:56:02
no-defun-allowed
But I would imagine a Common Lisp implementation could include a safe SIMD library (and I'd like to say "portable" too; I think LLVM gets away with portable vector code somehow). It wouldn't be hard, as the only operations that touch memory are loads and stores, which would just require bounds checking.
8:57:57
no-defun-allowed
...when the elements are integers or floats. I don't think anyone has seriously considered putting references to objects in vector registers.
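The probing the SIMD discussion refers to can be illustrated without vector registers: pack one metadata byte per slot and compare every byte of a group against a tag, producing a bitmask of candidate slots, the way a 128-bit compare handles 16 bytes in one instruction. A hedged Python emulation (the loop stands in for the single SIMD compare; constants and names are made up):

```python
# Emulation of SIMD group probing: compare all metadata bytes of a
# group against a 7-bit tag and return a bitmask of matching slots.

GROUP = 16    # 16 one-byte slots, as a 128-bit register would hold
EMPTY = 0x80  # high bit set marks an empty slot

def match_mask(metadata: bytes, tag: int) -> int:
    """Bitmask with bit i set when metadata[i] == tag."""
    mask = 0
    for i, b in enumerate(metadata):  # one SIMD compare does all of this
        if b == tag:
            mask |= 1 << i
    return mask

meta = bytearray([EMPTY] * GROUP)
meta[3] = meta[9] = 0x23              # two slots whose tag matches
print(bin(match_mask(bytes(meta), 0x23)))  # bits 3 and 9 set
```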
10:21:47
jackdaniel
fun (subjective) observation: among all cl-related irc channels I'm in this is the most active
10:23:58
beach
no-defun-allowed: I didn't follow all the details in the talk. I'll probably watch it again. But, here is a question: would your implementation be particularly fast for the use case that is so common in Cleavir, namely 1. test (not in the table) 2. insert it?
10:24:32
beach
If so, we should suggest it to the Clasp people who seem to care about compilation time.
10:25:04
no-defun-allowed
I have not tested the performance of inserting yet, but GETHASH for keys which aren't in the table appears to be about as fast as for keys in the table.
10:27:56
no-defun-allowed
On average, a miss is about 3% faster than a hit, but that doesn't seem significant.
10:28:41
beach
I think this might be a common use case. For Cleavir, we traverse an instruction graph with relatively few back arcs. So most of the time, it is not going to be in the table.
10:28:48
no-defun-allowed
The list and bucket hash tables have slower misses than hits, but those are also insignificant results.
10:30:07
beach
Do you do some caching so that a gethash followed by a (setf gethash) with the same object does not do the search all over again?
10:32:20
beach
So here is an idea. Keep a bitmap indexed by hash values modulo its size, but say 10 times as many entries as in the table. It would act as a Bloom filter, so that if there is a 0, then the object is definitely not in the table.
10:33:06
beach
Then, instead of inserting immediately, accumulate objects sequentially until there is a hit in the table.
10:33:54
beach
Then, at the very least, it is possible to make a good guess of the size the table should be to accommodate the new objects.
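The bitmap beach describes acts as a one-hash Bloom filter: a clear bit proves the key is absent, so inserts can be batched until the filter first reports a possible hit. A hedged Python sketch of that idea (sizes and names are illustrative only):

```python
# Sketch of a presence bitmap roughly 10x the table's entry count,
# indexed by hash modulo its size. A 0 bit means "definitely absent";
# a 1 bit means "possibly present" (false positives are allowed).

class PresenceFilter:
    def __init__(self, table_size, factor=10):
        self.size = table_size * factor
        self.bits = bytearray((self.size + 7) // 8)

    def add(self, h):
        i = h % self.size
        self.bits[i // 8] |= 1 << (i % 8)

    def maybe_present(self, h):
        i = h % self.size
        return bool(self.bits[i // 8] & (1 << (i % 8)))

f = PresenceFilter(100)        # bitmap of 1000 bits
f.add(12345)
print(f.maybe_present(12345))  # True
print(f.maybe_present(67890))  # False: that residue's bit is still clear
```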
10:37:42
no-defun-allowed
I have found the list-hash-table to be faster than both the bucket table and the linear probing table when there are fewer than 7 mappings. A miss would provoke worst-case behaviour, but it's still somehow faster.