libera/#sicl - IRC Chatlog
1:19:02
Bike
other (spun-off) parts are used in other projects, e.g. staple uses eclector-cst for marking up code.
1:19:48
Bike
and yes, cleavir has a million asds because the systems are independently loadable/not loadable.
1:29:23
Bike
for example someone might want to parse code into ASTs to do syntactic analysis, but isn't interested in the later IR stage.
7:12:51
beach
Here is an idea for finding a symbol with a particular name in the CL package: We compute the length of the name as pkhuong suggests. Then we do a "binary search" similar to what we do with fast generic dispatch. We branch so that around half of the different lengths of the names in the CL package are in one branch, and around half in the other.
7:12:57
beach
This is not the same as branching on half the max length, because there are very likely lengths that don't exist at all. This step will take around lb N steps where N is the number of different lengths.
7:13:07
beach
Then we can go from left to right in the candidate names, because we need to check every character in the input anyway. If there is a single character in the current position in the candidate names, we just check that it's the same as in the input.
7:13:08
beach
If there is a small number of possible letters, just do one or more IFs. If there is a large number of possibilities, then do a binary search on the character code of the input.
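The scheme sketched over the last few messages can be illustrated in Python (a sketch only; names are illustrative, and a real implementation would compile the decision tree into straight-line branches rather than interpret it at lookup time):

```python
from bisect import bisect_left

def make_matcher(names):
    """Build a lookup closure: binary search over the distinct name lengths,
    then left-to-right character dispatch within each length group."""
    by_len = {}
    for name in names:
        by_len.setdefault(len(name), []).append(name)
    lengths = sorted(by_len)  # only lengths that actually occur

    def lookup(s):
        i = bisect_left(lengths, len(s))          # ~lb N steps, N = number of distinct lengths
        if i == len(lengths) or lengths[i] != len(s):
            return None
        candidates = by_len[lengths[i]]
        for pos in range(len(s)):                 # every input character is checked anyway
            chars = sorted({c[pos] for c in candidates})
            if len(chars) == 1:                   # single possibility: one equality test
                if s[pos] != chars[0]:
                    return None
            else:                                 # several: binary search on the char code
                j = bisect_left(chars, s[pos])
                if j == len(chars) or chars[j] != s[pos]:
                    return None
            candidates = [c for c in candidates if c[pos] == s[pos]]
        return candidates[0]
    return lookup
```

For example, `make_matcher(["car", "cdr", "cons", "list"])` first splits on length 3 vs 4, then distinguishes `car` from `cdr` at position 1 and `cons` from `list` at position 0.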
7:14:05
beach
I suspect that the number of cases where there are a lot of possible characters is small.
7:15:53
beach
With binary search, you always get lb M tests, so for binary search to pay off, lb M must be smaller than M/2.
7:17:20
beach
The number of such cases can be determined by examining all CL symbols in each length category.
7:18:09
beach
Oh, and the sequential test can be made faster by making the tests for more likely symbols appear earlier.
7:23:30
beach
This smells like an intractable problem. But the domain is small, and the computation is done once and for all.
7:25:16
Mondenkind
the CL package has O(1024) symbols, so you need at least 10 tests for uniformly distributed symbols. As you say, you may do somewhat better on average by taking into account frequency, but it's still a branchfest that walks all over memory. And the branches are probably less predictable than for generic functions
7:26:12
Mondenkind
not to mention the second-order cost of branches (alias/evict good branches, mispredicts inhibit parallelisation with surrounding code) and memory accesses (alias/evict good memory)
7:26:24
beach
Sure, it is not cheap. The question is whether it is better than some other strategy like hashing.
7:29:11
Mondenkind
one thing which may be interesting is to store the character data directly inside of the hash table, up to some limit, to improve spatial locality
7:29:35
Mondenkind
so you get both the data and the symbol on the same cache line. Do that up to, say, 15 ascii characters
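Mondenkind's inline-key layout might be sketched like this (Python standing in for what would be a C struct; the class and field names are illustrative): each slot packs a length byte plus up to 15 ASCII characters of the key directly next to the value, so probing compares the name without chasing a pointer to separate string storage.

```python
import struct

SLOT_KEY = struct.Struct("B15s")  # 1 length byte + up to 15 ASCII chars = 16 bytes

def pack_key(name):
    """Pack a short name into a fixed 16-byte field stored inline in the slot."""
    data = name.encode("ascii")
    assert len(data) <= 15
    return SLOT_KEY.pack(len(data), data)

class InlineKeyTable:
    """Open-addressing table whose slots hold (packed_key, value) side by side,
    so the key comparison touches the same cache line as the value would."""
    def __init__(self, size=256):
        self.slots = [None] * size

    def _probe(self, name):
        key = pack_key(name)
        i = hash(name) % len(self.slots)
        while True:
            slot = self.slots[i]
            if slot is None or slot[0] == key:  # one memcmp-like compare, no pointer chase
                return i, key
            i = (i + 1) % len(self.slots)

    def insert(self, name, value):
        i, key = self._probe(name)
        self.slots[i] = (key, value)

    def lookup(self, name):
        i, key = self._probe(name)
        slot = self.slots[i]
        return slot[1] if slot is not None else None
```

Names longer than the 15-character limit would fall back to an out-of-line string, which this sketch omits.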
7:54:30
beach
Actually, a binary search must end with an equality test, so 1 + lb M must be smaller than M/2. So if my computation is right, that makes M > 8
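The crossover point beach computes can be checked numerically (a quick sketch; "lb" is log base 2):

```python
import math

# Binary search costs 1 + lb M tests (the +1 is the final equality test);
# a sequential scan costs about M/2 on average. Find the smallest M
# for which binary search wins.
smallest = next(m for m in range(2, 100) if 1 + math.log2(m) < m / 2)
print(smallest)
```

This prints 9, agreeing with M > 8.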
7:58:38
beach
So if we take symbols whose name has 6 characters (that's the category with the largest number of symbols, namely 91), there are 18 different first letters, which suggests a branch on the char-code. But then, it might be less advantageous to do a binary search for each branch.
8:00:01
Mondenkind
you could test multiple characters at once--both when binary and linearly searching--with SWAR tricks
8:01:27
Mondenkind
probably most advantageous with the latter, since it means you have to do fewer double-checks, but for the former, it allows for more precise cuts
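One illustration of the SWAR idea (a Python sketch; real code would do this on machine words in C or in generated assembly): load 8 characters of each string as a single 64-bit integer, so one XOR compares all 8 positions at once.

```python
def chunk8(s, offset=0):
    """Pack 8 ASCII characters into one little-endian 64-bit word."""
    return int.from_bytes(s[offset:offset + 8].encode("ascii"), "little")

def match8(candidate, inp, offset=0):
    """Compare 8 characters of both strings with a single XOR:
    the two words are equal iff the XOR is zero."""
    return (chunk8(candidate, offset) ^ chunk8(inp, offset)) == 0
```

For example, `MULTIPLE-VALUE-BIND` and `MULTIPLE-VALUE-SETQ` agree on their first 8 characters (one word test) and are told apart by the second word, rather than by up to 16 single-character tests.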