libera/#sicl - IRC Chatlog
Search
7:12:51
beach
Here is an idea for finding a symbol with a particular name in the CL package: We compute the length of the name as pkhuong suggests. Then we do a "binary search" similar to what we do with fast generic dispatch. We branch so that around half of the different length of the names of the CL package are in one branch, and around half in the other.
7:12:57
beach
This is not the same as branching on half the max length, because there are very likely lengths that don't exist at all. This step will take around lb N steps where N is the number of different lengths.
7:13:07
beach
Then we can go from left to right in the candidate names, because we need to check every character in the input anyway. If there is a single character in the current position in the candidate names, we just check that it's the same as in the input.
7:13:08
beach
If there is a small number of possible letters, just do one or more IFs. If there is a large number of possibilities, then do a binary search on the character code of the input.
7:14:05
beach
I suspect that the number of cases where there are a lot of possible characters is small.
7:15:53
beach
With binary search, you always get lb M tests, so for binary search to pay off lb M must be smaller than M/2.
7:17:20
beach
The number of such cases can be determined by examining all CL symbols in each length category.
7:18:09
beach
Oh, and sequential test can be made faster by making the test for more likely symbols appear earlier.
7:23:30
beach
This smells like an intractable problem. But the domain is small, and the computation is done once and for all.
7:25:16
Mondenkind
the CL package has O(1024) symbols, so you need at least 10 tests for uniformly distributed symbols. As you say, you may do somewhat better on average by taking into account frequency, but it's still a branchfest that walks all over memory. And the branches are probably less predictable than for generic functions
7:26:12
Mondenkind
not to mention the second-order cost of branches (alias/evict good branches, mispredicts inhibit parallelisation with surrounding code) and memory accesses (alias/evict good memory)
7:26:24
beach
Sure, it is not cheap. The question is whether it is better than some other strategy like hashing.
7:29:11
Mondenkind
one thing which may be interesting is to store the character data directly inside of the hash table, up to some limit, to improve spatial locality
7:29:35
Mondenkind
so you get both the data and the symbol on the same cache line. Do that up to, say, 15 ascii characters
7:54:30
beach
Actually, a binary search must end with an equality test, so 1 + lb M must be smaller than M/2. So if my computation is right, that makes M > 8
7:58:38
beach
So if we take symbols with a name that has 6 characters in it (that's the category with the most number of symbols in it, namely 91), there are 18 different first letters which suggests a branch on the char-code. But then, it might be less advantageous to do a binary search for each branch.
8:00:01
Mondenkind
you could test multiple characters at once--both when binary and linearly searching--with swar tricks
8:01:27
Mondenkind
probably most advantageous with the latter, since it means you have to do fewer double-checks, but for the former, it allows for more precise cuts
12:40:32
mfiano
Hello beach and friends. How's SICL going? I asked about a week ago but I didn't notice any replies. Also I have a question relating to the wording the standard uses, I could use an experienced decipherer to see if I'm just not reading it correctly, but it seems to be inverting the semantics that it intends.
12:45:26
mfiano
I was away from following the usual chatter for the last 5 months or so, and that's too much too read in the backlog for a quick summary of what happened, what's happening now, and what plans to happen next.
13:28:38
mfiano
Any progress is progress I say, even if it's spending a few months in the design phase with little or no coding. That happens with large projects of mine anyway.
13:29:31
beach
I must have missed your previous question. I try to read all utterances and answer the ones that nobody else answered.
13:30:25
mfiano
fill pointer n. (of a vector) an integer associated with a vector that represents the index above which no elements are active.
13:31:50
mfiano
I am having a hard time wrapping my head around that definition. (make-array 10 :element-type 'fixnum :initial-contents '(0 1 2 3 4 5 6 7 8 9) :fill-pointer 2) will print as the active elements #(0 1). So that means the fill-pointer, 2, is above any _active_ elements, not above any non-active elements.