freenode/#clasp - IRC Chatlog
Search
15:04:30
Bike
https://github.com/clasp-developers/clasp/blob/master/src/core/hashTable.cc#L736-L739 am i reading this right that if it needs to rehash it actually prints something??
15:05:40
drmeister
Bike: That only happens if the hash-table runs out of slots - that should never happen.
15:07:42
drmeister
I put that in there when I was fixing the rehash-thresholds throughout clasp so that they were all less than 1.0 - then I just left it in.
15:09:51
drmeister
There is this... https://github.com/clasp-developers/clasp/blob/dev/src/core/hashTable.cc#L338
15:09:53
Bike
i think we're sort of ignoring some of the finer aspects of rehashing control, but we're allowed to so whatever
15:10:23
Bike
strictly speaking it could be a type error but we don't have to and it's kind of unlikely
15:10:34
beach
Other than the two elements represented in the vector itself rather than as a CONS, there is no difference.
15:12:37
Bike
not hasing twice would be good too, but hashing is just arithmetic so it probably doesn't matter relatively speaking
15:15:26
drmeister
https://github.com/clasp-developers/clasp/blob/dev/include/clasp/core/hashTable.h#L87
15:16:35
drmeister
The actual table is a Vec0<Cons_O> - it's not a simple-vector of Cons_sp cells like I said earlier. I forgot I made it a vector of Cons_O cells - so there is no allocation unless there is a rehash.
15:20:57
Bike
we still have a few issues like "[x] conses" and having an easy way to check that would be good
15:22:32
drmeister
(time (loop for x below 1000 do (cons 'a 'b))) -> Time real(0.000 secs) run(0.000 secs) consed(16000 bytes)
15:22:48
drmeister
(time (loop for x below 10000 do (cons 'a 'b))) . -> Time real(0.000 secs) run(0.000 secs) consed(160000 bytes)
15:47:37
Bike
make sure it can build slime and some quicklisp libraries. that's where i started hitting problems before.
15:48:41
beach
OK, so no consing. That's good. Do you have statistics about how many iterations are required on the average when adding an element and when searching for one?
16:10:25
Bike
https://github.com/clasp-developers/clasp/blob/dev/src/core/hashTable.cc#L630-L640 here, for the read
16:12:12
beach
OK, so if you look for something and you find such an empty marker, you have to iterate, right?
16:12:54
beach
So if the hash table is nearly empty, and you are looking for something that isn't there, you must iterate over half the elements on the average?
16:14:27
beach
Because an element can be anywhere, due to there being collisions when it was inserted.
16:14:30
Bike
well, there's also an unbound marker, and if the iteration hits that it returns not found.
16:14:56
drmeister
I haven't written a function to tell me that about a hash table - but I have visually inspected them - it's not bad. Here's an example:
16:16:04
Bike
which makes sense because you could have two values hash to the same number, and then delete the earlier one in the table
16:16:28
beach
Ah, so it is in fact when the table is close to full that searching for a non-existing item is expensive.
16:18:31
beach
It would be interesting to compare the number of iterations on the average for open hashing compared to hasing per bucket.
16:18:33
drmeister
What I'm not doing is caching the previous searched for entry as sbcl does - stassats says that is a good thing to do.
16:19:10
kpoeck
Perhaps you could have a lot at the benchmark and verify that nothing important is missing
16:19:10
beach
drmeister: Do you have reasons to believe that this would apply to the use case in Cleavir?
16:20:55
beach
kpoeck: Well, execution time depends on lots of things, like how good the compiler is.
16:20:56
Bike
https://github.com/robert-strandh/SICL/blob/master/Code/Cleavir/Intermediate-representation/map-instructions-arbitrary-order.lisp#L17-L18 bam.
16:21:03
drmeister
stassats indicated the use case where it was useful. Traversing graphs wasn't it.
16:22:39
Bike
i guess you could do both? if you find the item, cache its location. if you don't find the item, cache the first place it could go
16:22:52
drmeister
The (sys:next-number) returns a fixnum that is incremented atomically and guaranteed never to be zero.
16:25:19
beach
I am trying to figure out whether there is something that could be done to improve the performance of hash tables in Clasp, so that no special case is needed.
16:26:24
beach
It appears to me that, if no consing is done, the main performance issue is the number of iterations to find (or not find) an item being looked for.
16:26:55
beach
Such numbers could then be compared to what is expected for a hash table with buckets.
16:30:16
beach
Or, let me ask this: what prompted the decision to use open hashing rather than bucket hashing?
16:30:44
beach
Presumably, you either did some comparison, or you had a hunch that open hashing would be a better choice.
16:37:13
drmeister
Ideas: If we knew roughly how many instructions we are dealing with we could pass :size to make-hash-table in the map-instructions-xxx functions.
16:38:22
drmeister
I think the problem is that inlining is mapping over the instructions again and again and again. So hash-tables are ballooning over and over again.
16:39:17
Bike
we could have the enter-instruction or something keep a list of all the instructions in that function, and have all the instructions refer to the enter or the list, and then mapping with arbitrary order would just be a linear loop
16:40:06
drmeister
Or decide that iterating over graphs is something that the graphs need to be good at and add a 'touched' slot to each instruction - but deal with the extra complexity.
16:41:44
drmeister
Or at the very least - add :rehash-size 4.0 to map-instructions-xxx make-hash-table to rehash less often.
16:42:43
drmeister
Or keep improving the inlining algorithm to iterate less over the instruction graph - we have been doing very well with this - how much improvement is there to be had?
16:53:06
drmeister
Sure - and there is some overhead to remove in the single-dispatch-generic-function call - but where else?
16:53:36
Bike
we can see whether we're going through more iterations than we need to like beach was saying
16:55:35
drmeister
Oh - but visual inspection of the example I posted above shows that it's usually at the index and less often one away.
16:56:49
drmeister
In other news - I cannot get compile-file-parallel to fail on ASDF when I do it by hand. Grrrr.
17:32:38
kpoeck
Defining the hashtables in (map-instructions-* ) like (make-hash-table :test #'eq :size 64 :rehash-size 4.0) seems to bring a 10% improvement in compilation time
17:47:34
karlosz
im hoping that these changes will make the map-instruction changes unnecessary, since those big chunks of map instructions are mostly coming from those
17:52:07
karlosz
kpoeck: i have a patch which entirely removes the map-instructions-owner block you are looking at, so it might not be necessary to try to hard to speed it up
17:52:24
drmeister
Here is some more data on clasp's hash-tables - the average search length isn't too bad.
17:52:55
drmeister
I create a 10000 element hash-table and add 1000 elements keyed on fresh cons cells
17:55:38
drmeister
I don't think there are a lot of improvements to be had in improving the hash table in terms of search length.
17:56:32
karlosz
i just updated my pull request here: https://github.com/robert-strandh/SICL/pull/131. do you think you could get a flamegraph for the second set of patches as well? kpoeck
17:57:24
drmeister
Bike: Can I push my sicl changes to a sicl branch? I'd like to move them to my linux machine.
18:09:01
drmeister
I added a 'touched' slot to instructions and I changed the map-instructions-xxx functions.
18:12:37
karlosz
Bike: by the way, i did the function-dag incremental thing as an update to the pull request. i noticed you had done something similar before and reverted. just wanted to make sure i didn't run into the same pitfall. do you remember what the issue was before?
18:13:07
Bike
karlosz: i don't think i ever worked out what the exact problem was, but i hit issues compiling slime.
18:13:47
Bike
clasp dev uses the clasp-changes branch of my repo. the master branch of my repo is unused
18:14:27
Bike
i mean, clasp actually ignores the branches, the commit just needs to be available somewhere
18:17:29
Bike
but if you make a forced change to clasp-changes, or something like that, the commit won't be up any more and clasp configure won't be able to download sicl
18:37:21
karlosz
Bike: did you ever try incrementally doing compute-destinies? i think ive got something worked out for it but its a little messy. after that goes away then the main bars for map-instructions in the flamegraphs will go away
18:40:59
karlosz
but now that my previously 12 hour build takes less than 6 hours, i can spare a few more self builds now. less inclined to believe its timing unreliability :)
19:13:43
karlosz
drmeister: you mean the linux perf stuff? i have no idea. ive never used any of those types of tools that the os provides before
20:51:52
drmeister
As a baseline - I built clasp 'dev' using the default everything on my linux workstation in 124min
22:11:46
drmeister
Bike: what was that SSA-like form that we ran into recently, where instead of PHI-nodes you use jumps with arguments.
22:12:10
Bike
i don't think it had a name. it was part of the extended blocks used by that jit thing you were looking at.
22:49:42
drmeister
I just finished building with karlosz' latest PR on my linux workstation. The time dropped from 124m10s to 83m
22:52:54
kpoeck
yes, the CL_DEFMETHOD T_sp HashTable_O::hash_table_setf_gethash(T_sp key, T_sp value) seems to be relevant
23:03:54
selwyn
The symbol HASH-TABLE-SETF-GETHASH already has a function bound to it and it is not a SingleDispatchGenericFunction - it cannot become a SingleDispatchGenericFunction
23:06:10
kpoeck
CL_DEFUN T_mv cl__gethash(T_sp key, HashTableBase_sp hashTable, T_sp default_value) { return hashTable->gethash(key, default_value); };
23:19:17
karlosz
are you all looking at the gethash call in compute-destinies? i plan to get rid of the bar entirely so it may be worthwhile to see if it is still expensive after that chunk is gone
23:27:58
drmeister
karlosz: We are looking at the (setf gethash) call in compute-destinies - if you get rid of that bar that would be great. Then it won't depend on hash-table's like it was. But it's still a good idea to get rid of overhead for (setf gethash).
23:30:14
drmeister
I am running into a problem though. When I try to load quicklisp/setup.lisp with your latest PR I get an error.
23:36:48
karlosz
but it certainly *looks* like an inlining problem. maybe you have some merge damage?
23:42:25
drmeister
git remote add karlosz http://github.com/karlosz/SICL.git ; git pull karlosz rui-v2
23:46:34
drmeister
The only other change is I (setf *compile-file-parallel* nil) in compile-file-parallel.lsp
23:55:05
karlosz
but without any of my patches, quicklisp wouldn't load in a reasonable amount of time
1:11:19
drmeister
The change to (setf gethash) is not perceptible. It built in 83min previously and now 82m10s
1:14:39
drmeister
Hmm, I'm not sure if that is in both runs or just this one. What is definitely just in the latest is converting hash-table-setf-gethash from a CL_DEFMETHOD to a CL_DEFUN
1:26:21
karlosz
drmeister: what does the backtrace look like? i have never seen an issue with asdf, but i also have a 3 day old clasp
2:07:46
karlosz
drmeister: did the build start failing after you pulled clasp? Bike made some inline.lisp changes yesterday that look relevant
2:10:46
karlosz
might want to try building with commit 727f6da7779a1ec4a4b330410e64ee04f4cfdae7 and see if that fixes things, since i don't have and i suspect kpoeck doesnt have the absolute latest clasp