freenode/#clasp - IRC Chatlog
Search
17:03:38
drmeister
To begin with I'm still struggling with Figure 2 and Figure 3 that illustrate the Ullman algorithm.
17:04:27
drmeister
I like that this has nice pictures and a clear example - it makes me hopeful that I can figure this damn thing out and then find out how far my own implementation is from it.
17:05:55
drmeister
I have all of the code to do atom and bond matching. I generate a tree from smarts code. I have enough of the elements to find the first match.
17:07:13
drmeister
For instance, in the first box in Figure 2 - the '*' row are all 1 because every atom in heptanoic acid will match '*' (wildcard)
17:11:03
drmeister
From a pragmatic perspective - understanding Ullman doesn't appear necessary to understand VF2.
17:11:55
drmeister
But these highfalutin computer science professors waving their fancy graph theory around annoys me.
17:25:38
Bike
...yeah? that's what i mean. It's the matrix called M0 in section 2 of his paper, i believe
17:32:24
Bike
the M' matricies tell you which nodes could possibly match just by degree, and then you refine with the actual structure
17:33:54
Bike
right, an adjacency matrix encodes that information into a matrix instead of a graph data structure
17:34:55
drmeister
It's a matrix where every row and column represent an atom and a 1 at (i,j) represents there is a bond between atom i and atom j - right?
17:36:10
drmeister
Oh - yeah - I just don't see where it fits in here - other than - yes I need to know what is adjacent to what.
17:39:21
drmeister
Ah - I see where you are going with that. I'm jumping around the papers trying to find some insight while avoiding tackling Ullman's paper head on - it's nasty.
19:44:46
selwyn
drmeister: did you see this paper? linked to from the wiki page https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3633016/ i found it well-written and it has a summary of the state of the art. the authors propose an algorithm 'RI' which always outperforms vf2
21:57:05
selwyn
i still get 'out of memory errors' when building cclasp in parallel.. it looks like serial is working. does anyone else still have problems?
22:00:46
drmeister
When you are building in parallel you can control the number of parallel processes that are run at the same time.
23:25:47
selwyn
it worked thanks very much. first time i started it up it complained Compile-error The variable LITERAL::*CONSTANT-DATUM-TO-LITERAL-NODE-CREATOR* is unbound. now it's fine..
23:26:40
drmeister
I've been working with Martin all day on the distributor and fixing problems with the TI calculations - I think I got it working now.
23:27:09
drmeister
We are running calculations on a combination of AWS spot instances and my desktop GPU card. It's working nicely now.
23:28:19
drmeister
It's running all of the yellow ellipses - most of them are GPU accelerated Amber.
23:28:57
drmeister
So I can punt understanding the VF2 algorithm and just use the boost::graph implementation.
23:29:12
drmeister
There is a bunch of other useful stuff in boost::graph that I have wanted to use for a while.
23:34:05
selwyn
is there an obvious reason why building should require more memory after these recent changes?
23:37:28
drmeister
The parallelism uses 'fork' - so the main process is forking off children - they compile one source file and then die, taking their memory with them.
1:14:53
drmeister
I had some small errors in how some of the calculations were being done - they essentially were running backwards.