freenode/#clasp - IRC Chatlog
Search
15:46:27
drmeister
Hmm, right now the contains the state of the search. I need to get the state of the search into a vector that stores the values that are currently on the stack.
15:55:29
drmeister
So test #2 starting on N2 should return N2C4C5, N2C6C7 and N2C8C9 and starting on N3 return N3C2C3 and everything else should fail
15:56:21
drmeister
Right now, test#2 on N3 will return N3C2C3 and on N2 will arbitrarily return one of the solutions N2C4C5, N2C6C7 or N2C8C9 and false starting on any other atom.
15:58:07
drmeister
Not that you need to look at it but the code that does this is in chemInfo.cc https://github.com/drmeister/cando/blob/dev/src/chem/chemInfo.cc#L1558 and https://github.com/drmeister/cando/blob/dev/src/chem/chemInfo.cc#L1621
15:58:37
drmeister
Then I have these papers that sound like gobbledy gook and I can't make sense of their algorithms.
16:47:04
drmeister
http://depth-first.com/articles/2008/11/13/one-of-these-things-is-not-like-the-other/
16:47:33
drmeister
https://stackoverflow.com/questions/6743894/any-working-example-of-vf2-algorithm/6744603#6744603
16:51:11
Bike
"There's just one problem: the Ullmann algorithm detects edge-induced isomporphisms. This means, for example, that if your query molecule is propane and your test molecule is cyclopropane, you won't find a match with an Ullmann-backed tool. " why is that bad
16:52:50
drmeister
I'm wracking my brains why it would be bad not to match cyclopropane (C1CCC1) given propane (CCC)
16:53:51
Bike
i mean if you just want a carbon bonded to a carbon bonded to a carbon you should be able to specify that too but propane is more specific
16:54:40
drmeister
The VF2 algorithm is described here - http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.101.5342&rep=rep1&type=pdf
16:55:08
drmeister
To map it into what I know of the problem. I recognize a lot of it but I still don't get it (sigh).
16:58:46
Bike
well the high level description looks pretty much like what cando already has, except it appends solutions into a list instead of returning immediately.
17:03:38
drmeister
To begin with I'm still struggling with Figure 2 and Figure 3 that illustrate the Ullman algorithm.
17:04:27
drmeister
I like that this has nice pictures and a clear example - it makes me hopeful that I can figure this damn thing out and then find out how far my own implementation is from it.
17:05:55
drmeister
I have all of the code to do atom and bond matching. I generate a tree from smarts code. I have enough of the elements to find the first match.
17:07:13
drmeister
For instance, in the first box in Figure 2 - the '*' row are all 1 because every atom in heptanoic acid will match '*' (wildcard)
17:11:03
drmeister
From a pragmatic perspective - understanding Ullman doesn't appear necessary to understand VF2.
17:11:55
drmeister
But these highfalutin computer science professors waving their fancy graph theory around annoys me.
17:25:38
Bike
...yeah? that's what i mean. It's the matrix called M0 in section 2 of his paper, i believe
17:32:24
Bike
the M' matricies tell you which nodes could possibly match just by degree, and then you refine with the actual structure
17:33:54
Bike
right, an adjacency matrix encodes that information into a matrix instead of a graph data structure
17:34:55
drmeister
It's a matrix where every row and column represent an atom and a 1 at (i,j) represents there is a bond between atom i and atom j - right?
17:36:10
drmeister
Oh - yeah - I just don't see where it fits in here - other than - yes I need to know what is adjacent to what.
17:39:21
drmeister
Ah - I see where you are going with that. I'm jumping around the papers trying to find some insight while avoiding tackling Ullman's paper head on - it's nasty.
19:44:46
selwyn
drmeister: did you see this paper? linked to from the wiki page https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3633016/ i found it well-written and it has a summary of the state of the art. the authors propose an algorithm 'RI' which always outperforms vf2
21:57:05
selwyn
i still get 'out of memory errors' when building cclasp in parallel.. it looks like serial is working. does anyone else still have problems?
22:00:46
drmeister
When you are building in parallel you can control the number of parallel processes that are run at the same time.
23:25:47
selwyn
it worked thanks very much. first time i started it up it complained Compile-error The variable LITERAL::*CONSTANT-DATUM-TO-LITERAL-NODE-CREATOR* is unbound. now it's fine..
23:26:40
drmeister
I've been working with Martin all day on the distributor and fixing problems with the TI calculations - I think I got it working now.
23:27:09
drmeister
We are running calculations on a combination of AWS spot instances and my desktop GPU card. It's working nicely now.
23:28:19
drmeister
It's running all of the yellow ellipses - most of them are GPU accelerated Amber.
23:28:57
drmeister
So I can punt understanding the VF2 algorithm and just use the boost::graph implementation.
23:29:12
drmeister
There is a bunch of other useful stuff in boost::graph that I have wanted to use for a while.
23:34:05
selwyn
is there an obvious reason why building should require more memory after these recent changes?
23:37:28
drmeister
The parallelism uses 'fork' - so the main process is forking off children - they compile one source file and then die, taking their memory with them.
1:14:53
drmeister
I had some small errors in how some of the calculations were being done - they essentially were running backwards.