freenode/#clasp - IRC Chatlog

15:46:27 drmeister Hmm, right now the contains the state of the search. I need to get the state of the search into a vector that stores the values that are currently on the stack.

15:46:50 drmeister Hmm, right now the STACK contains the state of the search.

15:46:59 drmeister Damn this keyboard

15:48:42 drmeister I'm going to focus on chains first - they are simpler.

15:48:55 drmeister If I had the chain NCC

15:53:10 drmeister https://usercontent.irccloud-cdn.com/file/eshrXXFz/image.png

15:54:19 drmeister I switched to S-expressions, rather than mathematical notation.

15:54:26 drmeister ACTION hates commas

15:55:29 drmeister So test #2 starting on N2 should return N2C4C5, N2C6C7 and N2C8C9 and starting on N3 return N3C2C3 and everything else should fail

15:56:21 drmeister Right now, test#2 on N3 will return N3C2C3 and on N2 will arbitrarily return one of the solutions N2C4C5, N2C6C7 or N2C8C9 and false starting on any other atom.
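
The difference between returning one arbitrary match and returning every match can be sketched as a depth-first search that keeps its current path in an explicit list and records a copy at each success instead of returning. This is a minimal illustration only, not the code in chemInfo.cc: the function name `chain_matches`, the atom names, and the toy molecule are all hypothetical, and the pattern is restricted to a simple chain of element symbols.

```python
def chain_matches(elements, bonds, chain, start):
    """Return every path beginning at `start` whose atoms match `chain` in order."""
    neighbors = {atom: [] for atom in elements}
    for a, b in bonds:
        neighbors[a].append(b)
        neighbors[b].append(a)

    results = []
    path = []  # explicit record of the atoms currently on the search stack

    def extend(atom, depth):
        # Reject atoms already used in this path or with the wrong element.
        if atom in path or elements[atom] != chain[depth]:
            return
        path.append(atom)
        if depth == len(chain) - 1:
            results.append(tuple(path))  # save a copy, then keep searching
        else:
            for nxt in neighbors[atom]:
                extend(nxt, depth + 1)
        path.pop()  # backtrack

    extend(start, 0)
    return results


# Hypothetical toy molecule: N1 carries two C-C arms, one ending in O6.
elements = {"N1": "N", "C2": "C", "C3": "C", "C4": "C", "C5": "C", "O6": "O"}
bonds = [("N1", "C2"), ("C2", "C3"), ("N1", "C4"), ("C4", "C5"), ("C3", "O6")]

all_hits = chain_matches(elements, bonds, ["N", "C", "C"], "N1")
no_hits = chain_matches(elements, bonds, ["N", "C", "C"], "C2")
```

Here `all_hits` holds both N1-C2-C3 and N1-C4-C5, while starting on any non-nitrogen atom gives an empty list. Taking only `all_hits[0]` reproduces the "arbitrarily return one of the solutions" behaviour.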

15:58:07 drmeister Not that you need to look at it but the code that does this is in chemInfo.cc https://github.com/drmeister/cando/blob/dev/src/chem/chemInfo.cc#L1558 and https://github.com/drmeister/cando/blob/dev/src/chem/chemInfo.cc#L1621

15:58:37 drmeister Then I have these papers that sound like gobbledy gook and I can't make sense of their algorithms.

16:00:40 drmeister Ha - I just found one with pictures - that looks interesting.

16:00:55 drmeister https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3586954/#B12

16:04:03 drmeister I'm pretty sure I'm doing a brute force, depth first search

16:47:02 drmeister The "VF2" algorithm seems to be the best.

16:47:04 drmeister http://depth-first.com/articles/2008/11/13/one-of-these-things-is-not-like-the-other/

16:47:16 drmeister https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3586954/#B12

16:47:33 drmeister https://stackoverflow.com/questions/6743894/any-working-example-of-vf2-algorithm/6744603#6744603

16:49:14 drmeister These descriptions are all so shitty.

16:51:11 Bike "There's just one problem: the Ullmann algorithm detects edge-induced isomorphisms. This means, for example, that if your query molecule is propane and your test molecule is cyclopropane, you won't find a match with an Ullmann-backed tool. " why is that bad

16:51:23 Bike they're not isomorphic

16:52:50 drmeister I'm wracking my brains over why it would be bad not to match cyclopropane (C1CC1) given propane (CCC)

16:53:05 drmeister I can't come up with a reason.

16:53:51 Bike i mean if you just want a carbon bonded to a carbon bonded to a carbon you should be able to specify that too but propane is more specific
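
The propane/cyclopropane question comes down to whether extra bonds in the target are allowed between matched atoms. A brute-force sketch, assuming heavy atoms only (hydrogens ignored) and using a hypothetical helper `embeddings`, shows both readings: with `induced=False` a three-carbon chain maps onto the ring, and with `induced=True` the ring-closing bond rules every mapping out. This is only an illustration of the two definitions, not how Ullmann or VF2 search.

```python
from itertools import permutations


def embeddings(pattern_n, pattern_edges, target_n, target_edges, induced):
    """List mappings of pattern nodes 0..pattern_n-1 onto distinct target nodes."""
    p = {frozenset(e) for e in pattern_edges}
    t = {frozenset(e) for e in target_edges}
    found = []
    for image in permutations(range(target_n), pattern_n):
        ok = True
        for i in range(pattern_n):
            for j in range(i + 1, pattern_n):
                in_p = frozenset((i, j)) in p
                in_t = frozenset((image[i], image[j])) in t
                if in_p and not in_t:
                    ok = False  # a pattern bond is missing in the target
                if induced and in_t and not in_p:
                    ok = False  # target has an extra bond between matched atoms
        if ok:
            found.append(image)
    return found


propane = [(0, 1), (1, 2)]               # C-C-C chain
cyclopropane = [(0, 1), (1, 2), (0, 2)]  # three-membered ring

loose = embeddings(3, propane, 3, cyclopropane, induced=False)
strict = embeddings(3, propane, 3, cyclopropane, induced=True)
```

`loose` contains all six orderings of the ring atoms, while `strict` is empty, which is the "they're not isomorphic" reading.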

16:53:57 drmeister I focused on the Ehrlich paper that says that VF2 is faster than Ullmann

16:54:40 drmeister The VF2 algorithm is described here - http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.101.5342&rep=rep1&type=pdf

16:54:47 drmeister I'm trying to make heads or tails of it.

16:55:08 drmeister To map it into what I know of the problem. I recognize a lot of it but I still don't get it (sigh).

16:55:14 Bike oh, this isn't well written.

16:56:19 drmeister https://www.youtube.com/watch?v=tO5sxLapAts

16:56:39 Bike there are typos, even

16:58:46 Bike well the high level description looks pretty much like what cando already has, except it appends solutions into a list instead of returning immediately.

16:59:12 Bike the feasibility function and P might be different, i guess

16:59:23 drmeister I'm getting a lot more out of this paper that compares Ullmann to VF2

16:59:24 drmeister https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3586954/#B12

16:59:32 drmeister I don't know if you can see the full text and pictures.

16:59:49 Bike pubmed is full text.

16:59:55 drmeister Cool

17:00:04 Bike hooray for the federal government

17:03:38 drmeister To begin with I'm still struggling with Figure 2 and Figure 3 that illustrate the Ullmann algorithm.

17:04:27 drmeister I like that this has nice pictures and a clear example - it makes me hopeful that I can figure this damn thing out and then find out how far my own implementation is from it.

17:04:44 Bike does cando's implementation involve a bit matrix?

17:04:50 drmeister No

17:04:55 Bike probably pretty far then.

17:05:10 drmeister VF2 doesn't appear to use a bit matrix either.

17:05:49 Bike the pictures are nice but this isn't a full explanation of either algorithm.

17:05:55 drmeister I have all of the code to do atom and bond matching. I generate a tree from smarts code. I have enough of the elements to find the first match.

17:06:24 drmeister Agreed - I'm crawling towards understanding.

17:06:50 Bike the ullmann paper is from 1976 and published by ACM, which makes it annoying to get

17:07:13 drmeister For instance, in the first box in Figure 2 - the '*' row are all 1 because every atom in heptanoic acid will match '*' (wildcard)

17:07:14 Bike it doesn't seem to involve anything like a first atom, though

17:08:30 drmeister Ullmann's paper is bleh (IMHO)

17:09:33 drmeister Pseudo code like this:

17:09:34 drmeister https://usercontent.irccloud-cdn.com/file/CbeJM3Pm/image.png

17:09:47 Bike 70s computer science, woo

17:09:50 drmeister And useful figures like this...

17:10:00 drmeister https://usercontent.irccloud-cdn.com/file/uNAsq6mb/image.png

17:10:09 Bike oh, at least it's actually specifically chemically oriented

17:11:03 drmeister From a pragmatic perspective - understanding Ullmann doesn't appear necessary to understand VF2.

17:11:15 Bike maybe not

17:11:55 drmeister But these highfalutin computer science professors waving their fancy graph theory around annoy me.

17:12:16 Bike well, you'll probably have to figure it out, given that it is the actual problem

17:12:24 Bike do you know stuff like what an adjacency matrix is?

17:23:19 drmeister I think so. I'm asking myself "how would I create the first panel here..."

17:23:29 drmeister https://usercontent.irccloud-cdn.com/file/n0YBh1zV/image.png

17:23:48 Bike well that's something different

17:23:50 Bike that's the M' matrix in the paper

17:24:19 Bike or well, M0 i think.

17:25:04 drmeister Well, this is Ullmann we are talking about - does it have M0?

17:25:38 Bike ...yeah? that's what i mean. It's the matrix called M0 in section 2 of his paper, i believe

17:27:12 drmeister F*ck that paper is hard to read.

17:32:24 Bike the M' matrices tell you which nodes could possibly match just by degree, and then you refine with the actual structure

17:32:27 Bike i think
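
A rough sketch of that starting matrix, under stated assumptions: rows are pattern atoms, columns are target atoms, and an entry is 1 when the target atom has at least as many neighbours as the pattern atom and its label is compatible. The label test and the '*' wildcard follow the figure discussed earlier rather than the 1976 paper, and `candidate_matrix` is a hypothetical name. Refinement against the actual structure is not shown.

```python
def candidate_matrix(pattern_labels, pattern_adj, target_labels, target_adj):
    """Build a 0/1 matrix of pattern-atom (row) vs target-atom (column) candidates."""
    rows = []
    for p_label, p_row in zip(pattern_labels, pattern_adj):
        p_degree = sum(p_row)  # number of neighbours of the pattern atom
        row = []
        for t_label, t_row in zip(target_labels, target_adj):
            label_ok = p_label == "*" or p_label == t_label
            degree_ok = sum(t_row) >= p_degree
            row.append(1 if label_ok and degree_ok else 0)
        rows.append(row)
    return rows


# Both graphs are three-atom chains, given as adjacency matrices.
chain = [[0, 1, 0],
         [1, 0, 1],
         [0, 1, 0]]

m0 = candidate_matrix(["*", "C", "O"], chain, ["C", "C", "O"], chain)
```

For this toy input the '*' row is all ones, the middle carbon can only land on the target's middle carbon, and the oxygen row has a single 1.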

17:33:09 drmeister What I have are bonds on atoms - I know from that what is adjacent to what.

17:33:54 Bike right, an adjacency matrix encodes that information into a matrix instead of a graph data structure

17:33:58 Bike that's different from the M' matrix though

17:34:55 drmeister It's a matrix where every row and column represent an atom and a 1 at (i,j) represents there is a bond between atom i and atom j - right?

17:35:01 Bike yeah.
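
Building that matrix from a bond list takes only a few lines. A minimal sketch, with hypothetical atom names and a hypothetical `adjacency_matrix` helper:

```python
def adjacency_matrix(atoms, bonds):
    """Return a symmetric 0/1 matrix with a 1 at (i, j) when atoms i and j are bonded."""
    index = {atom: i for i, atom in enumerate(atoms)}
    n = len(atoms)
    matrix = [[0] * n for _ in range(n)]
    for a, b in bonds:
        i, j = index[a], index[b]
        matrix[i][j] = 1
        matrix[j][i] = 1  # bonds are undirected
    return matrix


adj = adjacency_matrix(["N1", "C2", "C3"], [("N1", "C2"), ("C2", "C3")])
```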

17:35:15 drmeister I can build one - no problem.

17:35:41 Bike sure. i was just wondering if you were familiar with fundamental concepts like that.

17:36:10 drmeister Oh - yeah - I just don't see where it fits in here - other than - yes I need to know what is adjacent to what.

17:36:34 Bike just cos it's one of the things ullmann talks about without explaining.

17:39:21 drmeister Ah - I see where you are going with that. I'm jumping around the papers trying to find some insight while avoiding tackling Ullmann's paper head on - it's nasty.

17:44:33 drmeister Ullmann has a much more recent paper on the algorithm

18:06:57 drmeister Hmm, there is a vf2 implementation in boost::graph

19:24:26 selwyn hi all

19:44:46 selwyn drmeister: did you see this paper? linked to from the wiki page https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3633016/ i found it well-written and it has a summary of the state of the art. the authors propose an algorithm 'RI' which always outperforms vf2

19:45:53 selwyn they maintain an implementation here https://github.com/InfOmics/RI-DS

21:57:05 selwyn i still get 'out of memory errors' when building cclasp in parallel.. it looks like serial is working. does anyone else still have problems?

21:57:26 drmeister How much memory do you have?

21:57:32 drmeister 16GB?

21:58:09 selwyn yes 16gb

22:00:46 drmeister When you are building in parallel you can control the number of parallel processes that are run at the same time.

22:00:58 drmeister Try ./waf build_cboehm -j4

22:01:07 drmeister It may need a space between j and 4

22:01:17 selwyn ah had not thought of building with fewer *sigh*

23:25:47 selwyn it worked thanks very much. first time i started it up it complained Compile-error The variable LITERAL::*CONSTANT-DATUM-TO-LITERAL-NODE-CREATOR* is unbound. now it's fine..

23:26:40 drmeister I've been working with Martin all day on the distributor and fixing problems with the TI calculations - I think I got it working now.

23:27:09 drmeister We are running calculations on a combination of AWS spot instances and my desktop GPU card. It's working nicely now.

23:27:29 drmeister It involves 132 GPU accelerated jobs to run one calculation.

23:27:52 drmeister https://usercontent.irccloud-cdn.com/file/CvMA0ZvB/graph.dot.pdf

23:28:19 drmeister It's running all of the yellow ellipses - most of them are GPU accelerated Amber.

23:28:39 drmeister I also got boost::graph hooked into Cando.

23:28:57 drmeister So I can punt understanding the VF2 algorithm and just use the boost::graph implementation.

23:29:12 drmeister There is a bunch of other useful stuff in boost::graph that I have wanted to use for a while.

23:32:58 selwyn the cl-vulkan example works!

23:33:28 drmeister Cool!

23:34:05 selwyn is there an obvious reason why building should require more memory after these recent changes?

23:34:19 drmeister No.

23:34:29 drmeister How recent?

23:35:16 selwyn um one week or so, since the introduction of ast-interpreter

23:35:32 drmeister It's hard to say.

23:35:44 drmeister The boehm GC doesn't give back memory to the OS.

23:36:51 drmeister Bah - I don't want to speculate. Watch with top and see what happens.

23:37:28 drmeister The parallelism uses 'fork' - so the main process is forking off children - they compile one source file and then die, taking their memory with them.
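
The fork-per-file scheme, combined with a cap on simultaneous children like the one -j sets, might look roughly like this. It is a sketch only: `run_forked` and its arguments are hypothetical names, not part of the Clasp build, and it assumes a POSIX system where `os.fork` is available.

```python
import os


def run_forked(tasks, max_parallel):
    """Run each task in a forked child, with at most `max_parallel` alive at once."""
    running = 0
    finished = 0
    for task in tasks:
        if running >= max_parallel:
            os.wait()  # block until some child exits and frees a slot
            running -= 1
            finished += 1
        pid = os.fork()
        if pid == 0:
            # Child: do one unit of work, then exit; its memory goes with it.
            try:
                task()
            finally:
                os._exit(0)
        running += 1
    while running > 0:  # reap the remaining children
        os.wait()
        running -= 1
        finished += 1
    return finished


completed = run_forked([lambda: sum(range(1000))] * 5, max_parallel=2)
```

Peak memory is roughly the parent plus `max_parallel` children, which is why lowering the job count can avoid running out of memory.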

23:41:19 selwyn hm. i am going to bed now, will have to wait another day to try it out

23:43:16 drmeister Ok, good night.

1:14:29 drmeister We got the "right" answer for the free energy perturbation calculation.

1:14:53 drmeister I had some small errors in how some of the calculations were being done - they essentially were running backwards.

1:15:13 drmeister usha: Are you online?

3:42:18 beach Good morning everyone!