freenode/#clasp - IRC Chatlog
Search
17:18:36
drmeister
Bike: I've been thinking about the fep-benchmarks - we have a collection of molecules like those in that picture that I posted where every pair of molecules shares a substructure with a few differences.
17:19:32
drmeister
I exposed this boost::graph function: https://www.boost.org/doc/libs/1_54_0/libs/graph/doc/mcgregor_common_subgraphs.html
17:20:13
drmeister
But it's useless - the time the algorithm takes goes up really fast with the number of atoms. It can't handle more than 14 despite my attempts to improve the situation.
17:21:30
drmeister
Then I added a criterion where the vertices being compared need to be closer to each other in 3D space (atoms have positions and common atoms are very close to each other in space) - but that didn't help.
17:32:23
karlosz
drmeister: is it possible to reduce the size of the input graphs by collapsing vertices which are already known to be equivalent enough (ie by euclidean metric) and run the algorithm on that?
17:34:30
drmeister
karlosz: That's an interesting idea - how would that work? The algorithm seems to blow up at 14 vertices. The example that I'm testing it with has 22 vertices.
17:36:31
karlosz
you mentioned a distance criterion - if you can partition the vertices into equivalence classes based on distance, then you can reduce the number of vertices you run the algoirthm on
17:37:16
karlosz
if you know for sure that things that are too far away can't influence the common subgraph algorithm
18:12:14
drmeister
This is two of the molecules superimposed. It's a bit difficult to see but the central three rings are identical and the peripheral ring on the left and the t-butyl/methyl on the right are differences.
18:12:57
drmeister
The Mcgregor algorithm calls a callback with a list of corresponding pairs of vertices.
18:13:38
drmeister
Currently, I'm using the element for each atom, the bond order of each bond and the distance between each pair of atoms being compared as a criterion for equivalence.
18:14:17
drmeister
The Mcgregor algorithm first calls the callback with 3 vertex subgraphs, then 4, then 5, then 6 and so on.
18:15:30
drmeister
I could stop it at say 10 and then what? Collapse the equivalent subgraphs into single vertices and rerun the algorithm?
18:18:28
drmeister
The other thing I could do is just ditch the Mcgregor algorithm and use the element and position to identify common substructure between two molecules.
18:20:29
drmeister
I could define a function that takes two molecules and looks through the atoms of one of them for a similar atom in the other and if it finds it put it in a list of similars and if not a list of differences. Then return the two lists of similars and the two lists of differences.
19:14:33
drmeister
No - that won't work, there may be coincidental overlaps of atoms that aren't equivalent.