freenode/#clasp - IRC Chatlog
6:37:04
beach
Does GSOC still require some pre-approved organization to provide the advisor for the project?
6:37:39
beach
I remember trying to make my university such an organization in the past, and I failed.
13:11:10
scymtym
beach: some time ago, i mentioned my plan to use CST:RECONSTRUCT in eclector. i have this: https://github.com/robert-strandh/Eclector/compare/wip-cst-reconstruct?expand=1 and i think it works well. as you can see, it calls CST:RECONSTRUCT with multiple CST instances, which is not currently supported in the concrete-syntax-tree library. i have a change which adds support for that, but it conflicts with changes drmeister
13:55:21
beach
I say, check with drmeister that he is OK with that (he usually is), and let him know how to modify his code to account for the change.
14:00:21
scymtym
beach: you probably already did for the "Add client parameter to generic func reconstruct" change. i will try to make further changes compatible with that
14:35:25
beach
Don't try too hard to be backward compatible. There are very few clients at the moment, so this is a good time to clean things up.
14:59:29
drmeister
I was messing around with the new jupyter lab yesterday to get some feeling for what it's about.
15:09:44
scymtym
drmeister: the commit message is "Add client parameter to generic func reconstruct". no worries - i will figure something out
15:14:49
scymtym
drmeister: unrelatedly, i think i have pretty good understanding of SMILES and SMARTS now. the most important thing i couldn't completely figure out, neither from your parser nor from the specifications, is how complex descriptions of atoms are supposed to interact with the [] construct
15:15:51
drmeister
Could you give me an example? My understanding is the [...] construct applies a bunch of tests to an atom.
15:15:53
scymtym
i.e. can i just pile up a bunch of modifiers like CH4-2@@ or is this only allowed within [] as in [CH4-2@@]?
15:16:31
drmeister
My understanding is only the simplest element tests can be expressed outside of a [...]
15:17:46
scymtym
but within [], multiple complex descriptions can be connected by logical operators without individual []s, like [C++;C@@], right?
15:18:37
scymtym
and the order of the things i called "modifiers" such as "-4" or "H3" or @ doesn't matter?
15:20:30
drmeister
No, I don't think so. The [] match test returns true or false, the order of tests within it shouldn't matter and the tests shouldn't fail.
15:22:55
drmeister
Those tests mean "the atom has a -4 charge"; H3 "The atom has 3 hydrogens attached"; @ "The atom has S stereochemistry".
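A minimal sketch of the point drmeister makes here, assuming a [...] construct is simply a conjunction of independent true/false tests on one atom. The Atom record and the helper names are hypothetical and not taken from any parser mentioned in this log.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Atom:
    # Hypothetical atom record; the field names are illustrative only.
    element: str
    charge: int
    hydrogens: int


def element_is(symbol):
    return lambda atom: atom.element == symbol


def charge_is(n):
    return lambda atom: atom.charge == n


def hydrogens_are(n):
    return lambda atom: atom.hydrogens == n


def bracket_match(tests, atom):
    # The bracket matches only when every test inside it holds.
    return all(test(atom) for test in tests)


methyl_anion = Atom(element="C", charge=-1, hydrogens=3)
tests = [element_is("C"), hydrogens_are(3), charge_is(-1)]

# Because the tests are combined with a plain "and", their order is irrelevant.
forward = bracket_match(tests, methyl_anion)
backward = bracket_match(list(reversed(tests)), methyl_anion)
```

Under that assumption both orderings give the same answer, and a test that does not hold (say, a -4 charge on this atom) makes the whole bracket fail.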
15:24:00
scymtym
yes, i understand the semantics, i was trying to figure out the exact syntax so the parser can follow the specification
15:24:31
scymtym
precision doesn't seem to be the strong suit of the specification material i have read so far, though
15:26:04
scymtym
ok, i should have said "precision in the formal languages department" since i can't really judge the chemistry part, of course
15:27:25
drmeister
Yeah - you have to read several documents on it to piece it together. There is the reference standard in the OpenEye software - but I don't want to use that.
15:27:42
scymtym
and finally, can i assume that things like APLambda that don't seem to appear in the specifications are cando extensions?
15:28:28
drmeister
Yeah - that's mine - I wanted to inject arbitrary Common Lisp code to do tests - but that is really old. You could drop that out.
15:28:49
drmeister
Back when I implemented it it was an interpreted archaic lisp that I was putting in there.
15:29:57
drmeister
There is another important difference between the OpenEye SMARTS spec and what I did - I mentioned it before - it's how we label atoms to recover them afterwards.
15:31:25
drmeister
It should be a minor modification of my parser - but I want to get rid of my parser.
15:33:35
drmeister
No, that is incorrect. The :N operator is to label the current atom with the ID N.
15:34:42
drmeister
Simply dropping the number into the square brackets indicates a test for atomic mass
15:38:45
drmeister
If you applied that test to the red "C" atom it would return true and a match object that you could use to recover atoms with ID 1, 2 and 3 (the C, the O and the N respectively)
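The distinction drmeister draws (a leading number inside the brackets is an atomic mass test, while a trailing :N labels the atom) can be sketched with a deliberately simplified pattern. This is an assumption-laden illustration: real bracket atoms also allow charge, hydrogen count, chirality and logical operators, none of which this pattern handles, and parse_bracket is a hypothetical helper.

```python
import re

# Simplified bracket atom: optional leading mass number, an element
# symbol, and an optional trailing ":N" label. Nothing else is supported.
BRACKET = re.compile(
    r"\[(?P<mass>\d+)?(?P<symbol>[A-Z][a-z]?)(?::(?P<label>\d+))?\]"
)


def parse_bracket(text):
    match = BRACKET.fullmatch(text)
    if match is None:
        raise ValueError(f"unsupported bracket atom: {text!r}")
    mass = int(match["mass"]) if match["mass"] else None
    label = int(match["label"]) if match["label"] else None
    return {"mass": mass, "symbol": match["symbol"], "label": label}


mass_only = parse_bracket("[13C]")     # number before the symbol: mass test
label_only = parse_bracket("[C:1]")    # number after the colon: atom label
both = parse_bracket("[13C:2]")
```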
15:41:31
scymtym
Daylight Theory Manual, page 17 calls the construct "class". that's probably what misled me
15:45:25
drmeister
"class" is a poor choice of a name IMHO. It may be a more appropriate term when you get into the SMIRKS language superset.
15:47:05
drmeister
In the table at the bottom of page 23 it makes a bit more sense to me that they call them "class"es. They are distinguishing atoms in terms of where they start and where they end up in a reaction.
15:47:48
drmeister
Fundamentally we have our crappy human languages and we are trying to apply them to these fundamental, universal things.
15:48:46
scymtym
i'm still confused about Daylight Theory Manual, page 17 which mentions the class construct in the context of SMILES. isn't it only applicable to SMARTS and "above"?
15:49:03
drmeister
If there were a way to capture a group of atoms you would still need to apply a test to get the individual atoms within it.
15:49:52
drmeister
This confused me as well. I didn't even know about the class construct when I wrote my parser several years ago.
15:51:10
drmeister
I think that the class construct was introduced to support SMIRKS (where it is very important) and sort of back ported into SMARTS.
15:53:07
drmeister
SMARTS lets you recognize subgraphs within molecules. It becomes much more useful when you can capture atoms that are recognized - if you have a language that can operate on those atoms.
15:54:03
drmeister
It would allow you to build a molecule with a SMILES string and then get the new atoms that are labeled with the class operator.
15:54:23
scymtym
ok, maybe my mental model of the sub/superset relations of the languages and their respective uses is still wrong
15:55:03
drmeister
Rather you could build a residue (part of a molecule) with a SMILES string and then recover the atoms that were labeled with the class operator and then connect it to another residue constructed from another SMILES string.
15:55:34
drmeister
No - I don't think your model is wrong. There is a hierarchy. You just pointed out an extension that would be very useful.
15:56:27
scymtym
i thought SMILES was only for constructing, not matching. but i get drmeister's idea of building something and being able to pull certain parts out in a single operation
15:57:27
drmeister
[C:1][C:2](c1ccccc1) could be used to build polystyrene polymers. I'm certain that no existing SMILES parser will recognize that string.
15:58:47
drmeister
Cando would build 100 of those and then connect the [C:2] atom of each to the [C:1] atom of the next.
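A rough sketch of the chaining step described here, assuming each residue exposes a table from class label to atom. The string atom ids and build_chain are hypothetical stand-ins and do not reflect Cando's actual API.

```python
def build_chain(count):
    # Each residue maps a class label (1 or 2) to a hypothetical atom id.
    residues = [{1: f"r{i}:C1", 2: f"r{i}:C2"} for i in range(count)]
    # Bond the atom labeled 2 in each residue to the atom labeled 1 in the next.
    bonds = [
        (residues[i][2], residues[i + 1][1])
        for i in range(count - 1)
    ]
    return residues, bonds


residues, bonds = build_chain(100)
```

With 100 residues this produces 99 connecting bonds, the first one joining the atom labeled 2 in residue 0 to the atom labeled 1 in residue 1.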
15:58:52
scymtym
drmeister: at least the Daylight Theory Manual explicitly specifies it for SMILES, on page 17. that's what i was trying to say the whole time
15:59:59
scymtym
hm, maybe they consider it an "extension for reactions". man, the boundaries are really blurry with these things
16:00:40
drmeister
Ok - if that was the manual 5-10 years ago when I wrote my parser then I probably ignored it. I didn't recognize it for what it was - a way to label atoms.
16:01:27
drmeister
If it was available then the examples on page 18 probably confused me - because they apply the labels to multiple atoms in a match.
16:01:36
scymtym
drmeister: in any case, thanks for feedback. i think i can get a bit further with that
16:02:38
drmeister
Yeah - I probably looked at this: [CH3:1][C:2](=[O:3])[O-:3].[Na+:4] and said WTF? That is NOT what I need.
16:03:20
drmeister
But if we constrain ourselves to use a 1:1 map we can use [CH3:1][C:2](=[O:3])[O-:4].[Na+:5]
16:05:50
drmeister
Imagine doing this in Python. The matching code has to be written in C++ to be fast. It would have to implement a map<int,list> to map integer labels to lists of atoms.
16:08:00
drmeister
Now I still have the matching code written in C++ but I can use a hash-table directly and access the hash-table directly from C++ and Common Lisp and memory management is taken care of.
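The label table discussed here (integer labels mapped to lists of atoms) can be sketched in a few lines. This only scans the text for [atom:label] groups with a regular expression; it is not a SMILES parser, and atoms_by_label and is_one_to_one are hypothetical names.

```python
import re
from collections import defaultdict

# Matches "[<anything without ] or :>:<digits>]", e.g. "[O-:3]".
LABELLED = re.compile(r"\[([^\]:]+):(\d+)\]")


def atoms_by_label(smiles):
    # Map each integer label to the list of (position, atom text) carrying it.
    table = defaultdict(list)
    for index, (atom, label) in enumerate(LABELLED.findall(smiles)):
        table[int(label)].append((index, atom))
    return dict(table)


def is_one_to_one(table):
    # True when every label refers to exactly one atom.
    return all(len(atoms) == 1 for atoms in table.values())


shared = atoms_by_label("[CH3:1][C:2](=[O:3])[O-:3].[Na+:4]")
unique = atoms_by_label("[CH3:1][C:2](=[O:3])[O-:4].[Na+:5]")
```

In the first string label 3 collects two oxygen atoms, which is why a list is needed per label; in the second every label maps to a single atom, so a plain label-to-atom table would suffice.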
16:11:25
beach
I think Eclector must have an implementation of PEEK-CHAR, because for some combination of arguments, it checks whether the character is a whitespace character.
16:12:09
beach
So the native PEEK-CHAR can not be used, because it would not consult the Eclector readtable for character syntax.
16:13:15
beach
Also, currently, Eclector doesn't use PEEK-CHAR simply because I completely forgot about its existence. But some parts of the reader could be faster when PEEK-CHAR is used.
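To illustrate why a reader with its own readtable needs its own peek operation, here is a small Python analogy (not Eclector code, and all names are hypothetical): the whitespace-skipping variant of peek has to ask the supplied readtable what counts as whitespace rather than rely on a built-in notion.

```python
import io


class Readtable:
    # Hypothetical readtable that only records which characters are whitespace.
    def __init__(self, whitespace=" \t\n"):
        self.whitespace = set(whitespace)

    def is_whitespace(self, char):
        return char in self.whitespace


def peek_char(stream, readtable, skip_whitespace=False):
    # Return the next character without consuming it, or None at end of input.
    # With skip_whitespace, leading whitespace (as defined by the given
    # readtable) is consumed first.
    while True:
        position = stream.tell()
        char = stream.read(1)
        if char == "":
            return None
        if skip_whitespace and readtable.is_whitespace(char):
            continue
        stream.seek(position)
        return char


standard = peek_char(io.StringIO("  ;x"), Readtable(), skip_whitespace=True)
custom_stream = io.StringIO("  ;x")
custom = peek_char(custom_stream, Readtable(whitespace=" ;"), skip_whitespace=True)
```

The same input yields different results depending on the readtable, which is the behavior a host implementation's peek cannot provide for a foreign readtable.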
16:14:23
scymtym
beach: sounds right. could you make an issue for this? ideally including a code snippet demonstrating the problem
16:15:09
beach
I can do that. But not today. I am working on our incremental parsing paper. That's how this came up.
16:16:38
scymtym
in the github comments section, i mean. since drmeister said they ran into this already
16:21:04
frgo
drmeister: I looked at pybind11. This is a clean thing - even I understand what they do.
16:21:58
frgo
I would love to do a "clbind2" based on pybind11. I only don't know enough about how to construct C++ objects for clasp,
16:22:39
drmeister
I started working on what is now clbind after C++11 and with an understanding of variadic templates.
16:23:53
drmeister
I didn't want to reimplement everything from scratch so when I discovered luabind I took that and hacked it with variadic templates to get it working for clbind.
16:23:59
frgo
Yeah - I know. C++11 does bring a lot of improvements. As does C++17, btw. - But that doesn't go well with LLVM 5.
16:25:08
frgo
It has the same API but has been completely written anew based on C++11 and only STL - not using any BOOST stuff.
16:26:03
frgo
Yes!! Docs! Examples! I learned a lot from the pybind11 examples. I volunteer to write the prose around the core stuff.
16:27:25
drmeister
I went as far as to get permission from the luabind authors to use their documentation as a starting point for our clbind documentation.
17:18:13
scymtym
drmeister: i'm done for today. this is what i have so far: https://github.com/scymtym/language.smarts/tree/future