
Commit 37be0ac

Merge pull request #218 from GreenGroup/new-style-adjacency-list
New style adjacency list
2 parents 4c7def1 + 7ae437c commit 37be0ac

39 files changed

Lines changed: 1710 additions & 675 deletions

.travis.yml

Lines changed: 1 addition & 0 deletions
@@ -9,6 +9,7 @@ before_install:
 # - sudo apt-get install python-rdkit librdkit-dev librdkit1 rdkit-data
 - sudo apt-get install -qq python-numpy python-scipy python-matplotlib
 - cd ..
+
 - git clone https://github.com/GreenGroup/RMG-database.git
 - git clone https://github.com/GreenGroup/PyDAS.git
 - git clone https://github.com/GreenGroup/PyDQED.git

documentation/source/reference/molecule/adjlist.rst

Lines changed: 80 additions & 32 deletions
@@ -7,6 +7,15 @@ Adjacency Lists
 
 .. module:: rmgpy.molecule.adjlist
 
+
+.. note::
+The adjacency list syntax changed in July 2014.
+The minimal requirement for most translations is to prefix the number
+of unpaired electrons with the letter `u`.
+The new syntax, however, allows much
+greater flexibility, including definition of lone pairs, partial charges,
+wildcards, and molecule multiplicities.
+
 .. note::
 To quickly visualize any adjacency list, or to generate an adjacency list from
 other types of molecular representations such as SMILES, InChI, or even common
@@ -21,49 +30,100 @@ RMG -- but extended to allow for specification of extra semantic information.
 The first line of most adjacency lists is a unique identifier for the molecule
 or pattern the adjacency list represents. This is not strictly required, but
 is recommended in most cases. Generally the identifier should only use
-alphanumeric characters and the underscore, as if an identifer in many popular
+alphanumeric characters and the underscore, as if an identifier in many popular
 programming languages. However, strictly speaking any non-space ASCII character
 is allowed.
 
-After the identifier line, each subsequent line describes a single atom and its
+The subsequent lines may contain keyword-value pairs. Currently there is only
+one keyword, ``multiplicity``.
+
+For species or molecule declarations, the value after ``multiplicity`` defines
+the spin multiplicity of the molecule. E.g. ``multiplicity 1`` for most ground state
+closed shell species, ``multiplicity 2`` for most radical species,
+and ``multiplicity 3`` for a triplet biradical.
+If the ``multiplicity`` line is not present then a value of
+(1 + number of unpaired electrons) is assumed.
+Thus, it can usually be omitted, but if present can be used to distinguish,
+for example, singlet CH2 from triplet CH2.
+
+If defining a Functional :class:`~rmgpy.molecule.Group`, then the value must be a list,
+which defines the multiplicities that will be matched by the group, eg.
+``multiplicity [1,2,3,4,5]`` or, for a single value, ``multiplicity [1]``.
+If the multiplicity line is omitted, then ``multiplicity [1,2,3,4,5]`` is assumed.
+
+After the identifier line and keyword-value lines,
+each subsequent line describes a single atom and its
 local bond structure. The format of these lines is a whitespace-delimited list
 with tokens ::
 
-<number> [<label>] <element> <radicals> <bondlist>
+<number> [<label>] <element> u<unpaired> [p<pairs>] [c<charge>] <bondlist>
 
 The first item is the number used to identify that atom. Any number may be used,
 though it is recommended to number the atoms sequentially starting from one.
 Next is an optional label used to tag that atom; this should be an
-asterisk followed by a unique number for the label, e.g. ``*1``. After that is
-the atom's element, indicated by its atomic symbol, followed by the number of
-radical electrons on the atom. The last set of tokens is the list of bonds.
+asterisk followed by a unique number for the label, e.g. ``*1``.
+In some cases (e.g. thermodynamics groups) there is only one labeled atom, and the label
+is just an asterisk with no number: ``*``.
+
+After that is
+the atom's element or atom type, indicated by its atomic symbol, followed by
+a sequence of tokens describing the electronic state of the atom:
+
+* ``u0`` number of **unpaired** electrons (eg. radicals)
+* ``p0`` number of lone **pairs** of electrons, common on oxygen and nitrogen.
+* ``c0`` formal **charge** on the atom, e.g. ``c-1`` (negatively charged),
+``c0``, ``c+1`` (positively charged)
+
+For :class:`~rmgpy.molecule.Molecule` definitions:
+The value must be a single integer (and for charge must have a + or - sign if not equal to 0)
+The number of unpaired electrons (i.e. radical electrons) is required, even if zero.
+The number of lone pairs and the formal charge are assumed to be zero if omitted.
+
+For :class:`~rmgpy.molecule.Group` definitions:
+The value can be an integer or a list of integers (with signs, for charges),
+eg. ``u[0,1,2]`` or ``c[0,+1,+2,+3,+4]``, or may be a wildcard ``x``
+which matches any valid value,
+eg. ``px`` is the same as ``p[0,1,2,3,4]`` and ``cx`` is the same as
+``c[-4,-3,-2,-1,0,+1,+2,+3,+4]``. Lists must be enclosed is square brackets,
+and separated by commas, without spaces.
+If lone pairs or formal charges are omitted from a group definition,
+the wildcard is assumed.
+
+
+The last set of tokens is the list of bonds.
 To indicate a bond, place the number of the atom at the other end of the bond
 and the bond type within curly braces and separated by a comma, e.g. ``{2,S}``.
-Multiple bonds to the same atom should be separated by whitespace.
+Multiple bonds from the same atom should be separated by whitespace.
 
 .. note::
 You must take care to make sure each bond is listed on the lines of *both*
 atoms in the bond, and that these entries have the same bond type. RMG will
 raise an exception if it encounters such an invalid adjacency list.
 
+
 When writing a molecular substructure pattern, you may specify multiple
-elements, radical counts, and bond types as a comma-separated list inside curly
-braces. For example, to specify any carbon or oxygen atom, use the syntax
-``{C,O}``. Atom types may also be used as a shorthand. (Atom types can also be
+elements, radical counts, and bond types as a comma-separated list inside square
+brackets. For example, to specify any carbon or oxygen atom, use the syntax
+``[C,O]``. For a single or double bond to atom 2, write ``{2,[S,D]}``.
+
+Atom types such as ``R!H`` or ``Cdd`` may also be used as a shorthand. (Atom types
+like ``Cdd`` can also be
 used in full molecules, but this use is discouraged, as RMG can compute them
 automatically for full molecules.)
 
 Below is an example adjacency list, for 1,3-hexadiene, with the weakest bond in
 the molecule labeled with ``*1`` and ``*2``. Note that hydrogen atoms
-can be omitted if desired, as their presence is inferred::
+can be omitted if desired, as their presence is inferred, provided that unpaired
+electrons, lone pairs, and charges are all correctly defined::
 
 HXD13
-1 C 0 {2,D}
-2 C 0 {1,D} {3,S}
-3 C 0 {2,S} {4,D}
-4 C 0 {3,D} {5,S}
-5 *1 C 0 {4,S} {6,S}
-6 *2 C 0 {5,S}
+multiplicity 1
+1 C u0 {2,D}
+2 C u0 {1,D} {3,S}
+3 C u0 {2,S} {4,D}
+4 C u0 {3,D} {5,S}
+5 *1 C u0 {4,S} {6,S}
+6 *2 C u0 {5,S}
 
 The allowed element types, radicals, and bonds are listed in the following table:
 
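The Molecule rules in the hunk above can be condensed into a small standalone parser. These rules are: a required `u` token, optional `p` and `c` tokens that default to zero, bonds that must appear on both atoms with the same type, and a default multiplicity of 1 + unpaired electrons. This is an illustrative sketch, not RMG's own reader in `rmgpy.molecule.adjlist`. The function name `parse_molecule_adjlist` and its return shape are hypothetical. It handles Molecule atom lines only, without Group lists, wildcards, or atom types such as `R!H`.

```python
import re

# One atom line of a new-style Molecule adjacency list:
#   <number> [<label>] <element> u<unpaired> [p<pairs>] [c<charge>] <bondlist>
ATOM_RE = re.compile(
    r"^(?P<number>\d+)\s+"
    r"(?:(?P<label>\*\d*)\s+)?"            # optional label: *, *1, *2, ...
    r"(?P<element>[A-Za-z]+)\s+"
    r"u(?P<unpaired>\d+)"                  # required, even if zero
    r"(?:\s+p(?P<pairs>\d+))?"             # optional lone pairs
    r"(?:\s+c(?P<charge>0|[+-]\d+))?"      # optional charge: c0, c+1, c-1
    r"(?P<bonds>(?:\s+\{\d+,[A-Za-z]+\})*)\s*$"
)
BOND_RE = re.compile(r"\{(\d+),([A-Za-z]+)\}")


def parse_molecule_adjlist(text):
    """Return (identifier, multiplicity, atoms) for a new-style Molecule adjlist."""
    lines = [line.strip() for line in text.strip().splitlines() if line.strip()]
    identifier = None
    if lines and len(lines[0].split()) == 1:
        identifier = lines.pop(0)  # a single token can only be the identifier
    multiplicity = None
    atoms = {}
    for line in lines:
        if line.split()[0] == "multiplicity":
            multiplicity = int(line.split()[1])
            continue
        match = ATOM_RE.match(line)
        if match is None:
            raise ValueError(f"cannot parse atom line: {line!r}")
        atoms[int(match["number"])] = {
            "label": match["label"],
            "element": match["element"],
            "unpaired": int(match["unpaired"]),
            "pairs": int(match["pairs"] or 0),    # omitted -> 0 for molecules
            "charge": int(match["charge"] or 0),  # omitted -> 0 for molecules
            "bonds": {int(n): order for n, order in BOND_RE.findall(match["bonds"])},
        }
    # Every bond must be listed on both atoms with the same bond type.
    for number, atom in atoms.items():
        for other, order in atom["bonds"].items():
            partner = atoms.get(other)
            if partner is None or partner["bonds"].get(number) != order:
                raise ValueError(f"bond {number}-{other} ({order}) is not listed on both atoms")
    if multiplicity is None:
        # Documented default: 1 + total number of unpaired electrons.
        multiplicity = 1 + sum(atom["unpaired"] for atom in atoms.values())
    return identifier, multiplicity, atoms


HXD13 = """
HXD13
multiplicity 1
1 C u0 {2,D}
2 C u0 {1,D} {3,S}
3 C u0 {2,S} {4,D}
4 C u0 {3,D} {5,S}
5 *1 C u0 {4,S} {6,S}
6 *2 C u0 {5,S}
"""
identifier, multiplicity, atoms = parse_molecule_adjlist(HXD13)
print(identifier, multiplicity, len(atoms), atoms[5]["label"])  # HXD13 1 6 *1
```

Without a `multiplicity` line, two `u1` oxygens (as in the O2 structure of the `ch3no2` example) give the default 1 + 2 = 3, the triplet.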
@@ -77,10 +137,10 @@ The allowed element types, radicals, and bonds are listed in the following table
 |                      | H        | Hydrogen atom       |
 |                      +----------+---------------------+
 |                      | S        | Sulfur atom         |
-+----------------------+----------+---------------------+
-| Nonreactive Elements | N        | Nitrogen atom       |
 |                      +----------+---------------------+
-|                      | Si       | Silicon atom        |
+|                      | N        | Nitrogen atom       |
++----------------------+----------+---------------------+
+| Nonreactive Elements | Si       | Silicon atom        |
 |                      +----------+---------------------+
 |                      | Cl       | Chlorine atom       |
 |                      +----------+---------------------+
@@ -89,18 +149,6 @@ The allowed element types, radicals, and bonds are listed in the following table
 |                      | Ar       | Argon atom          |
 |                      +----------+---------------------+
 +----------------------+----------+---------------------+
-| Free Electrons       | 0        | Non-radical         |
-|                      +----------+---------------------+
-|                      | 1        | Mono-radical        |
-|                      +----------+---------------------+
-|                      | 2        | Bi-radical          |
-|                      +----------+---------------------+
-|                      | 2T       | Triplet             |
-|                      +----------+---------------------+
-|                      | 2S       | Singlet             |
-|                      +----------+---------------------+
-|                      | 3        | Tri-radical         |
-+----------------------+----------+---------------------+
 | Chemical Bond        | S        | Single Bond         |
 |                      +----------+---------------------+
 |                      | D        | Double Bond         |
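The Group rules in the documentation diff above can be illustrated with a short token expander. These rules cover integer values, bracketed comma-separated lists, the `x` wildcard, and wildcard defaults for omitted `p` and `c` tokens. This is a sketch under stated assumptions, not RMG's Group parser. The helpers `expand_group_token` and `group_atom_state` are hypothetical. Only the wildcard ranges the text spells out (`px`, `cx`) are supported, and `ux` is rejected because no range is given for it. Treating `u` as required follows the token format line, where `u<unpaired>` is not bracketed as optional.

```python
# Wildcard ranges stated in the documentation: px == p[0,1,2,3,4] and
# cx == c[-4,...,+4]. No range is given for ux, so it is not supported here.
WILDCARDS = {
    "p": [0, 1, 2, 3, 4],
    "c": [-4, -3, -2, -1, 0, 1, 2, 3, 4],
}


def expand_group_token(token):
    """Expand one Group token (e.g. 'u[0,1,2]', 'c+1', 'px') to (prefix, allowed values)."""
    if len(token) < 2 or token[0] not in "upc":
        raise ValueError(f"not a u/p/c token: {token!r}")
    prefix, value = token[0], token[1:]
    if value == "x":
        if prefix not in WILDCARDS:
            raise ValueError(f"no wildcard range documented for {prefix!r}")
        return prefix, list(WILDCARDS[prefix])
    if value.startswith("[") and value.endswith("]"):
        # Lists are comma-separated with no spaces, e.g. c[0,+1,+2]; int() accepts "+1".
        return prefix, [int(item) for item in value[1:-1].split(",")]
    return prefix, [int(value)]


def group_atom_state(tokens):
    """Collect the u/p/c tokens of one Group atom line, filling documented defaults."""
    state = dict(expand_group_token(token) for token in tokens)
    if "u" not in state:
        raise ValueError("the u<unpaired> token is required")
    # An omitted lone-pair or charge token in a Group means the wildcard.
    state.setdefault("p", list(WILDCARDS["p"]))
    state.setdefault("c", list(WILDCARDS["c"]))
    return state


print(expand_group_token("c[0,+1,+2,+3,+4]"))  # ('c', [0, 1, 2, 3, 4])
print(group_atom_state(["u[0,1]"]))
```

Expanding every token to an explicit list keeps matching simple: an atom in a molecule matches the group atom when each of its u, p, and c values appears in the corresponding list.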

documentation/source/users/rmg/faq.rst

Lines changed: 34 additions & 0 deletions
@@ -3,3 +3,37 @@
 **************************
 Frequently Asked Questions
 **************************
+
+
+Why can't my adjacency lists be read any more?
+==============================================
+
+The adjacency list syntax changed in July 2014.
+The minimal requirement for most translations is to prefix the number
+of unpaired electrons with the letter `u`.
+
+Example old syntax::
+
+HXD13
+1 C 0 {2,D}
+2 C 0 {1,D} {3,S}
+3 C 0 {2,S} {4,D}
+4 C 0 {3,D} {5,S}
+5 *1 C 0 {4,S} {6,S}
+6 *2 C 0 {5,S}
+
+Example new syntax::
+
+HXD13
+1 C u0 {2,D}
+2 C u0 {1,D} {3,S}
+3 C u0 {2,S} {4,D}
+4 C u0 {3,D} {5,S}
+5 *1 C u0 {4,S} {6,S}
+6 *2 C u0 {5,S}
+
+The new syntax, however, allows much
+greater flexibility, including definition of lone pairs, partial charges,
+wildcards, and molecule multiplicities, and was necessary to allow us to
+add Nitrogen chemistry.
+See :ref:`rmgpy.molecule.adjlist` for details of the new syntax.
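The FAQ's minimal translation (prefix the radical count with `u`) is mechanical enough to script. The helper `convert_old_adjlist` below is a hypothetical sketch, not a tool shipped with RMG. It rewrites only plain integer radical counts and refuses lines it cannot translate safely, such as the old `2T`/`2S` tokens or a second numeric column. It also does not add lone pairs or charges, which default to zero for molecules in the new syntax. Atoms that carry lone pairs, such as O or N, may therefore still need `p` tokens added by hand.

```python
import re

# Old-style atom line: <number> [<label>] <element> <radicals> <bondlist>,
# where <radicals> is a plain integer. Anything else (e.g. 2T, 2S, or an extra
# numeric column) is rejected rather than guessed at.
OLD_ATOM_RE = re.compile(
    r"^(\d+\s+(?:\*\d*\s+)?[A-Za-z]+\s+)"   # number, optional label, element
    r"(\d+)"                                 # radical electron count
    r"((?:\s+\{\d+,[A-Za-z]+\})*)\s*$"       # bond list
)


def convert_old_adjlist(text):
    """Apply the minimal translation: prefix each radical count with 'u'."""
    converted = []
    for line in text.strip().splitlines():
        line = line.strip()
        if not line:
            continue
        match = OLD_ATOM_RE.match(line)
        if match:
            head, radicals, bonds = match.groups()
            converted.append(f"{head}u{radicals}{bonds}")
        elif not converted and len(line.split()) == 1:
            converted.append(line)  # identifier line, e.g. HXD13
        else:
            raise ValueError(f"needs manual conversion: {line!r}")
    return "\n".join(converted)


OLD_HXD13 = """
HXD13
1 C 0 {2,D}
2 C 0 {1,D} {3,S}
3 C 0 {2,S} {4,D}
4 C 0 {3,D} {5,S}
5 *1 C 0 {4,S} {6,S}
6 *2 C 0 {5,S}
"""
print(convert_old_adjlist(OLD_HXD13))
```

Applied to the FAQ's old example, this reproduces the FAQ's new example line for line. Raising on anything unusual is deliberate: a silent wrong guess about an old `2T` or lone-pair column would be harder to spot than an error.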

examples/rmg/1,3-hexadiene/input.py

Lines changed: 7 additions & 2 deletions
@@ -8,6 +8,11 @@
 kineticsEstimator = 'rate rules',
 )
 
+# Constraints on generated species
+generatedSpeciesConstraints(
+maximumRadicalElectrons = 2,
+)
+
 # List of species
 species(
 label='HXD13',
@@ -24,8 +29,8 @@
 reactive=True,
 structure=adjacencyList(
 """
-1 H 0 {2,S}
-2 H 0 {1,S}
+1 H u0 p0 {2,S}
+2 H u0 p0 {1,S}
 """),
 )
 species(
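This input, like several other examples in this commit, gains a `generatedSpeciesConstraints(maximumRadicalElectrons = ...)` block, with values of 2, 3, or 4. The diff does not document the option. Reading its name literally, and only as an assumption, it bounds how many radical (unpaired) electrons a generated species may carry. Under that assumption, one way to check a structure is to sum the `u` tokens of its new-style adjacency list. `within_radical_limit` is a hypothetical helper, not RMG's implementation. The two structures are the H2 adjacency list from this hunk and the CH structure from the `methylformate` example.

```python
import re

# A standalone "u<digits>" token in a new-style adjacency list.
UNPAIRED_RE = re.compile(r"(?<!\S)u(\d+)(?!\S)")


def total_unpaired_electrons(adjlist):
    """Sum the u<unpaired> tokens over all atom lines."""
    return sum(int(count) for count in UNPAIRED_RE.findall(adjlist))


def within_radical_limit(adjlist, maximum_radical_electrons):
    """Hypothetical filter in the spirit of maximumRadicalElectrons (assumed semantics)."""
    return total_unpaired_electrons(adjlist) <= maximum_radical_electrons


H2 = """
1 H u0 p0 {2,S}
2 H u0 p0 {1,S}
"""
CH = """
1 C u3 p0 {2,S}
2 H u0 p0 {1,S}
"""
print(within_radical_limit(H2, 2), within_radical_limit(CH, 2))  # True False
```

With a limit of 2, the three-radical CH structure would be excluded, which is consistent with the `c3h4` example choosing a higher limit of 4. Whether RMG applies the limit per species in exactly this way is not stated in this commit.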

examples/rmg/c3h4/input.py

Lines changed: 4 additions & 0 deletions
@@ -8,6 +8,10 @@
 kineticsEstimator = 'rate rules',
 )
 
+generatedSpeciesConstraints(
+maximumRadicalElectrons = 4,
+)
+
 # List of species
 species(
 label='CH2',

examples/rmg/ch3no2/input.py

Lines changed: 11 additions & 11 deletions
@@ -26,13 +26,13 @@
 reactive=True,
 structure=adjacencyList(
 """
-1 C 0 0 {2,S} {3,S} {4,S} {5,S}
-2 H 0 0 {1,S}
-3 H 0 0 {1,S}
-4 H 0 0 {1,S}
-5 N 0 0 {1,S} {6,D} {7,S}
-6 O 0 2 {5,D}
-7 O 0 3 {5,S}
+1 C u0 p0 {2,S} {3,S} {4,S} {5,S}
+2 H u0 p0 {1,S}
+3 H u0 p0 {1,S}
+4 H u0 p0 {1,S}
+5 N u0 p0 {1,S} {6,D} {7,S}
+6 O u0 p2 {5,D}
+7 O u0 p3 {5,S}
 """),
 )
 
@@ -41,8 +41,8 @@
 reactive=True,
 structure=adjacencyList(
 """
-1 O 1 2 {2,S}
-2 O 1 2 {1,S}
+1 O u1 p2 {2,S}
+2 O u1 p2 {1,S}
 """),
 )
 
@@ -51,8 +51,8 @@
 reactive=True,
 structure=adjacencyList(
 """
-1 N 1 1 {2,T}
-2 N 1 1 {1,T}
+1 N u0 p1 {2,T}
+2 N u0 p1 {1,T}
 """),
 )
 
examples/rmg/diesel/input.py

Lines changed: 1 addition & 1 deletion
@@ -47,7 +47,7 @@
 species(
 label='O2',
 reactive=True,
-structure=SMILES("O=O"),
+structure=SMILES("[O][O]"),
 )
 
 # Reaction systems

examples/rmg/e85/input.py

Lines changed: 6 additions & 1 deletion
@@ -8,11 +8,16 @@
 kineticsEstimator = 'rate rules',
 )
 
+# Constraints on generated species
+generatedSpeciesConstraints(
+maximumRadicalElectrons = 2,
+)
+
 # List of species
 species(
 label='O2', # oxygen
 reactive=True,
-structure=SMILES("O=O"),
+structure=SMILES("[O][O]"),
 )
 species(
 label='C8H18i', # isooctane

examples/rmg/liquid_phase/input.py

Lines changed: 5 additions & 0 deletions
@@ -8,6 +8,11 @@
 kineticsEstimator = 'rate rules',
 )
 
+# Constraints on generated species
+generatedSpeciesConstraints(
+maximumRadicalElectrons = 3,
+)
+
 # List of species
 species(
 label='octane',

examples/rmg/methylformate/input.py

Lines changed: 4 additions & 3 deletions
@@ -29,8 +29,8 @@
 reactive=True,
 structure=adjacencyList(
 """
-1 C 3 {2,S}
-2 H 0 {1,S}
+1 C u3 p0 {2,S}
+2 H u0 p0 {1,S}
 """),
 )
 species(
@@ -46,7 +46,7 @@
 species(
 label='CO',
 reactive=True,
-structure=SMILES("[C]=O"),
+structure=SMILES("[C+]#[O-]"),
 )
 species(
 label='CO2',
@@ -157,4 +157,5 @@
 drawMolecules=False,
 generatePlots=False,
 saveConcentrationProfiles=False,
+saveEdgeSpecies=True,
 )
