Skip to content

Commit

Permalink
Replace hard-coded attribute keys (#110)
Browse files Browse the repository at this point in the history
* replace hard-coded node attribute "atom_number"

* replace hard-coded node attribute "element_symbol"

* replace hard-coded node attributes of the coordinates

* replace hard-coded node attribute "partition"

* forgot one replacement of "element_symbol"

* replace hard-coded node attribute "chg"

* replace hard-coded node attribute "rad"
intoduce a mapping in the (de-)serialization of TUCAN strings

* add test for V2000 Molfile charge mechanics (implemented via MOLFILE_V2000_CHARGES)

* replace hard-coded node attribute "mass"

* replace hard-coded node attribute "explored"

* replace hard-coded node attribute "invariant_code"

* replace hard-coded edge attribute "bond_type"

* cleanup, add/correct type hints
  • Loading branch information
flange-ipb authored Jan 4, 2024
1 parent d954dc6 commit 432889e
Show file tree
Hide file tree
Showing 16 changed files with 561 additions and 363 deletions.
6 changes: 4 additions & 2 deletions tests/conftest.py
Original file line number Diff line number Diff line change
Expand Up @@ -4,6 +4,8 @@
import networkx as nx
from pathlib import Path
from networkx.algorithms.components import is_connected

from tucan.graph_attributes import ATOMIC_NUMBER, ELEMENT_SYMBOL
from tucan.io import graph_from_file


Expand All @@ -25,8 +27,8 @@ def graph_from_dimacs(filepath):

graph = nx.Graph()
graph.add_nodes_from(node_labels)
nx.set_node_attributes(graph, "C", "element_symbol")
nx.set_node_attributes(graph, 6, "atomic_number")
nx.set_node_attributes(graph, "C", ELEMENT_SYMBOL)
nx.set_node_attributes(graph, 6, ATOMIC_NUMBER)
graph.add_edges_from(bonds)

return graph
Expand Down
82 changes: 76 additions & 6 deletions tests/io/test_molfile_v2000_reader.py
Original file line number Diff line number Diff line change
@@ -1,9 +1,22 @@
import pytest
from pathlib import Path

from tucan.graph_attributes import (
ATOMIC_NUMBER,
CHG,
ELEMENT_SYMBOL,
MASS,
PARTITION,
RAD,
X_COORD,
Y_COORD,
Z_COORD,
)
from tucan.io import graph_from_file
from tucan.io.exception import MolfileParserException
from tucan.io.molfile_v2000_reader import (
_merge_tuples_into_additional_attributes,
_parse_atom_line,
_parse_atom_value_assignments,
_to_int,
_to_float,
Expand All @@ -27,6 +40,63 @@ def test_graphs_from_v2000_and_v3000_molfiles_match(mol):
assert e1 == e2


@pytest.mark.parametrize(
"line, expected_additional_attr",
[
(
" 0.0000 0.0000 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0",
{},
),
(
" 0.0000 0.0000 0.0000 C 0 1 0 0 0 0 0 0 0 0 0 0",
{CHG: 3},
),
(
" 0.0000 0.0000 0.0000 C 0 2 0 0 0 0 0 0 0 0 0 0",
{CHG: 2},
),
(
" 0.0000 0.0000 0.0000 C 0 3 0 0 0 0 0 0 0 0 0 0",
{CHG: 1},
),
(
" 0.0000 0.0000 0.0000 C 0 4 0 0 0 0 0 0 0 0 0 0",
{RAD: 2},
),
(
" 0.0000 0.0000 0.0000 C 0 5 0 0 0 0 0 0 0 0 0 0",
{CHG: -1},
),
(
" 0.0000 0.0000 0.0000 C 0 6 0 0 0 0 0 0 0 0 0 0",
{CHG: -2},
),
(
" 0.0000 0.0000 0.0000 C 0 7 0 0 0 0 0 0 0 0 0 0",
{CHG: -3},
),
(
" 0.0000 0.0000 0.0000 C 0 8 0 0 0 0 0 0 0 0 0 0",
{}, # ignored
),
],
)
def test_parse_atom_line_charge_field(line, expected_additional_attr):
expected_attrs = {
ELEMENT_SYMBOL: "C",
ATOMIC_NUMBER: 6,
PARTITION: 0,
X_COORD: 0,
Y_COORD: 0,
Z_COORD: 0,
}
expected_attrs.update(expected_additional_attr)

atom_attrs = _parse_atom_line(line)

assert atom_attrs == expected_attrs


@pytest.mark.parametrize(
"tuples, additional_attrs, expected_additional_attrs_after_merge",
[
Expand All @@ -37,21 +107,21 @@ def test_graphs_from_v2000_and_v3000_molfiles_match(mol):
(2, 3), # atom index is not in additional_attrs yet
],
{
0: {"chg": 2}, # will add new key
1: {"mass": 1}, # will overwrite value
0: {CHG: 2}, # will add new key
1: {MASS: 1}, # will overwrite value
},
{
0: {"chg": 2, "mass": 2},
1: {"mass": 13},
2: {"mass": 3},
0: {CHG: 2, MASS: 2},
1: {MASS: 13},
2: {MASS: 3},
},
),
],
)
def test_merge_tuples_into_additional_attributes(
tuples, additional_attrs, expected_additional_attrs_after_merge
):
_merge_tuples_into_additional_attributes(tuples, "mass", additional_attrs)
_merge_tuples_into_additional_attributes(tuples, MASS, additional_attrs)
assert additional_attrs == expected_additional_attrs_after_merge


Expand Down
Loading

6 comments on commit 432889e

@schatzsc
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@flange-ipb Nice (-:= That will make further "custom node attributes" more easy to handle - further modifications planned?

@schatzsc
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@flange-ipb This change is not in the main package yet, isn't it? I tried to update a local installation but it did not modify any files

@flange-ipb
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @schatzsc
Happy new year! Just a bit of code cleanup to avoid any confusion about the use of attribute keys. And it makes it easier to keep track where an attribute is used. :)
The pull request went to bliss-canonicalization, which is the default branch.

@schatzsc
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you know where the TUCAN files get stores on a Windows 10 or Linux system?!? In the place where I'd expect them there are only the only ones without those modifications.

Furthermore, with the new installation also mention in relation to the other issue, the probem disappeared, at least when I try to run the functions instead a script

@flange-ipb
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Depends on your package management tool.
I usually use pip in a virtual environment (created via python -m venv .env in the TUCAN project directory). Then all the installed dependencies are stored in .env/lib/python3.11/site-packages/. My IDE (PyCharm) also integrates that nicely.

In principle it is also possible to work without a virtual environment and install dependencies globally, i.e. for a user. This goes into $HOME/.local/lib/python3.9/site-packages/. In Ubuntu/Debian pip refuses to support this mode since Python 3.11, but there are workarounds.

@schatzsc
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@flange-ipb
In this particular case, It's more complicated, since the CSD Python API comes with its own separate python installation

But thanks to your hint I found the files, which on my Ubuntu Linux machine reside here

~/CCDC/ccdc-software/csd-python-api/miniconda/lib/python3.9/site-packages/tucan

and now also include the latest code changes (-:=

I like the new graph_attributes.py definitions since they will facilitate addition of custom node attributes for ML in a simple way, which only need to be carried along, but should not be evaluated in the canonicalization

BTW - at the moment I'm working again on the "implict H preprocessor" and try to store the rules in a sqlite3 database, as you suggested, but first need to familiarize more with that package

Please sign in to comment.