Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Parser for TUCAN strings #74

Merged
merged 23 commits into from
Sep 23, 2022

Conversation

flange-ipb
Copy link
Collaborator

closes #73

- black: exclude generated parser files
- continue workflow even when black fails
@schatzsc schatzsc marked this pull request as ready for review September 6, 2022 14:31
@JanCBrammer
Copy link
Collaborator

I'll have a look this week, probably getting around to it on Friday.

- The lexer rules TWO_TO_NINE and ONE_TO_NINE would cause conflicts, so count and node_index now use the digits as literals.
- improve rules for digits >= 1 and >1 again
- node properties 'MASS' and 'RAD' need to be lowercase to match the result of the serializer
- roundtrip tests
tucan/parser/parser.py Outdated Show resolved Hide resolved
],
)
def test_parse_tucan_error_msg(tucan, expected_error_msg):
with pytest.raises(TucanParserException, match=expected_error_msg):
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll change that in test_parser.py too ...

tucan/parser/parser.py Outdated Show resolved Hide resolved
tucan/parser/parser.py Outdated Show resolved Hide resolved
tucan/parser/parser.py Show resolved Hide resolved
@JanCBrammer
Copy link
Collaborator

Finished with my first pass of the review. I haven't looked at the grammar definitions yet. Will be able to do so next week hopefully.

- use list comprehension
@JanCBrammer
Copy link
Collaborator

Regarding the three grammar files, how do we make sure that when one is updated, the updates are propagated to the other two? I.e., it's conceivable that the three files become inconsistent because we forget to keep them in sync. Is there a way to use one grammar to automatically generate the other two?

tucan/parser/tucan.g4 Outdated Show resolved Hide resolved
@JanCBrammer
Copy link
Collaborator

@flange-ipb, the grammar looks good to me (as far as I'm able to evaluate it without being familiar with the nitty-gritty).

What about the failing round-tripping tests? Should we ignore them for now? If so, I'd approve and merge this PR.

@flange-ipb
Copy link
Collaborator Author

Regarding the three grammar files, how do we make sure that when one is updated, the updates are propagated to the other two? I.e., it's conceivable that the three files become inconsistent because we forget to keep them in sync. Is there a way to use one grammar to automatically generate the other two?

Indeed, that has to be done manually. I don't see any good option for automation.
What I just found is a conversion to the W3C grammar notation, but it doesn't seem to be an open source software.

- roundtrip test: ignore failing tests
@flange-ipb
Copy link
Collaborator Author

I think this is ready now.

A few notes on the ignored graphs in the round-trip test:
water-d1_1 and water-d1_3 fail because we do not take the isotope mass into account in the canonicalization process.

n16_a123_in_P2_1_2_1_2_1, n16_a77sad1_in_P2_1, q17_a37sadm_in_P1_New_P21 and qv043_in_P2_1_New_Pca21 are unconnected dimers.
Interestingly, the serializer doesn't return the whole graph for those. E.g. n16_a123_in_P2_1_2_1_2_1 has 96 nodes and 98 edges, but the TUCAN string returned by serialize_molecule(canonicalize_molecule(m)) has only 54 nodes and 55 edges.

@JanCBrammer
Copy link
Collaborator

unconnected dimers.

@flange-ipb, see #47.

@JanCBrammer JanCBrammer merged commit 9938419 into TUCAN-nest:bliss-canonicalization Sep 23, 2022
@flange-ipb flange-ipb deleted the 73_parser branch October 27, 2023 14:00
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Parser for TUCAN strings using a formal grammar
2 participants