Parser for TUCAN strings #74

flange-ipb · 2022-09-06T14:08:23Z

closes #73

JanCBrammer · 2022-09-06T15:34:56Z

I'll have a look this week, probably getting around to it on Friday.

tucan/parser/parser.py

flange-ipb · 2022-09-09T12:26:06Z

tests/parser/test_exception.py

+    ],
+)
+def test_parse_tucan_error_msg(tucan, expected_error_msg):
+    with pytest.raises(TucanParserException, match=expected_error_msg):


I'll change that in test_parser.py too ...
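Background on the `pytest.raises(TucanParserException, match=expected_error_msg)` pattern: pytest applies `match` with `re.search` to the string form of the raised exception, so an expected message is treated as a regular expression. Below is a minimal stdlib-only sketch of that check. The message format and the helper names `raise_syntax_error` and `message_matches` are hypothetical and not taken from the parser in this PR:

```python
import re


class TucanParserException(Exception):
    """Stand-in for the parser's custom exception (message format assumed)."""


def raise_syntax_error(line: int, column: int, msg: str) -> None:
    # Hypothetical helper: embed the error position with an f-string.
    raise TucanParserException(f"line {line}:{column} {msg}")


def message_matches(exc: BaseException, expected_error_msg: str) -> bool:
    # Essentially the check pytest.raises(..., match=...) performs:
    # re.search() against str(exc). re.escape() makes the expected message
    # literal, so "(" or "{" in ANTLR-style messages is not regex syntax.
    return re.search(re.escape(expected_error_msg), str(exc)) is not None
```

Without `re.escape`, an expected message such as `expecting {'(', '/'}` compiles to a pattern with an unterminated group, and `re.search` raises `re.error`. Passing `match=re.escape(expected_error_msg)` to `pytest.raises` avoids this.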

tests/parser/test_grammar.py

tests/parser/test_parser.py

tucan/parser/parser.py

JanCBrammer · 2022-09-09T15:10:34Z

Finished with my first pass of the review. I haven't looked at the grammar definitions yet; hopefully I'll be able to do so next week.

JanCBrammer · 2022-09-16T13:54:57Z

Regarding the three grammar files, how do we make sure that when one is updated, the updates are propagated to the other two? I.e., it's conceivable that the three files become inconsistent because we forget to keep them in sync. Is there a way to use one grammar to automatically generate the other two?

tucan/parser/tucan.g4

JanCBrammer · 2022-09-16T14:30:17Z

@flange-ipb, the grammar looks good to me (as far as I'm able to evaluate it without being familiar with the nitty-gritty).

What about the failing round-tripping tests? Should we ignore them for now? If so, I'd approve and merge this PR.

flange-ipb · 2022-09-19T08:22:19Z

> Regarding the three grammar files, how do we make sure that when one is updated, the updates are propagated to the other two? I.e., it's conceivable that the three files become inconsistent because we forget to keep them in sync. Is there a way to use one grammar to automatically generate the other two?

Indeed, that has to be done manually. I don't see any good option for automation.
What I did find is a converter to the W3C grammar notation, but it doesn't seem to be open-source software.
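Generating two grammar files from the third may not be feasible, but a partial safeguard is possible: a test that compares the *rule names* defined in each file, so that a rule added to or renamed in only one notation gets flagged. Below is a minimal sketch. It assumes that every rule definition starts at the beginning of a line (`name :` in ANTLR, `name ::=` in W3C EBNF, `<name> ::=` in BNF Playground notation) and that the three files use the same names up to letter case. It does not compare rule bodies, and the function and constant names are hypothetical:

```python
from __future__ import annotations

import re

# One pattern per notation; each assumes a definition starts at column 0.
ANTLR_RULE = re.compile(r"^(?:fragment\s+)?([A-Za-z_]\w*)\s*:", re.MULTILINE)
W3C_RULE = re.compile(r"^([A-Za-z_][\w-]*)\s*::=", re.MULTILINE)
BNF_PLAYGROUND_RULE = re.compile(r"^\s*<([^>]+)>\s*::=", re.MULTILINE)


def rule_names(pattern: re.Pattern, grammar_text: str) -> set[str]:
    # Lower-case the names so ANTLR lexer rules (UPPER_CASE) can be
    # compared with lower-case names in the other notations.
    return {name.lower() for name in pattern.findall(grammar_text)}


def missing_rules(grammars: dict[str, set[str]]) -> dict[str, set[str]]:
    """Map each grammar label to the rule names that only the others define."""
    all_names = set().union(*grammars.values())
    return {
        label: all_names - names
        for label, names in grammars.items()
        if all_names - names
    }
```

Run against the real files, such a check would probably need an allow-list for names that legitimately exist in only one notation.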

flange-ipb · 2022-09-21T08:48:54Z

I think this is ready now.

A few notes on the ignored graphs in the round-trip test:
water-d1_1 and water-d1_3 fail because we do not take the isotope mass into account in the canonicalization process.
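A toy model shows why ignoring the isotope mass can break the round trip. If canonicalization treats H and D as the same kind of node, the two hydrogens of HDO are interchangeable, and which final index carries the mass then depends on the input order. This sketch is deliberately *not* TUCAN's canonicalization: it ignores bonds and just sorts atoms by invariants with a stable sort. The `(index:mass=value)` output notation and all names are assumptions for illustration:

```python
from typing import List, Optional, Tuple

# (element symbol, isotope mass number or None if unspecified)
Atom = Tuple[str, Optional[int]]


def toy_canonical_order(atoms: List[Atom], use_mass: bool) -> List[Atom]:
    """Order atoms by invariants only. sorted() is stable, so atoms with
    equal keys keep their input order, which is the source of ambiguity."""

    def key(atom: Atom) -> tuple:
        symbol, mass = atom
        return (symbol, mass or 0) if use_mass else (symbol,)

    return sorted(atoms, key=key)


def toy_serialize(atoms: List[Atom]) -> str:
    # Emit a "(index:mass=value)" token for every atom with an explicit mass.
    return "".join(
        f"({index}:mass={mass})"
        for index, (_, mass) in enumerate(atoms, start=1)
        if mass is not None
    )


# Two input orders of the same HDO molecule.
hdo_a = [("O", None), ("H", 2), ("H", None)]
hdo_b = [("O", None), ("H", None), ("H", 2)]
```

With `use_mass=False` the two input orders serialize differently (`(1:mass=2)` vs `(2:mass=2)`). Including the mass in the key makes both give `(2:mass=2)`.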

n16_a123_in_P2_1_2_1_2_1, n16_a77sad1_in_P2_1, q17_a37sadm_in_P1_New_P21 and qv043_in_P2_1_New_Pca21 are unconnected dimers.
Interestingly, the serializer doesn't return the whole graph for those. E.g. n16_a123_in_P2_1_2_1_2_1 has 96 nodes and 98 edges, but the TUCAN string returned by serialize_molecule(canonicalize_molecule(m)) has only 54 nodes and 55 edges.
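Disconnected inputs like these can be detected before serialization by counting connected components. Below is a minimal breadth-first-search sketch. It assumes the graph is available as a node count plus a list of 0-based edge tuples, and the function names are hypothetical, not part of the TUCAN API:

```python
from collections import defaultdict, deque
from typing import Dict, Iterable, List, Set, Tuple


def connected_components(
    n_nodes: int, edges: Iterable[Tuple[int, int]]
) -> List[Set[int]]:
    """Return the node sets of all connected components (nodes 0..n_nodes-1)."""
    adjacency: Dict[int, Set[int]] = defaultdict(set)
    for u, v in edges:
        adjacency[u].add(v)
        adjacency[v].add(u)

    seen: Set[int] = set()
    components: List[Set[int]] = []
    for start in range(n_nodes):
        if start in seen:
            continue
        # A breadth-first search from an unvisited node collects one component.
        component = {start}
        seen.add(start)
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for neighbor in adjacency[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    component.add(neighbor)
                    queue.append(neighbor)
        components.append(component)
    return components


def check_connected(n_nodes: int, edges: Iterable[Tuple[int, int]]) -> None:
    # Hypothetical guard: reject disconnected input instead of silently
    # serializing only part of it.
    components = connected_components(n_nodes, edges)
    if len(components) > 1:
        sizes = sorted((len(c) for c in components), reverse=True)
        raise ValueError(f"graph has {len(components)} components of sizes {sizes}")
```

If the graphs are networkx objects (not stated in this thread), `networkx.number_connected_components` gives the same count.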

JanCBrammer · 2022-09-23T05:42:38Z

> unconnected dimers.

@flange-ipb, see #47.

flange-ipb added 2 commits September 6, 2022 13:48

ANTLR4 grammar (only sum formula so far) and generated parser; start testing the grammar

122ce8d

- correct code formatting

29b9bf2

- black: exclude generated parser files
- continue workflow even when black fails

schatzsc marked this pull request as ready for review September 6, 2022 14:31

flange-ipb added 4 commits September 6, 2022 17:36

- add grammar rules and tests for the tuples

f28a767

- The lexer rules TWO_TO_NINE and ONE_TO_NINE would cause conflicts, so count and node_index now use the digits as literals.

more test samples

d8eb8b3

- add grammar rules and tests for the node attributes

ea76244

- improve rules for digits >= 1 and >1 again

- add grammar rule and tests for tucan

7d03819

- node properties 'MASS' and 'RAD' need to be lowercase to match the result of the serializer
- roundtrip tests
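The TWO_TO_NINE/ONE_TO_NINE conflict follows from how ANTLR lexers resolve ambiguity. The longest match wins, and among equal-length matches the rule defined first wins. A character such as `5` therefore always becomes whichever of the two tokens is listed first, and the parser never sees the other token type for it. The toy tokenizer below imitates that policy in Python. Its rule table is an assumption modeled on the rule names in the commit message, not the actual tucan.g4:

```python
from __future__ import annotations

import re

# Hypothetical lexer rules in definition order, named after the two rules in
# the commit message. Both match exactly one character.
RULES: list[tuple[str, re.Pattern]] = [
    ("ONE_TO_NINE", re.compile(r"[1-9]")),
    ("TWO_TO_NINE", re.compile(r"[2-9]")),
]


def tokenize(text: str) -> list[tuple[str, str]]:
    """Longest match wins; on a tie, the rule defined first wins."""
    tokens: list[tuple[str, str]] = []
    pos = 0
    while pos < len(text):
        best = None
        for name, pattern in RULES:
            match = pattern.match(text, pos)
            # Strict ">" keeps the earlier rule when match lengths are equal.
            if match and (best is None or len(match.group()) > len(best[1])):
                best = (name, match.group())
        if best is None:
            raise ValueError(f"no lexer rule matches at position {pos}")
        tokens.append(best)
        pos += len(best[1])
    return tokens
```

Every digit from 1 to 9 comes out as ONE_TO_NINE. Swapping the two rules would only move the problem, because then 2 to 9 could never be ONE_TO_NINE. Using the digits as literals in the parser rules, as the commit does, avoids overlapping lexer rules altogether.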

JanCBrammer reviewed Sep 7, 2022

tucan/parser/parser.py (outdated, resolved)

flange-ipb and others added 12 commits September 7, 2022 17:12

- use a custom exception class instead of ANTLR's ParseCancellationException
- test exception message
- use f-strings

fcd29cc

remove "chg" from the serialization as node property

4e0073e

- rearrange ANTLR4 grammar file a bit

0f4b99a

- add grammar in W3C EBNF and BNF Playground formats

construct an atom list from the sum formula

5308a0e

construct bond list from tuples

2fdab36

grammar: absorb the rule "node_properties" into "node_attribute" for one level less of indirection

49745ce

extract node properties

5ddbcbb

generate graph

2915154

try to fix roundtrip test (still 6 fails)

8788827

Added Java to devcontainer.

ef8c4f0

- docstring

05f471b

- remove debug print

Using pytest.raises with match.

1997da8
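The commits "construct an atom list from the sum formula", "construct bond list from tuples" and "generate graph" describe the parser's main steps, and those steps can be illustrated without ANTLR. The sketch below is regex-based and assumes a string of the form `C2H6O/(1-2)(1-3)`, with atoms numbered from 1 in the order of the sum formula. It ignores node attributes and does no validation beyond the regexes. The function names are hypothetical, not the PR's parser API:

```python
import re
from typing import Dict, List, Set, Tuple

ELEMENT_COUNT = re.compile(r"([A-Z][a-z]?)(\d*)")
BOND_TUPLE = re.compile(r"\((\d+)-(\d+)\)")


def atoms_from_sum_formula(sum_formula: str) -> List[str]:
    """Expand e.g. "CH4" into ["C", "H", "H", "H", "H"]."""
    atoms: List[str] = []
    for element, count in ELEMENT_COUNT.findall(sum_formula):
        atoms.extend([element] * (int(count) if count else 1))
    return atoms


def bonds_from_tuples(tuples: str) -> List[Tuple[int, int]]:
    # Convert the 1-based indices of each "(i-j)" tuple to 0-based.
    return [(int(i) - 1, int(j) - 1) for i, j in BOND_TUPLE.findall(tuples)]


def graph_from_string(text: str) -> Tuple[List[str], Dict[int, Set[int]]]:
    """Return the atom list and an adjacency mapping for the whole string."""
    sum_formula, _, tuples = text.partition("/")
    atoms = atoms_from_sum_formula(sum_formula)
    adjacency: Dict[int, Set[int]] = {index: set() for index in range(len(atoms))}
    for u, v in bonds_from_tuples(tuples):
        adjacency[u].add(v)
        adjacency[v].add(u)
    return atoms, adjacency
```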

flange-ipb commented Sep 9, 2022

use pytest.raises with match

2a7fbeb

JanCBrammer reviewed Sep 9, 2022

tests/parser/test_grammar.py (outdated, resolved)

JanCBrammer reviewed Sep 9, 2022

tests/parser/test_parser.py (outdated, resolved)

f-string and unused excinfo

f428cb1

JanCBrammer reviewed Sep 9, 2022

tucan/parser/parser.py (outdated, resolved)

tucan/parser/parser.py (outdated, resolved)

tucan/parser/parser.py (resolved)

- more explicit function parameter

dad3a19

- use list comprehension

JanCBrammer reviewed Sep 16, 2022

tucan/parser/tucan.g4 (outdated, resolved)

grammar: rename 'gte_one' to 'greater_than_zero' and 'gt_one' to 'greater_than_one'

ee21753
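The new names state the intended number ranges explicitly. Grammar tests could cross-check those ranges against regular expressions. The sketch below assumes that both rules accept decimal integers without leading zeros. That is an assumption, since the rule bodies are not shown in this thread:

```python
import re

# Assumed semantics: decimal integers without leading zeros.
GREATER_THAN_ZERO = re.compile(r"[1-9][0-9]*")
GREATER_THAN_ONE = re.compile(r"[2-9]|[1-9][0-9]+")


def is_greater_than_zero(text: str) -> bool:
    # fullmatch() requires the whole string to be a valid number.
    return GREATER_THAN_ZERO.fullmatch(text) is not None


def is_greater_than_one(text: str) -> bool:
    # Either a single digit 2-9 or any multi-digit number without a leading zero.
    return GREATER_THAN_ONE.fullmatch(text) is not None
```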

- add pytest marker to ignore tests by their id

4a7fc40

- roundtrip test: ignore failing tests

JanCBrammer approved these changes Sep 23, 2022

JanCBrammer merged commit 9938419 into TUCAN-nest:bliss-canonicalization Sep 23, 2022

flange-ipb deleted the 73_parser branch October 27, 2023 14:00
