Simplify parsing SSSOM w/ curies.chain #431

Merged
merged 23 commits into from
Oct 2, 2023

Conversation

cthoyt
Member

@cthoyt cthoyt commented Sep 26, 2023

Closes #363 (final nail in the coffin)

This provides an alternative to #429 that makes the chaining operations on the metadata and prefix maps more explicit. This is also a good chance to carefully document how this is handled, since I might not have captured it accurately. As it stands, the priority order for combining prefix maps is:

  1. Internal prefix map inside the document
  2. Prefix map passed through this function inside the meta
  3. Prefix map passed through this function to prefix_map
  4. Default prefix map (handled with ensure_converter)
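
To make the priority order above concrete, here is a minimal sketch using only plain dictionaries and the standard library. It is not the `curies.chain` implementation and not the code in this PR: the four example maps, their URLs, and the helper `chain_prefix_maps` are hypothetical, and it assumes the list runs from highest to lowest priority, so a prefix defined in an earlier source wins over the same prefix in a later one.

```python
from collections import ChainMap

# Hypothetical stand-ins for the four sources in the list above
document_prefix_map = {"ex": "https://example.org/doc/"}
meta_prefix_map = {
    "ex": "https://example.org/meta/",
    "foo": "https://example.org/foo/",
}
argument_prefix_map = {
    "foo": "https://example.org/other-foo/",
    "bar": "https://example.org/bar/",
}
default_prefix_map = {
    "bar": "https://example.org/default-bar/",
    "owl": "http://www.w3.org/2002/07/owl#",
}


def chain_prefix_maps(*prefix_maps):
    """Merge prefix maps so that maps given earlier take priority (assumption)."""
    # ChainMap resolves each key in the first mapping that contains it
    return dict(ChainMap(*prefix_maps))


merged = chain_prefix_maps(
    document_prefix_map,
    meta_prefix_map,
    argument_prefix_map,
    default_prefix_map,
)
print(merged["ex"])   # taken from the document
print(merged["foo"])  # taken from the metadata
print(merged["bar"])  # taken from the prefix_map argument
print(merged["owl"])  # only the default defines it
```

A real converter also has to deal with prefix synonyms and clashing URI prefixes, which this dictionary-only sketch ignores.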

@cthoyt cthoyt requested review from hrshdhgd and matentzn September 26, 2023 11:49
hrshdhgd
hrshdhgd previously approved these changes Sep 26, 2023
Contributor

@hrshdhgd hrshdhgd left a comment

Super clean!

@hrshdhgd
Contributor

But tests failing.

@cthoyt
Member Author

cthoyt commented Sep 26, 2023

@hrshdhgd before I fix the tests, I want to make sure that everyone agrees on the domain logic (i.e., order of priority for metadata and prefix maps)

@matentzn
Collaborator

Will check tomorrow don't merge

@cthoyt
Member Author

cthoyt commented Sep 26, 2023

> Will check tomorrow don't merge

We're not at merge yet, but if you can give feedback on the business logic for merging that would be great

@cthoyt
Member Author

cthoyt commented Sep 27, 2023

@matentzn all of the unit tests are now passing (locally), so this is ready for review (though CI testing is broken because of poetry, yet again)

Collaborator

@matentzn matentzn left a comment

This looks great, I left a few questions and comments. My primary concern right now is roundtripping. For example, what happens right now if someone loads an SSSOM mapping file to, say, annotate it with a new piece of metadata, not having included the SSSOM_BUILT_IN_PREFIXES? Will they just be added to the file that is written out?

TLDR: the remaining thing I want to feel convinced about is that if a legal SSSOM file is passed to parse_sssom_table, and then written back out, it remains entirely unchanged. Is that, as far as you can see, the case?

@cthoyt
Member Author

cthoyt commented Sep 30, 2023

> This looks great, made a few questions and comments. My primary concern right now is roundtripping. For example, what happens right now if someone loads a SSSOM mapping file to, say, annotate it with a new piece of metadata, not having included the SSSOM_BUILT_IN_PREFIXES? Will they just be added to the file that is written out?

@matentzn I added an explicit test to show that outputting a coherent prefix map is working in b73ee19.

If someone adds more data to their MSDF manually, they can clean the prefix map automatically with the msdf.clean_prefix_map() functionality. If auto-cleaning before output is desired, we can also add that, but it's not really what this PR is about.

I am pretty confident that this makes round-tripping even safer than before, since it always tries to backfill the prefix map with the built-in one when loading, so even slightly broken source data files will get fixed up.
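
As a rough illustration of the two steps described above (backfilling on load, cleaning before output), here is a plain-dictionary sketch. It does not use the sssom-py API: `backfill`, `clean`, the `BUILT_IN_SUBSET` map, and the `ex` prefix are hypothetical names chosen for the example, and the real `msdf.clean_prefix_map()` may behave differently in detail.

```python
# Toy subset standing in for the built-in prefixes (assumption, not the real list)
BUILT_IN_SUBSET = {
    "owl": "http://www.w3.org/2002/07/owl#",
    "skos": "http://www.w3.org/2004/02/skos/core#",
}


def backfill(prefix_map, built_in):
    """Add built-in prefixes the document did not declare, never overwriting."""
    # Later entries win in a dict merge, so the document's own entries come last
    return {**built_in, **prefix_map}


def clean(prefix_map, curies):
    """Keep only the prefixes that are actually used by the given CURIEs."""
    used = {curie.split(":", 1)[0] for curie in curies}
    return {prefix: uri for prefix, uri in prefix_map.items() if prefix in used}


# A source file that uses skos but forgot to declare it
document_prefix_map = {"ex": "https://example.org/"}
curies_in_mappings = ["ex:1", "skos:exactMatch", "ex:2"]

backfilled = backfill(document_prefix_map, BUILT_IN_SUBSET)
cleaned = clean(backfilled, curies_in_mappings)
print(sorted(backfilled))  # every prefix available after loading
print(sorted(cleaned))     # only the prefixes the mappings actually use
```

In this sketch the unused `owl` entry is dropped again by the cleaning step, so the backfill on load does not have to leak extra prefixes into the written file.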

Collaborator

@matentzn matentzn left a comment

Awesome changes man, a few more comments and I think we are ready to merge!

matentzn
matentzn previously approved these changes Oct 2, 2023
Collaborator

@matentzn matentzn left a comment

What a monumental effort. THANK YOU!

Collaborator

@matentzn matentzn left a comment

@hrshdhgd feel free to merge and make a new release as well!

@cthoyt cthoyt merged commit 41bcdca into master Oct 2, 2023
@cthoyt cthoyt deleted the simplify-parse-sssom branch October 2, 2023 16:03

Successfully merging this pull request may close these issues.

Replace CURIE inference mechanism with curies.Converter.from_extended_prefix_map
3 participants