Fix issues with MIRIAM alignment #707

Merged
merged 7 commits into from
Jan 12, 2023

Conversation

cthoyt
Member

@cthoyt cthoyt commented Jan 12, 2023

Two issues came up related to recent changes:

  1. miriam:gramene.growthstage was renamed to miriam:gro. This now conflicts with gro, which is reserved for the Gene Regulation Ontology, so a mismatch was curated and the metaregistry links in the bioregistry:gramene.growthstage record were updated.
  2. miriam:tair.name was added; it is equivalent to bioregistry:araport. This is the first situation (I think) in which Identifiers.org has added a prefix that duplicates something novel in the Bioregistry but doesn't have the same prefix. I'm not sure how often this will happen in the future, so I'm not sure whether this kind of problem needs addressing in the alignment code.
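
To make the two cases concrete, here is a minimal sketch of how an alignment step could combine curated mappings with curated mismatches. It is not the actual Bioregistry alignment code: the function name `align_prefixes`, the data layout, and the `chebi` and `newthing` entries are hypothetical and used for illustration only.

```python
def align_prefixes(external, registry, curated, mismatches):
    """Align external prefixes (e.g., from MIRIAM) to registry prefixes.

    external:   iterable of external prefixes
    registry:   set of prefixes known to the local registry
    curated:    dict of explicit external -> registry mappings
    mismatches: dict of registry prefix -> set of external prefixes that
                must never be matched to it, even if the strings are equal
    """
    aligned, unaligned = {}, []
    for ext in external:
        if ext in curated:
            # An explicit curation always wins over string matching.
            aligned[ext] = curated[ext]
        elif ext in registry and ext not in mismatches.get(ext, set()):
            # Fall back to an exact string match unless it was curated as a mismatch.
            aligned[ext] = ext
        else:
            unaligned.append(ext)
    return aligned, unaligned


# Hypothetical inputs mirroring the two cases described above.
registry = {"gro", "gramene.growthstage", "araport", "chebi"}
curated = {"gro": "gramene.growthstage", "tair.name": "araport"}
mismatches = {"gro": {"gro"}}  # miriam:gro is not the Gene Regulation Ontology

aligned, unaligned = align_prefixes(
    ["gro", "tair.name", "chebi", "newthing"], registry, curated, mismatches
)
```

In this sketch the mismatch entry is what prevents miriam:gro from silently attaching to the unrelated gro record if the explicit mapping is ever missing; the second case (tair.name) can only be caught by curation, since no string match exists.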

@cthoyt
Member Author

cthoyt commented Jan 12, 2023

cc @renatocjn this will be interesting for you

@codecov-commenter

codecov-commenter commented Jan 12, 2023

Codecov Report

Base: 39.57% // Head: 39.57% // Decreases project coverage by -0.00% ⚠️

Coverage data is based on head (2369e2b) compared to base (4d01add).
Patch coverage: 20.00% of modified lines in pull request are covered.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #707      +/-   ##
==========================================
- Coverage   39.57%   39.57%   -0.01%     
==========================================
  Files         129      129              
  Lines        7355     7359       +4     
  Branches     1683     1684       +1     
==========================================
+ Hits         2911     2912       +1     
- Misses       4272     4275       +3     
  Partials      172      172              
Impacted Files Coverage Δ
src/bioregistry/align/utils.py 0.00% <0.00%> (ø)
src/bioregistry/external/miriam.py 32.78% <50.00%> (+0.58%) ⬆️


@cthoyt cthoyt merged commit 5834b3b into main Jan 12, 2023
@cthoyt cthoyt deleted the fix-miriam branch January 12, 2023 11:06
@renatocjn

renatocjn commented Jan 12, 2023

Well, I can provide some reasoning on the two changes.

First, the gramene.growthstage namespace was renamed in order to make the GRO prefix work. The namespace was non-functional anyway. See identifiers-org/identifiers-org.github.io#184 and identifiers-org/identifiers-org.github.io#155. I guess you managed to fix things on your end? I might have to do the same for the oma.hog prefix.

Then, the tair.name prefix was registered at the request of the DBCLS team. They argue that TAIR names are also used as IDs. From the homepage of the araport prefix that you mention, it seems that it was discontinued and incorporated into TAIR and some other repositories. So saying that it is a duplicate is not exactly correct.

@cthoyt
Member Author

cthoyt commented Jan 12, 2023

> Well, I can provide some reasoning on the two changes.
>
> First, the gramene.growthstage namespace was renamed in order to make the GRO prefix work. The namespace was non functional anyway. See identifiers-org/identifiers-org.github.io#184 and identifiers-org/identifiers-org.github.io#155. I guess you managed to fix things on your end? I might have to do the same for the oma.hog prefix.

That's a bummer; consistency of prefixes is important.

> Then, the tair.name prefix was registered by request of the DBCLS team. They argue that TAIR names are also used as IDs. From the homepage of araport prefix that you mention, it seems that it was discontinued and incorporated into TAIR and some other repositories. So saying that it is a duplicate is not exactly correct.

Regardless of which prefix is assigned, these represent the same semantic space. Within the scope of the Bioregistry, these are equivalent. The Bioregistry models the fact that multiple prefixes can correspond to the same thing (e.g., eccode and intenz are the same).

Further, there can be many potential providers of a semantic space. Unfortunately, it's very confusing to follow the succession of TAIR's semantic spaces as various resources were shut down or consolidated.
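
The idea that several prefixes can point to one semantic space can be sketched as a small synonym index. This is an illustration only, not the Bioregistry's real data model or API: the `RECORDS` layout and the `build_index` and `normalize` helpers are hypothetical, and which prefix is treated as canonical here is an assumption.

```python
# Hypothetical records: canonical prefix -> other prefixes for the same space.
RECORDS = {
    "eccode": {"synonyms": {"intenz"}},
    "araport": {"synonyms": {"tair.name"}},
}


def build_index(records):
    """Build a lowercase lookup from every known prefix to its canonical prefix."""
    index = {}
    for canonical, record in records.items():
        for name in {canonical, *record.get("synonyms", ())}:
            key = name.lower()
            if key in index and index[key] != canonical:
                # The same string claimed by two records is a conflict
                # (like gro above) and has to be resolved by curation.
                raise ValueError(f"{name!r} maps to both {index[key]!r} and {canonical!r}")
            index[key] = canonical
    return index


def normalize(prefix, index):
    """Return the canonical prefix, or None if the prefix is unknown."""
    return index.get(prefix.lower())


index = build_index(RECORDS)
```

With this index, `normalize("intenz", index)` and `normalize("eccode", index)` resolve to the same record, which is the sense in which the two prefixes are equivalent regardless of which registry assigned them.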
