Fix for duplicate ids appearing in connections when events contain uppercase and lowercase source_ids and target_ids #1233

Merged
codycooperross merged 3 commits into master from connection_count_issue
Aug 30, 2024

@codycooperross codycooperross commented Aug 29, 2024

Purpose

Addresses an issue where ids in an Event's source_id or target_id that differ only in letter case would duplicate connections in a Doi record, such as citations, and therefore citation_ids and citation_count. See the citations attribute here: https://api.datacite.org/dois/10.17632/579pxjyjz8.1

"citations": {
  "data": [
    {
      "id": "10.32725/jab.2020.004",
      "type": "dois"
    },
    {
      "id": "10.32725/jab.2020.004",
      "type": "dois"
    }
  ]
},

closes: datacite/datacite#2195

Approach

Adds .compact.map(&:downcase) before .uniq when calculating connections. Slightly refactors citations_over_time method.
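
A minimal sketch of the chain described above, run on a plain array rather than on values plucked from events. The `dedupe_dois` helper and the sample values are hypothetical and only illustrate the behavior; they are not taken from the model code.

```ruby
# Hypothetical helper: applies the .compact.map(&:downcase).uniq chain to a plain array.
def dedupe_dois(dois)
  dois
    .compact          # drop nil entries so downcase is never called on nil
    .map(&:downcase)  # DOIs are case-insensitive, so compare them lowercased
    .uniq             # keep one entry per distinct lowercased DOI
end

# Sample input (assumed): the same DOI in two casings, plus a nil.
source_dois = ["10.32725/jab.2020.004", "10.32725/JAB.2020.004", nil]

puts dedupe_dois(source_dois).inspect
puts dedupe_dois(source_dois).length
```

Calling `.uniq` alone on this input would leave both casings (and the nil) in place, which is the duplication shown in the citations example.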

Open Questions and Pre-Merge TODOs

Would require a re-index to be reflected on every record in the REST API.

Learning

Types of changes

  • Bug fix (non-breaking change which fixes an issue)

  • New feature (non-breaking change which adds functionality)

  • Breaking change (fix or feature that would cause existing functionality to change)

Reviewer, please remember our guidelines:

  • Be humble in the language and feedback you give, ask don't tell.
  • Consider using positive language as opposed to neutral when offering feedback. This is to avoid the negative bias that can occur with neutral language appearing negative.
  • Offer suggestions on how to improve code e.g. simplification or expanding clarity.
  • Ensure you give reasons for the changes you are proposing.

…percase and lowercase source_ids and target_ids
@codycooperross codycooperross requested a review from a team August 29, 2024 21:15
end

def reference_count
- reference_events.pluck(:target_doi).uniq.length
+ reference_events.pluck(:target_doi).compact.map(&:downcase).uniq.length

@codycooperross the only suggestion I would make is not exactly to do with your change but with the existing code repetition. This happens in a few places, but if you look at the method reference_count, its code is exactly the same as reference_ids except for the additional length method invocation. Because we didn't reuse the code, we had to make this change in two places.

This is certainly outside the scope of your PR. I will create an issue to clean up this code to make it more DRY (don't repeat yourself).

I will approve this PR.
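
One way the clean-up suggested above could look, as a sketch only. `DoiSketch` is a hypothetical stand-in for the Doi model, and a plain array of target DOIs replaces `reference_events.pluck(:target_doi)`; the actual refactor may differ.

```ruby
# Hypothetical stand-in for the Doi model; a plain array replaces the plucked target DOIs.
class DoiSketch
  def initialize(target_dois)
    @target_dois = target_dois
  end

  # The only place where normalization and deduplication happen.
  def reference_ids
    @target_dois.compact.map(&:downcase).uniq
  end

  # Derived from reference_ids, so a change to the dedup logic is made once.
  def reference_count
    reference_ids.length
  end
end

doi = DoiSketch.new(["10.1234/ABC", "10.1234/abc", nil, "10.5678/xyz"])
puts doi.reference_ids.inspect
puts doi.reference_count
```

With the count derived from the ids, a fix like the one in this PR would only need to touch `reference_ids`.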

Thank you, Wendel!

end

# remove duplicate citing source dois,
# then show distribution by year
def citations_over_time
- citation_events.pluck(:occurred_at, :source_doi).uniq { |v| v[1] }.
+ citation_events.pluck(:occurred_at, :source_doi).map { |v| [v[0], v[1].downcase] }.sort_by { |v| v[0] }.uniq { |v| v[1] }.

@wendelfabianchinsamy I just added back the pluck because I think the map would have selected the entire Event record and may not have been performant for DOIs with many events. All good?
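
A small sketch of what the refactored chain does once the `[occurred_at, source_doi]` pairs have been plucked. `Array#uniq` keeps the first element for each key, so sorting by date first means the earliest event per DOI is the one retained. The sample rows and both helper names are hypothetical, and the per-year grouping is an assumption based on the "show distribution by year" comment, since the rest of the method is not visible in the diff.

```ruby
# Hypothetical [occurred_at, source_doi] pairs, shaped like pluck(:occurred_at, :source_doi).
# This sample contains no nil DOIs.
rows = [
  [Time.utc(2021, 3, 1), "10.32725/JAB.2020.004"],
  [Time.utc(2020, 6, 15), "10.32725/jab.2020.004"],
  [Time.utc(2021, 1, 10), "10.1000/example"],
]

# Lowercase each DOI, sort by date, then keep the first (earliest) row per DOI.
def earliest_per_doi(rows)
  rows
    .map { |occurred_at, doi| [occurred_at, doi.downcase] }
    .sort_by { |occurred_at, _doi| occurred_at }
    .uniq { |_occurred_at, doi| doi }
end

# Assumed shape of the yearly distribution: one hash per year with a total.
def citations_by_year(rows)
  earliest_per_doi(rows)
    .group_by { |occurred_at, _doi| occurred_at.year.to_s }
    .map { |year, group| { "year" => year, "total" => group.length } }
    .sort_by { |entry| entry["year"] }
end

puts citations_by_year(rows).inspect
```

Without the `sort_by` step, the 2021 row for the first DOI would be kept instead of the 2020 one, and that citation would be counted under the later year.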

@codycooperross codycooperross merged commit c4f171a into master Aug 30, 2024
13 checks passed
@codycooperross codycooperross deleted the connection_count_issue branch August 30, 2024 12:31
sentry-io bot commented Sep 3, 2024

Suspect Issues

This pull request was deployed and Sentry observed the following issues:

  • ‼️ **NoMethodError: undefined method `downcase' for nil:NilClass** `app/models/doi.rb in block in citations_over_time` View Issue
  • ‼️ **NoMethodError: undefined method `downcase' for nil:NilClass** `app/models/doi.rb in block in citations_over_time` View Issue
  • ‼️ **NoMethodError: undefined method `downcase' for nil:NilClass** `DataciteDoisController#show` View Issue
  • ‼️ **NoMethodError: undefined method `downcase' for nil:NilClass** `DataciteDoisController#update` View Issue
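
These errors are consistent with a nil source_doi reaching `v[1].downcase` in citations_over_time, where the merged chain has no `.compact` equivalent before the downcase. One possible guard is sketched below on plain arrays. The rows and the `normalized_citation_rows` name are hypothetical, and this is not necessarily the fix that was applied afterwards.

```ruby
# Hypothetical rows shaped like pluck(:occurred_at, :source_doi); one row has a nil source_doi.
rows_with_nil = [
  [Time.utc(2022, 5, 1), "10.1000/ABC"],
  [Time.utc(2021, 2, 1), nil],
  [Time.utc(2020, 9, 1), "10.1000/abc"],
]

def normalized_citation_rows(rows)
  rows
    .reject { |_occurred_at, doi| doi.nil? }                 # skip events without a source DOI
    .map { |occurred_at, doi| [occurred_at, doi.downcase] }  # safe: doi is never nil here
    .sort_by { |occurred_at, _doi| occurred_at }             # earliest events first
    .uniq { |_occurred_at, doi| doi }                        # keep the earliest row per DOI
end

puts normalized_citation_rows(rows_with_nil).inspect
```

Because each element is a pair, `.compact` on the outer array would not remove the nil DOI; the rows have to be filtered on their second element.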

Development

Successfully merging this pull request may close these issues.

Citations/references/other DOI connections may be duplicated in REST API within a single DOI record
2 participants