-
Notifications
You must be signed in to change notification settings - Fork 8
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix for duplicate ids appearing in connections when events contain uppercase and lowercase source_ids and target_ids #1233
Conversation
…percase and lowercase source_ids and target_ids
end | ||
|
||
def reference_count | ||
reference_events.pluck(:target_doi).uniq.length | ||
reference_events.pluck(:target_doi).compact.map(&:downcase).uniq.length |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@codycooperross the only suggestion I would make is not exactly to do with your change but with the existing code repetition. This happens in a few places but if you look at the method reference_count it's code is exactly the same as reference_ids except for the additional length method invocation. Because we didn't reuse the code we had to make this change in two places.
This is certainly outside the scope of your PR. I will create an issue to clean up this code to make more DRY (don't repeat yourself).
I will approve this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you, Wendel!
end | ||
|
||
# remove duplicate citing source dois, | ||
# then show distribution by year | ||
def citations_over_time | ||
citation_events.pluck(:occurred_at, :source_doi).uniq { |v| v[1] }. | ||
citation_events.pluck(:occurred_at, :source_doi).map { |v| [v[0], v[1].downcase] }.sort_by { |v| v[0] }.uniq { |v| v[1] }. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@wendelfabianchinsamy I just added back the pluck
because I think the map
would have selected the entire Event record and may not have been performant for DOIs with many events. All good?
Suspect IssuesThis pull request was deployed and Sentry observed the following issues:
Did you find this useful? React with a 👍 or 👎 |
Purpose
Addresses issue where uppercase and lowercase ids in an Event's
source_id
ortarget_id
would duplicate connections in a Doi record, like citations and thuscitation_ids
,citation_count
. See thecitations
attribute here: https://api.datacite.org/dois/10.17632/579pxjyjz8.1closes: datacite/datacite#2195
Approach
Adds
.compact.map(&:downcase)
before.uniq
when calculating connections. Slightly refactorscitations_over_time
method.Open Questions and Pre-Merge TODOs
Would require a re-index to be reflected on every record in the REST API.
Learning
Types of changes
Bug fix (non-breaking change which fixes an issue)
New feature (non-breaking change which adds functionality)
Breaking change (fix or feature that would cause existing functionality to change)
Reviewer, please remember our guidelines: