Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix for duplicate ids appearing in connections when events contain uppercase and lowercase source_ids and target_ids #1233

Merged
merged 3 commits into from
Aug 30, 2024
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
30 changes: 15 additions & 15 deletions app/models/doi.rb
Original file line number Diff line number Diff line change
Expand Up @@ -1273,30 +1273,30 @@ def downloads_over_time
end

def reference_ids
reference_events.pluck(:target_doi).compact.uniq.map(&:downcase)
reference_events.pluck(:target_doi).compact.map(&:downcase).uniq
end

def reference_count
reference_events.pluck(:target_doi).uniq.length
reference_events.pluck(:target_doi).compact.map(&:downcase).uniq.length
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@codycooperross the only suggestion I would make is not exactly to do with your change but with the existing code repetition. This happens in a few places but if you look at the method reference_count it's code is exactly the same as reference_ids except for the additional length method invocation. Because we didn't reuse the code we had to make this change in two places.

This is certainly outside the scope of your PR. I will create an issue to clean up this code to make more DRY (don't repeat yourself).

I will approve this PR.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you, Wendel!

end

def indexed_references
Doi.query(nil, ids: reference_ids, page: { number: 1, size: 25 }).results
end

def citation_ids
citation_events.pluck(:source_doi).compact.uniq.map(&:downcase)
citation_events.pluck(:source_doi).compact.map(&:downcase).uniq
end

# remove duplicate citing source dois
def citation_count
citation_events.pluck(:source_doi).uniq.length
citation_events.pluck(:source_doi).compact.map(&:downcase).uniq.length
end

# remove duplicate citing source dois,
# then show distribution by year
def citations_over_time
citation_events.pluck(:occurred_at, :source_doi).uniq { |v| v[1] }.
citation_events.pluck(:occurred_at, :source_doi).map { |v| [v[0], v[1].downcase] }.sort_by { |v| v[0] }.uniq { |v| v[1] }.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@wendelfabianchinsamy I just added back the pluck because I think the map would have selected the entire Event record and may not have been performant for DOIs with many events. All good?

group_by { |v| v[0].utc.iso8601[0..3] }.
map { |k, v| { "year" => k, "total" => v.length } }.
sort_by { |h| h["year"] }
Expand All @@ -1307,47 +1307,47 @@ def indexed_citations
end

def part_ids
part_events.pluck(:target_doi).compact.uniq.map(&:downcase)
part_events.pluck(:target_doi).compact.map(&:downcase).uniq
end

def part_count
part_events.pluck(:target_doi).uniq.length
part_events.pluck(:target_doi).compact.map(&:downcase).uniq.length
end

def indexed_parts
Doi.query(nil, ids: part_ids, page: { number: 1, size: 25 }).results.to_a
end

def part_of_ids
part_of_events.pluck(:source_doi).compact.uniq.map(&:downcase)
part_of_events.pluck(:source_doi).compact.map(&:downcase).uniq
end

def part_of_count
part_of_events.pluck(:source_doi).uniq.length
part_of_events.pluck(:source_doi).compact.map(&:downcase).uniq.length
end

def indexed_part_of
Doi.query(nil, ids: part_of_ids, page: { number: 1, size: 25 }).results
end

def version_ids
version_events.pluck(:target_doi).compact.uniq.map(&:downcase)
version_events.pluck(:target_doi).compact.map(&:downcase).uniq
end

def version_count
version_events.pluck(:target_doi).uniq.length
version_events.pluck(:target_doi).compact.map(&:downcase).uniq.length
end

def indexed_versions
Doi.query(nil, ids: version_ids, page: { number: 1, size: 25 }).results
end

def version_of_ids
version_of_events.pluck(:source_doi).compact.uniq.map(&:downcase)
version_of_events.pluck(:source_doi).compact.map(&:downcase).uniq
end

def version_of_count
version_of_events.pluck(:source_doi).uniq.length
version_of_events.pluck(:source_doi).compact.map(&:downcase).uniq.length
end

def indexed_version_of
Expand All @@ -1361,11 +1361,11 @@ def other_relation_events
def other_relation_ids
other_relation_events.map do |e|
e.doi
end.flatten.uniq - [doi.downcase]
end.flatten.compact.map(&:downcase).uniq - [doi.downcase]
end

def other_relation_count
other_relation_ids.length
other_relation_ids.compact.map(&:downcase).length
end

def xml_encoded
Expand Down
40 changes: 37 additions & 3 deletions spec/models/doi_related_spec.rb
Original file line number Diff line number Diff line change
Expand Up @@ -83,10 +83,30 @@
let(:client) { create(:client) }
let(:doi) { create(:doi, client: client, aasm_state: "findable") }
let(:target_doi) { create(:doi, client: client, aasm_state: "findable") }
let!(:reference_events) { create(:event_for_crossref, subj_id: "https://doi.org/#{doi.doi}", obj_id: "https://doi.org/#{target_doi.doi}", relation_type_id: "references") }
let!(:reference_event) do
create(:event_for_crossref, {
subj_id: "https://doi.org/#{doi.doi}",
obj_id: "https://doi.org/#{target_doi.doi}",
relation_type_id: "references",
occurred_at: "2015-06-13T16:14:19Z",
})
end
let!(:reference_event2) do
create(:event_for_crossref, {
subj_id: "https://doi.org/#{target_doi.doi}",
obj_id: "https://doi.org/#{doi.doi}",
occurred_at: "2016-06-13T16:14:19Z",
relation_type_id: "is-referenced-by",
})
end

it "has references" do
expect(doi.references.count).to eq(1)
# Some older events have downcased source_doi and target_doi
reference_event2.target_doi = target_doi.doi.downcase
reference_event2.source_doi = doi.doi.downcase
reference_event2.save

expect(doi.references.count).to eq(2)
expect(doi.reference_ids.count).to eq(1)
expect(doi.reference_count).to eq(1)

Expand Down Expand Up @@ -124,10 +144,24 @@
occurred_at: "2016-06-13T16:14:19Z"
})
end
let!(:citation_event4) do
create(:event_for_datacite_crossref, {
subj_id: "https://doi.org/#{source_doi2.doi}",
obj_id: "https://doi.org/#{doi.doi}",
relation_type_id: "cites",
source_id: "crossref",
occurred_at: "2017-06-13T16:14:19Z"
})
end

# removing duplicate dois in citation_ids, citation_count and citations_over_time (different relation_type_id)
it "has citations" do
expect(doi.citations.count).to eq(3)
# Some older events have downcased source_doi and target_doi
citation_event4.source_doi = source_doi2.doi.downcase
citation_event4.target_doi = doi.doi.downcase
citation_event4.save

expect(doi.citations.count).to eq(4)
expect(doi.citation_ids.count).to eq(2)
expect(doi.citation_count).to eq(2)
expect(doi.citations_over_time).to eq([{ "total" => 1, "year" => "2015" }, { "total" => 1, "year" => "2016" }])
Expand Down
Loading