Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Duplicate lod_links Entries Due to Missing Polymorphic Constraints #375

Open
MartinHinz opened this issue Dec 19, 2024 · 0 comments
Open

Comments

@MartinHinz
Copy link
Collaborator

Problem Description

The current implementation of the lod_links table enforces a unique constraint on the combination of source and external_id:

t.index ["source", "external_id"], name: "index_lod_links_on_source_and_external_id", unique: true

This constraint causes issues when multiple Site records with the same name attempt to create a lod_link pointing to the same Wikidata entity. Since the unique index does not account for the linkable_id and linkable_type columns in the polymorphic association, the following problems arise:

  1. Unique Constraint Violation:
  • When two or more Site records try to link to the same source and external_id, a PG::UniqueViolation error occurs:
    ActiveRecord::RecordNotUnique (PG::UniqueViolation: ERROR: duplicate key value violates unique constraint "index_lod_links_on_source_and_external_id")
  1. Data Integrity Issues:
  • The unique constraint does not align with the polymorphic nature of lod_links, leading to limitations in associating multiple Site records with the same Wikidata entity.

Current Workaround

To address this issue temporarily in production, a monkey patch was applied. The patch prevents duplicate creation by using find_or_initialize_by and skipping record creation if the lod_link already exists. This is not a long-term solution and only mitigates the problem in runtime code.

Proposed Solution

A permanent fix should be implemented in the master branch to properly handle the polymorphic nature of lod_links:

  1. Update the Unique Index:
    Modify the unique index to include linkable_type and linkable_id, ensuring that source and external_id are unique within the scope of a specific linkable record. Migration example:
class UpdateIndexOnLodLinksToPolymorphic < ActiveRecord::Migration[7.0]
  def change
    remove_index :lod_links, name: "index_lod_links_on_source_and_external_id"

    add_index :lod_links, [:linkable_type, :linkable_id, :source, :external_id],
              unique: true, name: "index_lod_links_on_polymorphic_source_and_external_id"
  end
end

Adjust Model Validations:
Ensure that the model reflects this uniqueness at the application level:

validates :source, uniqueness: { scope: [:external_id, :linkable_type, :linkable_id] }

  1. Refactor Batch Processing Logic:
    Update the wikidata_match_candidates_batch method to handle these polymorphic constraints gracefully and avoid creating unnecessary duplicates:
def self.wikidata_match_candidates_batch(sites)
  ActiveRecord::Base.transaction do
    wikidata_results.each do |site_name, matches|
      sites_for_name = sites.select { |s| s.name == site_name }
      matches.each do |match|
        lod_link = LodLink.find_or_create_by!(
          source: "Wikidata",
          external_id: match.qid,
          linkable_type: "Site",
          linkable_id: site.id
        ) do |link|
          link.data = { label: match.label, description: match.description }
        end

        sites_for_name.each do |site|
          site.lod_links << lod_link unless site.lod_links.include?(lod_link)
        end
      end
    end
  end
end

Action Items

  1. Update the lod_links schema to include a polymorphic index.
  2. Refactor the wikidata_match_candidates_batch method to align with the updated schema.
  3. Remove the monkey-patch from production once the changes are deployed to master.

Priority

High: The current issue causes runtime errors in production and relies on a temporary workaround. Proper resolution is needed to ensure database integrity and prevent duplicate errors.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant