Populate schema_version based on namespace of current_metadata #1182

Merged
merged 3 commits into from
Apr 22, 2024

Conversation

codycooperross
Contributor

@codycooperross codycooperross commented Apr 19, 2024

Purpose

Currently, schema_version is sometimes not populated, and users can send arbitrary values in a schemaVersion attribute that are then written to the column. This makes it unclear which schema version a given DOI uses, causing problems for reporting and for display in the REST API, Fabrica, and elsewhere. See the issues listed below. This PR uses the XML namespace of the most current metadata object to write schema_version deterministically when a record is saved.

closes: #1179 datacite/datacite#1855

Approach

Adds a before_save callback to the Doi model that retrieves current_metadata.namespace and sets schema_version to the result. Once the metadata is validated, the Metadata model's namespace attribute holds the current schema version, set by the set_namespace method:

def set_namespace
  return nil if xml.blank?
  doc = Nokogiri.XML(xml, nil, "UTF-8", &:noblanks)
  ns =
    doc.collect_namespaces.detect do |_k, v|
      v.start_with?("http://datacite.org/schema/kernel")
    end
  self.namespace = Array.wrap(ns).last
end

This method retrieves the xmlns value of the XML, which must be a valid namespace for the metadata to pass validation. In other words, inaccurate values and sub-version values like http://datacite.org/schema/kernel-4.5 will not validate if they appear in the xmlns value of the source XML, and thus cannot end up as the saved schema_version value.
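
A minimal sketch of the callback described above, written as plain Ruby so it runs without Rails. The method name `set_schema_version`, the stand-in classes, and the choice to keep the existing value when no metadata is present are all assumptions for illustration; in the PR the equivalent logic is registered as a before_save callback on the Doi model.

```ruby
# Stand-in for the Metadata model: only the namespace attribute matters here.
Metadata = Struct.new(:namespace)

# Stand-in for the Doi model (hypothetical, not the actual ActiveRecord class).
class Doi
  attr_accessor :schema_version, :current_metadata

  def initialize(schema_version: nil, current_metadata: nil)
    @schema_version = schema_version
    @current_metadata = current_metadata
  end

  # Hypothetical callback name; in Rails this would be wired up with
  # `before_save :set_schema_version`.
  def set_schema_version
    namespace = current_metadata&.namespace
    # Assumption: keep the existing value when there is no namespace to read.
    self.schema_version = namespace unless namespace.nil?
  end

  # Simulates the save lifecycle: run the callback, then "persist".
  def save
    set_schema_version
    true
  end
end

doi = Doi.new(
  schema_version: "http://datacite.org/schema/kernel-4.5", # arbitrary user-sent value
  current_metadata: Metadata.new("http://datacite.org/schema/kernel-4")
)
doi.save
puts doi.schema_version
```

Whatever the client sent in schemaVersion is overwritten on save, so the stored value always matches the namespace of the validated metadata.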

Open Questions and Pre-Merge TODOs

Learning

Types of changes

  • Bug fix (non-breaking change which fixes an issue)

  • New feature (non-breaking change which adds functionality)

  • Breaking change (fix or feature that would cause existing functionality to change)

Reviewer, please remember our guidelines:

  • Be humble in the language and feedback you give, ask don't tell.
  • Consider using positive language as opposed to neutral when offering feedback. This is to avoid the negative bias that can occur with neutral language appearing negative.
  • Offer suggestions on how to improve code e.g. simplification or expanding clarity.
  • Ensure you give reasons for the changes you are proposing.

@@ -3690,7 +3702,6 @@
 "type" => "dois",
 "attributes" => {
 "schemaVersion" => "http://datacite.org/schema/kernel-4",
-"regenerate" => true,
Contributor Author

We currently advise users to send a request with "schemaVersion":"http://datacite.org/schema/kernel-4" to trigger an update to Schema 4 (https://support.datacite.org/docs/updating-from-schema-3-to-schema-4#include-schemaversion-to-trigger-the-update-to-schema-4). In my experience, this works, and should work without the "regenerate" attribute, so I removed it from the test.
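
As a rough illustration of the request body this comment refers to, the sketch below builds the JSON with Ruby's standard library. The "type" and "attributes" keys come from the test diff; the outer "data" wrapper is an assumption based on JSON:API conventions, and no endpoint or HTTP call is shown.

```ruby
require "json"

# Hypothetical DOI update payload. The "data" wrapper is an assumption;
# note that no "regenerate" attribute is included.
payload = {
  "data" => {
    "type" => "dois",
    "attributes" => {
      "schemaVersion" => "http://datacite.org/schema/kernel-4"
    }
  }
}

body = JSON.generate(payload)
puts body
```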

@codycooperross codycooperross requested a review from a team April 19, 2024 15:09
@codycooperross codycooperross merged commit 3ea3d23 into master Apr 22, 2024
13 checks passed
@codycooperross codycooperross deleted the codycooperross/issue1179 branch April 22, 2024 17:59
Development

Successfully merging this pull request may close these issues.

schemaVersion is not populated when a DOI is created via REST API via JSON
2 participants