
Data Correction

Scenario 1: A repository manager of a repository indexed in OpenAIRE can subscribe to the events for Missed/More PIDs and Project links in the Content Provider Dashboard, using “a repository callback” as the notification mechanism instead of the current email alert. They log in to the repository and see the list of events received, among them a publication with a PMID that was unknown to the repository and a link to a project. They click the “accept the suggestion” button and the new information is stored in the local record. OpenAIRE could then “flag” the data as confirmed.

The goal of the Data Correction service is to support the scenario above.

As the OpenAIRE Content Provider Dashboard does not yet allow creating a subscription that sets up a callback mechanism, we agreed with the OpenAIRE team to read the data generated by OpenAIRE's Notification Broker Service from a JSON file, postponing to the last phase of the project the discussion and implementation of the delivery mechanism (polling new versions from a stable URL, receiving it as the payload sent to a repository URL, etc.).

Data source

The JSON file contains an array of JSON events, where each event has the following structure:

    {
        "originalId": "oai:www.openstarts.units.it:10077/21838",
        "title": "Egypt, crossroad of translations and literary interweavings (3rd-6th centuries). A reconsideration of earlier Coptic literature",
        "topic": "ENRICH/MORE/PROJECT",
        "trust": 1.0,
        "message": {
            "projects[0].acronym": "PAThs",
            "projects[0].code": "687567",
            "projects[0].funder": "EC",
            "projects[0].fundingProgram": "H2020",
            "projects[0].jurisdiction": "EU",
            "projects[0].openaireId": "40|corda__h2020::6e32f5eb912688f2424c68b851483ea4",
            "projects[0].title": "Tracking Papyrus and Parchment Paths: An Archaeological Atlas of Coptic Literature.\nLiterary Texts in their Geographical Context: Production, Copying, Usage, Dissemination and Storage"
        }
    }

Please note that the message sub-object depends on the event topic. A more complete set of sample events can be seen here: nbevents-sample.json
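The flattened keys in the message sub-object (for example `projects[0].acronym`) can be regrouped into nested objects before further processing. The following is a minimal sketch, assuming every structured key follows the `name[index].field` form seen in the sample; the helper name `unflatten_message` is hypothetical and is not part of DSpace.

```python
import json
import re

# A trimmed copy of the sample event shown above.
RAW_EVENTS = """
[
  {
    "originalId": "oai:www.openstarts.units.it:10077/21838",
    "topic": "ENRICH/MORE/PROJECT",
    "trust": 1.0,
    "message": {
      "projects[0].acronym": "PAThs",
      "projects[0].code": "687567",
      "projects[0].funder": "EC"
    }
  }
]
"""

# Matches keys such as "projects[0].acronym": list name, index, field name.
FLAT_KEY = re.compile(r"^(\w+)\[(\d+)\]\.(\w+)$")


def unflatten_message(message):
    """Group 'name[i].field' keys into {name: [{field: value}, ...]}.

    Keys that do not follow that pattern are copied through unchanged.
    """
    result = {}
    for key, value in message.items():
        match = FLAT_KEY.match(key)
        if match is None:
            result[key] = value
            continue
        name, index, field = match.group(1), int(match.group(2)), match.group(3)
        items = result.setdefault(name, [])
        # Grow the list until the requested index exists.
        while len(items) <= index:
            items.append({})
        items[index][field] = value
    return result


events = json.loads(RAW_EVENTS)
project = unflatten_message(events[0]["message"])["projects"][0]
print(project["acronym"], project["code"], project["funder"])
```

Since the message layout depends on the topic, other topics may need a different or additional regrouping step.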

The Java class org.dspace.app.nbevent.NBEventsRunnableCli provides a convenient way to process this JSON file, loading the data into a dedicated new DSpace Solr core named nbevent. To use it, run the following from the bin folder of the DSpace installation:

./dspace import-nbevents -f <path-to-the-json-file>

The same script is also available via the administrative runnable process UI

notification broker load script UI

The config/modules/oaire-nbevents.cfg file allows configuring which topics should be processed, since some topics could have no configured action on the repository

import.topic = ENRICH/MISSING/ABSTRACT
import.topic = ENRICH/MORE/PID
import.topic = ...

and a list of URLs to acknowledge the decision made by the Repository Manager via the DSpace UI

oaire-nbevents.acknowledge-url = https://httpdump.io/...

This configuration file is also expected to hold, in the future, settings related to the delivery mechanism (such as the URL from which the JSON file can be downloaded, the credentials to use, etc.).
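The effect of the repeated `import.topic` entries can be pictured as a simple filter applied to the incoming events. The sketch below is only an illustration of that idea, not the actual DSpace loading code; the function names and the `oai:example:*` identifiers are made up for the example.

```python
CFG_TEXT = """
import.topic = ENRICH/MISSING/ABSTRACT
import.topic = ENRICH/MORE/PID
"""


def configured_topics(cfg_text):
    """Collect every value of the repeated 'import.topic' key."""
    topics = set()
    for line in cfg_text.splitlines():
        line = line.strip()
        # Skip blank lines and comments.
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep and key.strip() == "import.topic":
            topics.add(value.strip())
    return topics


def select_events(events, topics):
    """Keep only the events whose topic is configured for import."""
    return [event for event in events if event.get("topic") in topics]


# Hypothetical events: only the fields needed for filtering are shown.
events = [
    {"originalId": "oai:example:1", "topic": "ENRICH/MORE/PID"},
    {"originalId": "oai:example:2", "topic": "ENRICH/MORE/PROJECT"},
]
kept = select_events(events, configured_topics(CFG_TEXT))
print([event["originalId"] for event in kept])
```

With this configuration the PROJECT event is skipped, because no `import.topic` line names its topic.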

The nbevent core has the following structure

<fields>
    <field name="event_id" type="string" indexed="true" stored="true" omitNorms="true" />
    <field name="original_id" type="string" indexed="true" stored="true" omitNorms="true" />
    <field name="title" type="string" indexed="true" stored="true" omitNorms="true" />
    <field name="topic" type="string" indexed="true" stored="true" omitNorms="true" />
    <field name="trust" type="double" indexed="true" stored="true" omitNorms="true" />
    <field name="message" type="string" indexed="true" stored="true" omitNorms="true" />
    <field name="resource_uuid" type="string" indexed="true" stored="true" omitNorms="true" />
    <field name="related_uuid" type="string" indexed="true" stored="true" omitNorms="true" />
    <field name="last_update" type="date" indexed="true" stored="true" omitNorms="true" />
</fields>    
<uniqueKey>event_id</uniqueKey>

The event_id is currently generated on the repository side as a hash of the business information included in the event itself, but it is envisioned that OpenAIRE will provide this identifier directly in the JSON file, so that feedback from the repository can be linked back to the original event and further processed.
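A content-derived identifier of this kind could be computed along the following lines. This is a sketch under explicit assumptions: the document does not say which fields count as business information or which hash algorithm is used, so the choice of `originalId`, `topic` and the message entries, as well as SHA-256, is purely illustrative.

```python
import hashlib


def compute_event_id(event):
    """Return a stable hex digest derived from the event's content.

    Assumption: originalId, topic and the sorted message entries are the
    'business information'; the real field selection and hash algorithm
    are not specified in this document.
    """
    parts = [event["originalId"], event["topic"]]
    message = event.get("message", {})
    # Sort the keys so the id does not depend on the JSON key order.
    for key in sorted(message):
        parts.append(f"{key}={message[key]}")
    joined = "\n".join(parts)
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()


# Hypothetical event used only to exercise the function.
event = {
    "originalId": "oai:example:1",
    "topic": "ENRICH/MORE/PID",
    "message": {"a": "1", "b": "2"},
}
print(compute_event_id(event))
```

Deriving the id from the content means that importing the same event twice yields the same event_id, which fits its role as the unique key of the core.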

The related_uuid field contains the UUID of the related object that has been associated with the correction suggestion. This is the case for the PROJECT-related topics, where a link between the publication and a project should be established. If the suggested project can be found in the system, the related_uuid field will hold its internal identifier; otherwise the user will be allowed to create a new item for the project on the fly and connect it to the publication item with a single click.

Two REST endpoints have been developed to expose the data collected in this way

  • /api/integration/nbtopics to provide access to summary information about the available topics and the number of events to deal with
  • /api/integration/nbevents to provide access to the detailed events so that they can be reviewed and managed by the repository manager

The detailed REST contracts for these endpoints are available in the 4Science Rest7Contract repository and are embedded at the bottom of the page for easy reference.

Repository Manager UI

The resulting UI is accessible from the administrative menu. As the entry point for these features, a “Notifications” menu entry has been added to the DSpace administrative menu, from where the repository manager can manage the OpenAIRE subscription and access the details of received events.

menu

The main page lists the topics found in the events loaded in the system

notification broker topics

By default the system sorts the events within a topic by trust descending (most accurate correction first)

notification broker events sorted by trust DESC

but it is also possible to reverse the direction

notification broker sorting events

getting the least accurate correction first

notification broker events sorted by trust ASC
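The two orderings amount to sorting on the trust value in either direction. A tiny sketch follows, with made-up titles and trust values; how DSpace performs the sort internally is not described here.

```python
# Hypothetical events: only the fields relevant to sorting are shown.
events = [
    {"title": "A", "trust": 0.6},
    {"title": "B", "trust": 1.0},
    {"title": "C", "trust": 0.8},
]


def sort_by_trust(events, descending=True):
    """Return a new list ordered by trust; descending is the default view."""
    return sorted(events, key=lambda event: event["trust"], reverse=descending)


print([event["title"] for event in sort_by_trust(events)])
print([event["title"] for event in sort_by_trust(events, descending=False)])
```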

In the detail view of the events in a specific topic, links always open in a new tab so that the repository manager can quickly check the details without losing the overview

notification broker links to details about target and source

Below is a screenshot of possible missing-abstract events, where the repository manager can check the current local publication record by clicking on the title and scroll through the abstract reported by OpenAIRE. When the suggestion is accepted, the local record is enriched with this extra information. The Ignore suggestions button is instead intended for discarding a notification without flagging it as wrong. This is important because the OpenAIRE Graph does not process the data from the repository in real time, so a local record may have been updated recently with information not yet known to OpenAIRE. In such scenarios the repository manager may prefer to keep the local version, but this should not be reported to OpenAIRE as a wrong suggestion, because this feedback can be used to improve OpenAIRE's guessing capabilities. In contrast, a wrong suggestion should be rejected so that OpenAIRE can learn from it.

notification broker simple metadata event

For PROJECT-related events, alternative additional actions are needed. This is usually the case for information related to linked entities, which can be tracked in the local repository either as flat metadata (in which case the “abstract approach UI” will be used) or as individual entities. In the latter case the screen below applies:

notification broker linked entity event

The system will attempt to identify a local record for the information reported by OpenAIRE (the project) and will offer the repository manager the option to look up the record manually or fix the automatic match

notification broker linked entity event search

notification broker linked entity event select

If the related project is found in the system, the repository manager can accept the correction, linking the publication to the local copy of the project; otherwise it is possible to import and connect the project in one click, as shown in the first project-related screen above

notification broker linked entity event select

For PID-related events the system offers, where available (doi, handle, pmid, pmc, arXiv, NCID, urn/url), the resolution of the identifier to a details page

notification broker linked doi

notification broker linked doi

Processing the decisions

The backend is responsible for processing the decisions that the repository manager takes on the received events. As noted in the REST contract, a decision is recorded by PATCHing the DSpace nbevents REST resource to update its status.

If one or more acknowledge-url values are configured in the oaire-nbevents.cfg configuration file, a POST call with the following JSON payload will be made to each URL

{
    "eventId": "<the notification broker event id>",
    "status": "<STATUS-ENUM>"
}

where the status enum can have the following values:

  • DISCARDED: the event was not processable by the system. OpenAIRE should not interpret such status in a negative or positive sense with regard to the accuracy of the notification
  • REJECTED: a human takes the decision to reject the suggestion as it was wrong
  • ACCEPTED: a human takes the decision to apply the suggested enrichment to the local record
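The acknowledgement step can be sketched as building one POST request per configured URL with the payload described above. This is an illustration only: the function name, the `Content-Type` header, the event id and the `https://example.org/ack` URL are assumptions, and the requests are prepared but not sent.

```python
import json
import urllib.request

VALID_STATUSES = {"ACCEPTED", "REJECTED", "DISCARDED"}


def build_acknowledgements(event_id, status, acknowledge_urls):
    """Prepare one POST request per configured acknowledge URL."""
    if status not in VALID_STATUSES:
        raise ValueError(f"unknown status: {status}")
    body = json.dumps({"eventId": event_id, "status": status}).encode("utf-8")
    return [
        urllib.request.Request(
            url,
            data=body,
            # Assumption: the receiver expects a JSON content type.
            headers={"Content-Type": "application/json"},
            method="POST",
        )
        for url in acknowledge_urls
    ]


# Hypothetical event id and URL, used only for illustration.
prepared = build_acknowledgements(
    "example-event-id", "ACCEPTED", ["https://example.org/ack"]
)
for request in prepared:
    print(request.get_method(), request.full_url, request.data.decode("utf-8"))
    # urllib.request.urlopen(request) would actually send the call.
```

Validating the status before building the requests keeps values outside the three listed ones from ever reaching the acknowledge URLs.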

On the repository side, the processing is encapsulated in a Java class specialized for a specific topic. The /config/spring/api/nbevents.xml Spring configuration file maps each topic to a specific implementation

     <bean id="org.dspace.app.nbevent.NBEventActionService" class="org.dspace.app.nbevent.NBEventActionServiceImpl">
        <property name="topicsToActions">
            <map>
    <!--The keys are the topics, the values must be valid implementations of the 
        org.dspace.app.nbevent.NBEventAction interface -->
               <entry key="ENRICH/MORE/PROJECT" value-ref="ProjectLinkedEntityAction" />
               <entry key="ENRICH/MISSING/ABSTRACT" value-ref="AbstractMetadataAction" />
               <entry key="ENRICH/MORE/PID" value-ref="PIDMetadataAction" />
            </map>   
        </property>
     </bean>
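Conceptually, this map works as a dispatch table: the topic of an accepted event selects the action that updates the local record. The Python sketch below only mirrors that idea. The class, the `apply` method, the dictionary-based record and the `value` message key are all hypothetical and do not reproduce the NBEventAction interface.

```python
class SimpleMetadataAction:
    """Hypothetical stand-in for an action that writes one metadata field."""

    def __init__(self, metadata):
        self.metadata = metadata

    def apply(self, record, message):
        # Assumption: the suggested value arrives under a 'value' key.
        record.setdefault(self.metadata, []).append(message["value"])


# Topic -> action, analogous to the topicsToActions map in the Spring file.
TOPICS_TO_ACTIONS = {
    "ENRICH/MISSING/ABSTRACT": SimpleMetadataAction("dc.description.abstract"),
}


def accept(event, record):
    """Apply the action configured for the event's topic to the record."""
    action = TOPICS_TO_ACTIONS.get(event["topic"])
    if action is None:
        raise LookupError(f"no action configured for topic {event['topic']}")
    action.apply(record, event["message"])
    return record


event = {"topic": "ENRICH/MISSING/ABSTRACT", "message": {"value": "An abstract."}}
print(accept(event, {}))
```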

Each implementation allows configuring additional parameters to deal with the event as needed, ranging from the simple definition of the metadata field used to store the information, as in the case of the abstract-related events

     <bean id="AbstractMetadataAction" class="org.dspace.app.nbevent.NBSimpleMetadataAction">
        <property name="metadata" value="dc.description.abstract" />
     </bean>

to a dynamic mapping used for SUBJECT and PID related events

     <bean id="PIDMetadataAction" class="org.dspace.app.nbevent.NBMetadataMapAction">
         <property name="types">
             <map>
    <!--The keys are the types of identifier (or subject) reported in the message, the values are the metadata in 
        the linked entity where the information should be stored -->
               <entry key="default" value="dc.identifier" />
               <entry key="doi" value="dc.identifier.doi" />          
             </map>
         </property>    
     </bean>

to the definition of the metadata used in linked entity for Project related events

 <bean id="ProjectLinkedEntityAction" class="org.dspace.app.nbevent.NBEntityMetadataAction">
        <!-- which metadata will hold the relation between the publication and the project -->
        <property name="metadata" value="dc.relation.funding" />
        <!-- the type of local entity used to store the project details -->
        <property name="entityType" value="funding" />
        <property name="entityMetadata">
            <map>
    <!--The keys are the JSON paths in the NB message, the values are the metadata in 
        the linked entity where the information should be stored -->
               <entry key="acronym" value="oairecerif.acronym" />
               <entry key="code" value="oairecerif.internalid" />
               <entry key="funder" value="oairecerif.funder" />
               <entry key="title" value="dc.title" />
               <entry key="fundingProgram" value="oairecerif.fundingProgram" />
               <entry key="openaireId" value="oairecerif.funding.identifier" />
            </map>
        </property>    
     </bean>

The above configuration is the default for DSpace-CRIS. For a plain DSpace, the NBEntityMetadataAction bean must define the relation attribute instead of metadata. The relation must match the rightward name of the relation used to link a publication with a project, which is isPublicationOfProject in the openaire4-relatioship.xml proposal.
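The two mapping styles configured above can be illustrated with plain dictionaries: a type-to-metadata lookup with a `default` fallback for PID events, and a key-to-metadata translation for project details. This is a sketch of the mapping idea only; the function names are hypothetical, and skipping message keys that have no mapping (such as `jurisdiction`) is an assumption about the behavior.

```python
# Copied from the PIDMetadataAction configuration above.
PID_TYPES = {"default": "dc.identifier", "doi": "dc.identifier.doi"}

# Copied from the ProjectLinkedEntityAction configuration above.
ENTITY_METADATA = {
    "acronym": "oairecerif.acronym",
    "code": "oairecerif.internalid",
    "funder": "oairecerif.funder",
    "title": "dc.title",
    "fundingProgram": "oairecerif.fundingProgram",
    "openaireId": "oairecerif.funding.identifier",
}


def pid_field(pid_type):
    """Return the metadata field for a PID type, falling back to 'default'."""
    return PID_TYPES.get(pid_type, PID_TYPES["default"])


def project_metadata(project):
    """Translate message keys into metadata fields.

    Assumption: keys without a configured mapping are skipped.
    """
    return {
        ENTITY_METADATA[key]: value
        for key, value in project.items()
        if key in ENTITY_METADATA
    }


print(pid_field("doi"), pid_field("pmid"))
print(project_metadata({"acronym": "PAThs", "code": "687567", "jurisdiction": "EU"}))
```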

REST Contract

Two REST endpoints have been developed to interact with the notification broker events

  • /api/integration/nbtopics to provide access to summary information about the available topics and the number of events to deal with
  • /api/integration/nbevents to provide access to the detailed events so that they can be reviewed and managed by the repository manager

/api/integration/nbtopics

nbtopics endpoint

/api/integration/nbevents

nbevents endpoint