Scenario 1: A repository manager of a repository indexed in OpenAIRE can subscribe to the events for Missing/More PIDs and Project links in the Content Provider Dashboard, using "a repository callback" as the notification mechanism instead of the current email alert. They log in to the repository and see the list of received events, among them a publication with a PMID unknown to the repository and a link to a project. They click on the "accept the suggestion" button and the new information is stored in the local record. OpenAIRE could "flag" the data as confirmed.
The goal of the Data Correction service is to support the scenario above.
As the OpenAIRE Content Provider Dashboard does not yet allow creating a subscription with a callback mechanism, we agreed with the OpenAIRE team to read the data generated by OpenAIRE's Notification Broker Service from a JSON file, postponing the discussion and implementation of the delivery mechanism (polling new versions from a stable URL, receiving it as the payload of a call to a repository URL, etc.) to the last phase of the project.
The JSON file contains an array of JSON events, where each event has the following structure:
{
"originalId": "oai:www.openstarts.units.it:10077/21838",
"title": "Egypt, crossroad of translations and literary interweavings (3rd-6th centuries). A reconsideration of earlier Coptic literature",
"topic": "ENRICH/MORE/PROJECT",
"trust": 1.0,
"message": {
"projects[0].acronym": "PAThs",
"projects[0].code": "687567",
"projects[0].funder": "EC",
"projects[0].fundingProgram": "H2020",
"projects[0].jurisdiction": "EU",
"projects[0].openaireId": "40|corda__h2020::6e32f5eb912688f2424c68b851483ea4",
"projects[0].title": "Tracking Papyrus and Parchment Paths: An Archaeological Atlas of Coptic Literature.\nLiterary Texts in their Geographical Context: Production, Copying, Usage, Dissemination and Storage"
}
}
Please note that the structure of the message sub-object depends on the event topic. A more complete set of sample events is available here: nbevents-sample.json
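The message object uses flattened keys such as projects[0].acronym, where the prefix names a collection and carries the index of the entry. As a minimal sketch, assuming the message has already been parsed into a Map<String, String> (the JSON parsing step is omitted), the keys can be regrouped per indexed entry. NbMessageGrouper is a hypothetical name, not a DSpace class.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: regroups flattened keys such as "projects[0].acronym". */
final class NbMessageGrouper {

    // Matches "<collection>[<index>].<field>", e.g. "projects[0].acronym"
    private static final Pattern INDEXED_KEY = Pattern.compile("^(\\w+)\\[(\\d+)\\]\\.(\\w+)$");

    /** Returns index -> (field -> value) for the entries of the given collection, ordered by index. */
    static Map<Integer, Map<String, String>> group(Map<String, String> message, String collection) {
        Map<Integer, Map<String, String>> result = new TreeMap<>();
        for (Map.Entry<String, String> entry : message.entrySet()) {
            Matcher m = INDEXED_KEY.matcher(entry.getKey());
            // Keys of other collections, or keys without a ".field" suffix, are ignored
            if (m.matches() && m.group(1).equals(collection)) {
                int index = Integer.parseInt(m.group(2));
                result.computeIfAbsent(index, i -> new LinkedHashMap<>())
                      .put(m.group(3), entry.getValue());
            }
        }
        return result;
    }
}
```

Applied to the sample event above with the collection name projects, the seven keys end up in a single entry at index 0 (acronym, code, funder, and so on).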
The Java class org.dspace.app.nbevent.NBEventsRunnableCli provides a convenient way to process this JSON file, loading the data into a dedicated new DSpace Solr core named nbevent. To use it, run the following from the bin folder of the DSpace installation:
./dspace import-nbevents -f <path-to-the-json-file>
The same script is also available via the administrative UI for runnable processes.
The config/modules/oaire-nbevents.cfg file allows configuring which topics should be processed, since some topics may have no action configured on the repository:
import.topic = ENRICH/MISSING/ABSTRACT
import.topic = ENRICH/MORE/PID
import.topic = ...
It also holds a list of URLs used to acknowledge the decisions made by the repository manager via the DSpace UI:
oaire-nbevents.acknowledge-url = https://httpdump.io/...
In the future, this configuration file is also expected to hold settings related to the delivery mechanism (such as the URL from which the JSON file can be downloaded, the credentials to use, etc.).
The nbevent core has the following structure:
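The effect of the import.topic setting can be sketched as a simple filter: events whose topic is not listed are skipped during the import. This assumes the configured values have already been read into a set (how DSpace loads the configuration is not shown), and TopicEvent and TopicFilter are hypothetical names.

```java
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

/** Hypothetical minimal event: only the fields needed for filtering. */
final class TopicEvent {
    final String originalId;
    final String topic;

    TopicEvent(String originalId, String topic) {
        this.originalId = originalId;
        this.topic = topic;
    }
}

/** Keeps only the events whose topic is listed among the configured import.topic values. */
final class TopicFilter {
    private final Set<String> importTopics;

    TopicFilter(Set<String> importTopics) {
        this.importTopics = Set.copyOf(importTopics);
    }

    List<TopicEvent> filter(List<TopicEvent> events) {
        return events.stream()
                .filter(e -> importTopics.contains(e.topic))
                .collect(Collectors.toList());
    }
}
```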
<fields>
<field name="event_id" type="string" indexed="true" stored="true" omitNorms="true" />
<field name="original_id" type="string" indexed="true" stored="true" omitNorms="true" />
<field name="title" type="string" indexed="true" stored="true" omitNorms="true" />
<field name="topic" type="string" indexed="true" stored="true" omitNorms="true" />
<field name="trust" type="double" indexed="true" stored="true" omitNorms="true" />
<field name="message" type="string" indexed="true" stored="true" omitNorms="true" />
<field name="resource_uuid" type="string" indexed="true" stored="true" omitNorms="true" />
<field name="related_uuid" type="string" indexed="true" stored="true" omitNorms="true" />
<field name="last_update" type="date" indexed="true" stored="true" omitNorms="true" />
</fields>
<uniqueKey>event_id</uniqueKey>
The event_id is currently generated on the repository side as a hash of the business information included in the event itself, but it is envisioned that OpenAIRE will provide this identifier directly in the JSON file, so that feedback from the repository can be linked back to the original event and further processed.
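The page does not specify which fields or which hash algorithm are used for the event_id. As an illustration only, assuming the business information is the originalId, the topic and the message serialized as a string, a deterministic identifier can be derived with SHA-256 through the JDK's MessageDigest. The actual DSpace implementation may use different fields or a different algorithm, and EventIdGenerator is a hypothetical name.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Hypothetical event_id generator: hashes the business fields of an event. */
final class EventIdGenerator {

    static String eventId(String originalId, String topic, String message) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-256");
            // Newline separators keep the field boundaries distinct
            // (originalId and topic are assumed not to contain newlines)
            String payload = originalId + "\n" + topic + "\n" + message;
            byte[] hash = digest.digest(payload.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder(hash.length * 2);
            for (byte b : hash) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform is required to support SHA-256
            throw new IllegalStateException(e);
        }
    }
}
```

Because the hash depends only on the event content, re-importing the same JSON file produces the same event_id, which lets the uniqueKey of the core deduplicate events.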
The related_uuid field contains the UUID of the related object associated with the correction suggestion. This is the case for PROJECT-related topics, where a link between the publication and a project should be established. If the suggested project can be found in the system, the related_uuid field holds its internal identifier; otherwise, the user can create a new item for the project on the fly and connect it to the publication item with a single click.
Two REST endpoints have been developed to expose the collected data:
- /api/integration/nbtopics, to provide access to summary information about the available topics and the number of events to deal with
- /api/integration/nbevents, to provide access to the detailed events so that they can be reviewed and managed by the repository manager
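The kind of summary returned by the nbtopics endpoint can be illustrated with a simple aggregation that counts the events per topic. This is a sketch of the idea over an in-memory list of topics, not the actual implementation, which works on the nbevent Solr core; TopicSummary is a hypothetical name.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;
import java.util.stream.Collectors;

/** Hypothetical illustration of the per-topic summary: topic -> number of events. */
final class TopicSummary {

    static Map<String, Long> countByTopic(List<String> eventTopics) {
        // TreeMap keeps the topics in alphabetical order
        return eventTopics.stream()
                .collect(Collectors.groupingBy(Function.identity(), TreeMap::new, Collectors.counting()));
    }
}
```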
The detailed REST contracts for these endpoints are available in the 4Science Rest7Contract repository and are embedded at the bottom of the page for easy reference.
The resulting UI is accessible from the administrative menu. As the entry point for these features, a “Notifications” entry has been added to the DSpace administrative menu, from which the repository manager can manage the OpenAIRE subscription and access the details of the received events.
The main page lists the topics found in the events loaded into the system.
By default, the system sorts the events within a topic by descending trust (most accurate correction first), but it is also possible to reverse the direction and get the least accurate corrections first.
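The two sort orders can be sketched with a Comparator on the trust value, reversed for the default descending order. TrustEvent and TrustSorter are hypothetical names used only to illustrate the ordering.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical minimal event carrying only what is needed for sorting. */
final class TrustEvent {
    final String eventId;
    final double trust;

    TrustEvent(String eventId, double trust) {
        this.eventId = eventId;
        this.trust = trust;
    }
}

final class TrustSorter {

    /** Descending trust by default (most accurate first); ascending when requested. */
    static List<TrustEvent> sort(List<TrustEvent> events, boolean ascending) {
        Comparator<TrustEvent> byTrust = Comparator.comparingDouble(e -> e.trust);
        List<TrustEvent> sorted = new ArrayList<>(events);
        sorted.sort(ascending ? byTrust : byTrust.reversed());
        return sorted;
    }
}
```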
In the detail view of the events of a specific topic, links always open in a new tab, so that the repository manager can quickly check the details without losing the overview.
Below is a screen of possible missing-abstract events, where the repository manager can check the current local publication record by clicking on the title and scroll through the abstract reported by OpenAIRE. By accepting the suggestion, the local record is enriched with this extra information. The Ignore suggestions button is instead intended to discard a notification without flagging it as wrong. This is important because the OpenAIRE Graph does not process the data from the repository in real time, so a local record may have been updated recently with information not yet known to OpenAIRE. In such scenarios the repository manager may prefer to keep the local version, but this should not be reported to OpenAIRE as a wrong suggestion, because this feedback is used to improve OpenAIRE's guessing capabilities. In contrast, a wrong suggestion should be rejected so that OpenAIRE can learn from it.
For PROJECT-related events, additional alternative actions are needed. This is usually the case for information related to linked entities, which can be tracked in the local repository either as flat metadata (in which case the "abstract approach" UI is used) or as individual entities. In the latter case, the screen below applies:
The system attempts to identify a local record for the information reported by OpenAIRE (the project) and offers the repository manager the option to look up the record manually or to fix the automatic match.
If the related project is found in the system, the repository manager can accept the correction, linking the publication to the local copy of the project; otherwise, the project can be imported and connected in one click, as shown in the first project-related screen above.
For PID-related events, the system offers, where available (doi, handle, pmid, pmc, arXiv, NCID, urn/url), the resolution of the identifier to a details page.
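The identifier resolution can be sketched as a lookup from PID type to a resolver base URL. The resolvers below are commonly used public services chosen for illustration; they are assumptions, not necessarily the ones DSpace uses. urn is left out because its resolver depends on the URN namespace, and PidResolver is a hypothetical name.

```java
import java.util.Locale;
import java.util.Map;
import java.util.Optional;

/** Hypothetical PID resolver: builds a landing-page URL for a typed identifier. */
final class PidResolver {

    // Illustrative public resolvers (assumed; not taken from the DSpace configuration)
    private static final Map<String, String> BASE_URLS = Map.of(
            "doi", "https://doi.org/",
            "handle", "https://hdl.handle.net/",
            "pmid", "https://pubmed.ncbi.nlm.nih.gov/",
            "pmc", "https://www.ncbi.nlm.nih.gov/pmc/articles/",
            "arxiv", "https://arxiv.org/abs/",
            "ncid", "https://ci.nii.ac.jp/ncid/");

    /** Returns a resolvable URL, or empty when the type has no known resolver. */
    static Optional<String> resolve(String type, String value) {
        String key = type.toLowerCase(Locale.ROOT);
        if (key.equals("url")) {
            // A URL is already resolvable as-is
            return Optional.of(value);
        }
        return Optional.ofNullable(BASE_URLS.get(key)).map(base -> base + value);
    }
}
```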
The backend is responsible for processing the decisions taken by the repository manager on the received events. As described in the REST contract, a decision is recorded by PATCHing the DSpace nbevents REST resource to update its status.
If one or more acknowledge-url values are configured in the oaire-nbevents.cfg configuration file, a POST call with the following JSON payload is performed to each URL:
{
"eventId": "<the notification broker event id>",
"status": "<STATUS-ENUM>"
}
where the status enum can have the following values:
- DISCARDED: the event was not processable by the system. OpenAIRE should not interpret this status as either positive or negative with regard to the accuracy of the notification
- REJECTED: a human decided to reject the suggestion because it was wrong
- ACCEPTED: a human decided to apply the suggested enrichment to the local record
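A minimal sketch of the acknowledgement call, using the JDK's java.net.http client (Java 11+): it serializes the eventId/status payload and builds one POST request per configured URL. AckStatus mirrors the three values listed above; the class names, the Content-Type header and the hand-written JSON escaping are assumptions for illustration, not the DSpace implementation.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

/** Mirrors the status values described above. */
enum AckStatus { DISCARDED, REJECTED, ACCEPTED }

/** Hypothetical sender for the acknowledge-url callbacks. */
final class AckSender {

    /** Builds {"eventId": "...", "status": "..."}, escaping only quotes and backslashes. */
    static String payload(String eventId, AckStatus status) {
        String escaped = eventId.replace("\\", "\\\\").replace("\"", "\\\"");
        return "{\"eventId\": \"" + escaped + "\", \"status\": \"" + status.name() + "\"}";
    }

    /** Builds one POST request per configured acknowledge URL. */
    static List<HttpRequest> requests(List<String> ackUrls, String eventId, AckStatus status) {
        String body = payload(eventId, status);
        return ackUrls.stream()
                .map(url -> HttpRequest.newBuilder(URI.create(url))
                        .header("Content-Type", "application/json")
                        .POST(HttpRequest.BodyPublishers.ofString(body))
                        .build())
                .collect(Collectors.toList());
    }

    /** Sends the requests and returns the HTTP status codes in the same order. */
    static List<Integer> send(HttpClient client, List<HttpRequest> requests)
            throws IOException, InterruptedException {
        List<Integer> codes = new ArrayList<>();
        for (HttpRequest request : requests) {
            codes.add(client.send(request, HttpResponse.BodyHandlers.discarding()).statusCode());
        }
        return codes;
    }
}
```

Building the requests separately from sending them keeps the payload logic testable without network access; a caller would pass HttpClient.newHttpClient() to send.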
On the repository side, the processing is encapsulated in Java classes, each specialized to deal with a specific topic. The /config/spring/api/nbevents.xml Spring configuration file maps each topic to a specific implementation:
<bean id="org.dspace.app.nbevent.NBEventActionService" class="org.dspace.app.nbevent.NBEventActionServiceImpl">
<property name="topicsToActions">
<map>
<!--The keys are the TOPICs, the values must be valid implementations of the
org.dspace.app.nbevent.NBEventAction interface -->
<entry key="ENRICH/MORE/PROJECT" value-ref="ProjectLinkedEntityAction" />
<entry key="ENRICH/MISSING/ABSTRACT" value-ref="AbstractMetadataAction" />
<entry key="ENRICH/MORE/PID" value-ref="PIDMetadataAction" />
</map>
</property>
</bean>
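The topicsToActions map above acts as a lookup table from topic to action. The following sketch reproduces that dispatch in plain Java; EventAction and its single method are hypothetical stand-ins, since the actual signature of the org.dspace.app.nbevent.NBEventAction interface is not shown on this page.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for an action bound to a topic. */
interface EventAction {
    /** Applies the suggestion and returns a short description of what was done. */
    String apply(String originalId, Map<String, String> message);
}

/** Dispatches an event to the action configured for its topic, like the topicsToActions map. */
final class TopicDispatcher {
    private final Map<String, EventAction> topicsToActions = new HashMap<>();

    void register(String topic, EventAction action) {
        topicsToActions.put(topic, action);
    }

    String dispatch(String topic, String originalId, Map<String, String> message) {
        EventAction action = topicsToActions.get(topic);
        if (action == null) {
            // Mirrors the case of a topic with no configured action on the repository
            throw new IllegalArgumentException("No action configured for topic " + topic);
        }
        return action.apply(originalId, message);
    }
}
```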
Each implementation allows configuring additional parameters to deal with the event as needed, ranging from the simple definition of the metadata field used to store the information, as in the case of abstract-related events:
<bean id="AbstractMetadataAction" class="org.dspace.app.nbevent.NBSimpleMetadataAction">
<property name="metadata" value="dc.description.abstract" />
</bean>
to a dynamic mapping, used for SUBJECT- and PID-related events:
<bean id="PIDMetadataAction" class="org.dspace.app.nbevent.NBMetadataMapAction">
<property name="types">
<map>
<!--The keys are the types of identifier (or subject) reported in the message, the values are the metadata in
the linked entity where the information should be stored -->
<entry key="default" value="dc.identifier" />
<entry key="doi" value="dc.identifier.doi" />
</map>
</property>
</bean>
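The lookup implied by the types map, where a specific identifier type such as doi has its own field and every other type falls back to the default entry, can be sketched as follows. MetadataMapLookup is a hypothetical name, and case-sensitive matching of the keys is an assumption.

```java
import java.util.Map;

/** Hypothetical lookup mirroring the "types" map of the PID action. */
final class MetadataMapLookup {
    private final Map<String, String> types;

    MetadataMapLookup(Map<String, String> types) {
        this.types = Map.copyOf(types);
    }

    /** Metadata field for the given identifier type, falling back to the "default" entry. */
    String metadataFor(String type) {
        String field = types.getOrDefault(type, types.get("default"));
        if (field == null) {
            throw new IllegalStateException("No mapping for " + type + " and no default entry");
        }
        return field;
    }
}
```

With the configuration above, a doi is stored in dc.identifier.doi, while a pmid falls back to dc.identifier.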
to the definition of the metadata used in the linked entity for PROJECT-related events:
<bean id="ProjectLinkedEntityAction" class="org.dspace.app.nbevent.NBEntityMetadataAction">
<!-- which metadata will hold the relation between the publication and the project -->
<property name="metadata" value="dc.relation.funding" />
<!-- the type of local entity used to store the project details -->
<property name="entityType" value="funding" />
<property name="entityMetadata">
<map>
<!--The keys are the JSON paths in the NB message, the values are the metadata in
the linked entity where the information should be stored -->
<entry key="acronym" value="oairecerif.acronym" />
<entry key="code" value="oairecerif.internalid" />
<entry key="funder" value="oairecerif.funder" />
<entry key="title" value="dc.title" />
<entry key="fundingProgram" value="oairecerif.fundingProgram" />
<entry key="openaireId" value="oairecerif.funding.identifier" />
</map>
</property>
</bean>
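The entityMetadata keys correspond to the field names inside each projects[n] entry of the message. Assuming the projects[n]. prefix is stripped before the lookup (an assumption about how the keys are matched), building the metadata of the new project entity could look like this; EntityMetadataMapper is a hypothetical name.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical mapper from one projects[n] entry to the metadata of the linked entity. */
final class EntityMetadataMapper {
    private final Map<String, String> entityMetadata;

    EntityMetadataMapper(Map<String, String> entityMetadata) {
        this.entityMetadata = Map.copyOf(entityMetadata);
    }

    /** Maps e.g. "projects[0].acronym" to "oairecerif.acronym"; unmapped fields are skipped. */
    Map<String, String> toMetadata(Map<String, String> message, int index) {
        String prefix = "projects[" + index + "].";
        Map<String, String> metadata = new LinkedHashMap<>();
        for (Map.Entry<String, String> entry : message.entrySet()) {
            if (!entry.getKey().startsWith(prefix)) {
                continue;
            }
            String field = entityMetadata.get(entry.getKey().substring(prefix.length()));
            if (field != null) {
                metadata.put(field, entry.getValue());
            }
        }
        return metadata;
    }
}
```

In the sample event, projects[0].jurisdiction has no entry in the map, so it would not be copied to the project entity.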
The above configuration is the default for DSpace-CRIS. For plain DSpace, the NBEntityMetadataAction bean must define the relation property instead of metadata. The relation must match the rightward name of the relationship used to link a publication with a project, that is isPublicationOfProject in the openaire4-relatioship.xml proposal.
Two REST endpoints have been developed to interact with the notification broker events:
- /api/integration/nbtopics, to provide access to summary information about the available topics and the number of events to deal with
- /api/integration/nbevents, to provide access to the detailed events so that they can be reviewed and managed by the repository manager