data sources & queries
generate2vivo currently includes queries for metadata from Datacite Commons, ORCID and ROR.
Datacite Commons is an interface to a so-called PID graph, a structure that connects different objects, each of which is identified by a persistent identifier. In Datacite Commons these objects and identifiers are organizations identified by their ROR, people identified by their ORCID, and publications, datasets, software and funders identified by their DOI. The PID graph is queried via its GraphQL API. The objects and connections in this graph that we chose for the VIVO data ingest are organization, person and publications.
- you may query organization, person or work individually
- or query one connection and the linked objects via organizationPlusPeople or personPlusPublications
- or query organizationPlusPeoplePlusPublications altogether
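As a rough illustration of what a request against the GraphQL API could look like, the sketch below builds the POST body for an organization-plus-people lookup. The endpoint URL, the field names in the query, the `ID!` variable type and the helper `build_graphql_request` are assumptions made for this example; they are not the queries shipped with generate2vivo.

```python
import json

# Assumed public endpoint of the Datacite GraphQL API.
DATACITE_GRAPHQL_URL = "https://api.datacite.org/graphql"

# Illustrative query only: the selected fields are assumptions,
# not the query that generate2vivo actually sends.
ORGANIZATION_PLUS_PEOPLE = """
query ($ror: ID!) {
  organization(id: $ror) {
    id
    name
    people(first: 50) {
      nodes { id name }
    }
  }
}
"""


def build_graphql_request(query: str, variables: dict) -> bytes:
    """Encode a GraphQL query and its variables as a JSON POST body."""
    return json.dumps({"query": query, "variables": variables}).encode("utf-8")


# Placeholder ROR id; replace it with a real one before sending the request.
body = build_graphql_request(ORGANIZATION_PLUS_PEOPLE, {"ror": "https://ror.org/placeholder"})
```

The resulting bytes would be sent with `Content-Type: application/json` to the endpoint; the HTTP call itself is left out so the sketch stays free of network access.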
organization - person:
- connected via ORCID affiliation: queries ORCID API with organization's grid-id and Ringgold-id to get affiliated people (ORCID:how-do-i-find-orcid-record-holders-at-my-institution)
- affiliation is a broad category: the organization can be listed in a person's ORCID profile under employment, education & qualifications, membership & service, or invited positions & distinctions (ORCID:working-with-organization-identifiers)
- there is no time limit on affiliation (even people who worked/studied/etc. at the organization 5 years ago will be listed in the results)
- → it is not possible to filter affiliated people for current employees only; see the explanation at https://www.pidforum.org/t/employment-field-always-empty-when-using-connection/1571
- → a "Position" with role="Unknown" will be generated between organization and person to connect them, since neither the role title nor whether they are connected via employment at all can be determined
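The affiliation lookup described above could be sketched as a search URL for the ORCID public API. The endpoint version (`v3.0`), the search field names `grid-org-id` and `ringgold-org-id`, and the helper name `build_affiliation_search_url` are assumptions for illustration; the identifiers in the usage line are placeholders.

```python
from urllib.parse import urlencode

# Assumed search endpoint of the ORCID public API.
ORCID_SEARCH_URL = "https://pub.orcid.org/v3.0/search/"


def build_affiliation_search_url(grid_id: str, ringgold_id: str, rows: int = 100) -> str:
    """Build a search URL matching records affiliated with either identifier."""
    # Either identifier may appear in a record, so the two terms are OR-ed.
    query = f"grid-org-id:{grid_id} OR ringgold-org-id:{ringgold_id}"
    return ORCID_SEARCH_URL + "?" + urlencode({"q": query, "rows": rows})


# Placeholder identifiers, not those of a real organization.
url = build_affiliation_search_url("grid.0000.0", "0000")
```

A search like this matches any affiliation type, which is why the results cannot be narrowed to current employees at this stage.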
person - publication:
- connection is made via the creator field in the publication's metadata, only if nameIdentifier contains an ORCID id
- Datacite Commons currently includes 100% of DOIs issued by Datacite, but only ~8% of DOIs issued by Crossref (as of April 22, 2021), so you may query a person who has hundreds of publications but get only 4 from Datacite Commons because only 4 were issued by Datacite (there is an ongoing effort to build a common search index, so the number will improve in the future)
- some versions can be filtered out if they are indicated as versions in their metadata, but not all are marked this way
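To make the creator-based link concrete, here is a minimal sketch that checks whether a work's metadata connects it to a given ORCID id. It assumes the work is available as a dictionary shaped like Datacite JSON metadata (`creators`, `nameIdentifiers`, `nameIdentifierScheme`); the function names are hypothetical.

```python
def orcids_of_creators(work: dict) -> set:
    """Collect the bare ORCID ids found in a work's creator list."""
    orcids = set()
    for creator in work.get("creators", []):
        for ident in creator.get("nameIdentifiers", []):
            value = ident.get("nameIdentifier", "")
            # Only identifiers explicitly marked as ORCID count as a link.
            if value and ident.get("nameIdentifierScheme", "").upper() == "ORCID":
                # Accept both "https://orcid.org/<id>" and a bare "<id>".
                orcids.add(value.rsplit("/", 1)[-1])
    return orcids


def is_linked(work: dict, orcid: str) -> bool:
    """True if the given ORCID id appears among the work's creators."""
    return orcid in orcids_of_creators(work)
```

A creator that is listed by name only, without a `nameIdentifier`, produces no link, which is one reason a person's result list can be shorter than expected.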
While ORCID is 100% included in Datacite Commons, querying ORCID directly allows you to specify the connections used to link an organization to its affiliated people and a person to their works.
- personPlusWorks
- currentEmployeesPlusWorks
organization - person:
- connected via ORCID affiliation: queries ORCID API with organization's grid-id and Ringgold-id to get affiliated people (ORCID:how-do-i-find-orcid-record-holders-at-my-institution)
- after getting all affiliated people, we go through each ORCID, query the profile and keep only people who have the grid-id or Ringgold-id in their employment section and whose employment end-date is empty (= current employees)
- → please note: since ORCID does not contain organizational data, there will be only an organization placeholder with name and ROR-id generated (organization data can be completed by querying ROR organization before or afterwards)
- → a person will be connected to the organization via a "Position" with the role title taken from the ORCID employment field
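The current-employee filter described above can be sketched as follows. The dictionary keys mirror what an ORCID employment summary is assumed to look like (`organization`, `disambiguated-organization`, `end-date`, `role-title`); both function names are hypothetical, and the real implementation may differ.

```python
def is_current_employment(summary: dict, org_ids: set) -> bool:
    """True if the employment is at one of org_ids and has no end date."""
    org = summary.get("organization") or {}
    disambiguated = org.get("disambiguated-organization") or {}
    identifier = disambiguated.get("disambiguated-organization-identifier")
    # An empty end date is interpreted as "still employed".
    return identifier in org_ids and summary.get("end-date") is None


def current_role_titles(summaries: list, org_ids: set) -> list:
    """Role titles of all current employments at the organization."""
    return [s.get("role-title") for s in summaries if is_current_employment(s, org_ids)]
```

A person is kept only if `current_role_titles` returns at least one entry; each entry would then supply the role title for the generated "Position".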
person - publication:
- all works listed in the ORCID profile independent of who issued the DOI.
- if publications are marked as versions in the ORCID profile (stacked view), filtering them is easy (the preferred source is always in the top position)
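Picking the preferred version from each stack might look like the sketch below. It assumes the works response groups versions under a `group` list whose entries carry a `work-summary` list with the preferred source first; the function name is hypothetical.

```python
def preferred_work_summaries(works_response: dict) -> list:
    """Return the top (preferred) summary of every group of versions."""
    preferred = []
    for group in works_response.get("group", []):
        summaries = group.get("work-summary", [])
        if summaries:
            # The preferred source is assumed to be in the first position.
            preferred.append(summaries[0])
    return preferred
```

Works that were never grouped in the profile form a stack of one and pass through unchanged.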
While ROR is 100% included in Datacite Commons, in April 2021 ROR announced a new relationship attribute in its metadata schema, which links different organizations. One available relationship type is the child-parent relationship, which can be used to import an organization and the complete hierarchy of sub-organizations below it. This property is not yet included in Datacite Commons, which is why ROR is queried directly.
- organization
- organizationPlusChildren
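Walking the hierarchy below an organization could be sketched like this. The record shape (a `relationships` list whose entries have `type` and `id`) is an assumption about the ROR metadata, and `fetch` is a hypothetical callable that would wrap the HTTP request to the ROR API; it is injected so the traversal can be tried without network access.

```python
from typing import Callable, Dict, List


def collect_hierarchy(root_id: str, fetch: Callable[[str], Dict]) -> List[str]:
    """Return root_id followed by every organization reachable via "Child" links."""
    ordered: List[str] = []
    seen = set()
    stack = [root_id]
    while stack:
        ror_id = stack.pop()
        if ror_id in seen:
            continue  # guard against cycles and repeated children
        seen.add(ror_id)
        ordered.append(ror_id)
        record = fetch(ror_id)
        children = [
            rel["id"]
            for rel in record.get("relationships", [])
            if rel.get("type") == "Child"
        ]
        # Reverse so children are visited in their listed order (depth-first).
        stack.extend(reversed(children))
    return ordered
```

With an in-memory dictionary of records, `collect_hierarchy("A", records.__getitem__)` visits each organization once, parents before their children.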
For Crossref there are two queries available: one where you enter a DOI and get the available metadata for a published work and another one where you enter an ORCID and get all available works where the ORCID is listed in the work's author section.
- personPlusWorks
- work
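The two Crossref lookups correspond roughly to the request URLs built below. This is a sketch assuming the public Crossref REST API at `api.crossref.org` and its `orcid` filter; the function names and the `rows` default are hypothetical, and the identifiers in the usage lines are examples only.

```python
from urllib.parse import quote

# Assumed base URL of the Crossref REST API works route.
CROSSREF_WORKS_URL = "https://api.crossref.org/works"


def work_url(doi: str) -> str:
    """URL for the metadata of a single work, looked up by DOI."""
    # Keep the slash between DOI prefix and suffix, escape everything else.
    return f"{CROSSREF_WORKS_URL}/{quote(doi, safe='/')}"


def person_works_url(orcid: str, rows: int = 100) -> str:
    """URL for all works that list the given ORCID id among their authors."""
    return f"{CROSSREF_WORKS_URL}?filter=orcid:{orcid}&rows={rows}"


print(work_url("10.1000/xyz123"))
print(person_works_url("0000-0002-1825-0097"))
```

Only works whose author section carries the ORCID id are returned by the second request, so coverage depends on what publishers deposited.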