data sources & queries

Sandra Mierz edited this page Nov 5, 2021 · 8 revisions

generate2vivo currently includes queries for metadata from Datacite Commons, ORCID, ROR and Crossref.

Datacite Commons

Datacite Commons is an interface to the so-called PID graph, a structure that connects different objects, each identified by a persistent identifier. In Datacite Commons these are organizations identified by a ROR ID, people identified by an ORCID iD, and publications, datasets, software and funders identified by a DOI. The PID graph is queried via its GraphQL API. For the VIVO data ingest we chose the objects organization, person and publication, together with the connections between them.
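To illustrate what a request to the GraphQL API might look like, the following minimal sketch builds the JSON body for a query that asks for one person and their works. The endpoint URL, the query text and the field names are assumptions for illustration and should be checked against the live schema; the helper build_person_request is hypothetical and not part of generate2vivo.

```python
import json

# Assumed public endpoint of the DataCite GraphQL API.
DATACITE_GRAPHQL_URL = "https://api.datacite.org/graphql"

# Field names below are assumptions; verify them against the schema before use.
PERSON_WITH_WORKS_QUERY = """
query ($id: ID!) {
  person(id: $id) {
    id
    name
    works {
      nodes {
        id
        titles { title }
      }
    }
  }
}
"""


def build_person_request(orcid: str) -> str:
    """Return the JSON body of a GraphQL POST asking for a person and their works."""
    payload = {
        "query": PERSON_WITH_WORKS_QUERY,
        # The person is addressed by the full ORCID URL (assumption).
        "variables": {"id": f"https://orcid.org/{orcid}"},
    }
    return json.dumps(payload)
```

The returned string would be sent as the body of an HTTP POST with the content type application/json; the sketch deliberately stops before the network call.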

Available queries for Datacite Commons:

  • you may query organization, person or work individually
  • or query one connection and the linked objects via organizationPlusPeople or personPlusPublications
  • or query organizationPlusPeoplePlusPublications altogether

Connections in Datacite Commons:

  • organization - person:

    • connected via ORCID affiliation: queries ORCID API with organization's grid-id and Ringgold-id to get affiliated people (ORCID:how-do-i-find-orcid-record-holders-at-my-institution)
    • affiliation is a broad category: the organization can be listed in a person's ORCID profile under employment, education & qualifications, membership & service, or invited positions & distinctions (ORCID:working-with-organization-identifiers)
    • there is no time limit on affiliation (even people who worked, studied, etc. at the organization 5 years ago will be listed in the results)
  • person - publication:

    • connection is made via creator field in publication's metadata, only if nameIdentifier contains ORCID id
    • Datacite Commons currently includes 100% of the DOIs issued by Datacite, but only ~8% of the DOIs issued by Crossref (as of April 22, 2021). A person may therefore have hundreds of publications, yet a query to Datacite Commons returns only 4 of them because only those 4 DOIs were issued by Datacite (there is an ongoing effort to build a common search index, so this number will improve in the future)
    • different versions of a publication can be filtered out if they are marked as versions in their metadata, but not all of them are marked this way
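The person - publication rule above can be sketched as a small filter. This is a minimal illustration, not the generate2vivo implementation: it assumes each work is a dictionary with a creators list, and each creator carries a nameIdentifiers list with a nameIdentifier value, loosely following the Datacite metadata layout. The function names are hypothetical.

```python
def has_orcid(creator: dict, orcid: str) -> bool:
    """True if any nameIdentifier of the creator contains the given ORCID iD."""
    for identifier in creator.get("nameIdentifiers", []):
        value = identifier.get("nameIdentifier") or ""
        if orcid in value:
            return True
    return False


def works_for_person(works: list[dict], orcid: str) -> list[dict]:
    """Keep only works with at least one creator carrying the ORCID iD."""
    return [
        work
        for work in works
        if any(has_orcid(creator, orcid) for creator in work.get("creators", []))
    ]
```

A work whose creator is listed by name only, without an ORCID iD in nameIdentifiers, is dropped by this filter, which mirrors the limitation described in the list.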

ORCID

While ORCID is 100% included in Datacite Commons, querying ORCID directly makes it possible to specify the connections used to link an organization with its affiliated people as well as a person with their works.

Available queries for ORCID:

  • personPlusWorks
  • currentEmployeesPlusWorks

Connections in ORCID:

  • organization - person:

    • connected via ORCID affiliation: queries ORCID API with organization's grid-id and Ringgold-id to get affiliated people (ORCID:how-do-i-find-orcid-record-holders-at-my-institution)
    • after getting all affiliated people, we go through each ORCID iD, query the profile and keep only people who have the grid-id or Ringgold-id in their employment section and whose employment end-date is empty (= current employees)
      • please note: since ORCID does not contain organizational data, there will be only an organization placeholder with name and ROR-id generated (organization data can be completed by querying ROR organization before or afterwards)
      • a person will be connected to the organization via a "Position" with the role title taken from the ORCID employment field
  • person - publication:

    • all works listed in the ORCID profile independent of who issued the DOI.
    • if a publication is marked as a version in the ORCID profile (stacked view), filtering is easy (the preferred source is always in the top position)
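The current-employee filter described in the organization - person connection can be sketched as follows. The record shape is a simplified assumption: each profile is reduced to a list of employment entries with the hypothetical keys org_id and end_date, which is not the actual ORCID API response format.

```python
def is_current_employee(employments: list[dict], org_ids: set[str]) -> bool:
    """True if any employment entry names the organization and has no end date."""
    for employment in employments:
        if employment.get("org_id") in org_ids and not employment.get("end_date"):
            return True
    return False


def current_employees(profiles: dict[str, list[dict]], org_ids: set[str]) -> list[str]:
    """Return the ORCID iDs whose employment section shows a current position."""
    return [
        orcid
        for orcid, employments in profiles.items()
        if is_current_employee(employments, org_ids)
    ]
```

Passing both the grid-id and the Ringgold-id in org_ids lets a match on either identifier count, as in the description above.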

Research Organization Registry (ROR)

While ROR is 100% included in Datacite Commons, ROR announced in April 2021 a new relationship attribute in its metadata schema that links different organizations. One available relationship type is the child-parent relationship, which can be used to import an organization and the complete hierarchy of sub-organizations below it. This property is not yet included in Datacite Commons, which is why ROR is queried directly.

Available queries for ROR:

  • organization
  • organizationPlusChildren
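As a rough sketch of how a hierarchy could be collected through the child relationship, the function below walks the relationships of each record and follows only entries of type child. It assumes each record is a dictionary with a relationships list whose entries have type and id keys; the fetch callable is a hypothetical stand-in for whatever retrieves a single ROR record.

```python
from typing import Callable


def collect_with_children(root_id: str, fetch: Callable[[str], dict]) -> list[dict]:
    """Return the root organization and every organization below it."""
    collected, seen, stack = [], set(), [root_id]
    while stack:
        ror_id = stack.pop()
        if ror_id in seen:  # guard against cycles or repeated children
            continue
        seen.add(ror_id)
        record = fetch(ror_id)
        collected.append(record)
        for relation in record.get("relationships", []):
            # Compare case-insensitively, since the spelling of the type may vary.
            if str(relation.get("type", "")).lower() == "child":
                stack.append(relation["id"])
    return collected
```

Injecting fetch keeps the traversal independent of the HTTP client, so it can be tried against an in-memory dictionary before touching the real API.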


Crossref

For Crossref there are two queries available: one where you enter a DOI and get the available metadata for a published work and another one where you enter an ORCID and get all available works where the ORCID is listed in the work's author section.

Available queries for Crossref:

  • personPlusWorks
  • work
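The two Crossref lookups roughly correspond to two kinds of request URL in the public Crossref REST API: one addressing a single work by DOI and one filtering works by ORCID iD. The sketch below only builds the URLs; the helper names and the default rows value are illustrative assumptions, not part of generate2vivo.

```python
from urllib.parse import quote, urlencode

CROSSREF_API = "https://api.crossref.org"


def work_url(doi: str) -> str:
    """URL for the metadata of a single work, identified by its DOI."""
    # Keep "/" unescaped so the DOI prefix and suffix stay readable.
    return f"{CROSSREF_API}/works/{quote(doi, safe='/')}"


def works_by_orcid_url(orcid: str, rows: int = 100) -> str:
    """URL listing works whose author section carries the given ORCID iD."""
    params = urlencode({"filter": f"orcid:{orcid}", "rows": rows})
    return f"{CROSSREF_API}/works?{params}"
```

Only works whose publisher deposited the ORCID iD with the author metadata can be found through the second URL.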