-
Notifications
You must be signed in to change notification settings - Fork 15
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Merge pull request #19 from Project-TAPIR/develop
* include notebooks for OpenAIRE person-works and work-projects * include ROR ID in ORCID search * update docs
- Loading branch information
Showing
8 changed files
with
444 additions
and
15 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,198 @@ | ||
{ | ||
"cells": [ | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"### Query OpenAIRE for publications authored by a person\n", | ||
"This notebook queries the [OpenAIRE HTTP API](https://graph.openaire.eu/develop/api.html) via its `/publications` endpoint for publications authored by a person. It takes an ORCID iD as input which is used to filter for publications where one of the creators' `orcid` field matches the given ORCID iD. From the resulting list of publications we output all DOIs.\n", | ||
"\n", | ||
"*Note:\n", | ||
"The API has several different endpoints for research outputs: they are divided into publications, research data, software metadata and other research products, so to get a full picture about a person's research output, you would have to query all of these endpoints and union their results.*" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": 1, | ||
"metadata": { | ||
"pycharm": { | ||
"name": "#%%\n" | ||
} | ||
}, | ||
"outputs": [], | ||
"source": [ | ||
"# Prerequisites:\n", | ||
"import requests # dependency for making HTTP calls\n", | ||
"from benedict import benedict # dependency for dealing with json" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": { | ||
"collapsed": true, | ||
"pycharm": { | ||
"name": "#%% md\n" | ||
} | ||
}, | ||
"source": [ | ||
"The input for this notebook is an ORCID iD, e.g. '`0000-0003-2499-7741`'." | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": 2, | ||
"metadata": { | ||
"pycharm": { | ||
"name": "#%%\n" | ||
} | ||
}, | ||
"outputs": [], | ||
"source": [ | ||
"# input parameter\n", | ||
"example_orcid_id=\"0000-0003-2499-7741\"" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"We use it to query the OpenAIRE HTTP API for publications that specified the ORCID iD within their metadata in one of the creators `orcid` field. Since the API uses pagination, we need to loop through all pages to get the complete result set." | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": 3, | ||
"metadata": { | ||
"pycharm": { | ||
"name": "#%%\n" | ||
} | ||
}, | ||
"outputs": [], | ||
"source": [ | ||
"# OpenAIRE endpoint to query for publications\n", | ||
"OPENAIRE_API_PUBLICATIONS = \"https://api.openaire.eu/search/publications\"\n", | ||
"\n", | ||
"# query OpenAIRE for all publications that are connected to orcid\n", | ||
"def query_openaire_for_person2publications(orcid_id):\n", | ||
" page = 1\n", | ||
" max_page = 1\n", | ||
"\n", | ||
" while page <= max_page:\n", | ||
" params = {'orcid': orcid_id, 'page': page, 'format': \"json\"}\n", | ||
" response = requests.get(url=OPENAIRE_API_PUBLICATIONS,\n", | ||
" params=params)\n", | ||
" response.raise_for_status()\n", | ||
" result=response.json()\n", | ||
"\n", | ||
" # calculate max page number in first loop\n", | ||
" if max_page == 1:\n", | ||
" max_page = determine_max_page(result)\n", | ||
" page = page + 1\n", | ||
" yield result\n", | ||
"\n", | ||
"# calculate max number of result pages\n", | ||
"def determine_max_page(response_data):\n", | ||
" response_dict = benedict.from_json(response_data)\n", | ||
" items_total = response_dict.get('response.header.total.$')\n", | ||
" items_per_page = response_dict.get('response.header.size.$')\n", | ||
" max_page_ceil = items_total // items_per_page + bool(items_total % items_per_page)\n", | ||
" return max_page_ceil\n", | ||
"\n", | ||
"\n", | ||
"# ---- example execution\n", | ||
"list_of_pages=query_openaire_for_person2publications(example_orcid_id)" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"From the resulting list of publications we extract and print out each title and DOI. \n", | ||
"\n", | ||
"*Note: publications that do not have a DOI assigned, will not be printed.*" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": 4, | ||
"metadata": {}, | ||
"outputs": [ | ||
{ | ||
"name": "stdout", | ||
"output_type": "stream", | ||
"text": [ | ||
"Number of publications found: 6\n", | ||
"\n", | ||
"10.15488/11463, Roadmap to FAIR Research Information in Open Infrastructures\n", | ||
"10.1515/bd.2006.40.4.466, Informationsvermittlung: Personalisiertes Lernen in der Bibliothek: das Düsseldorfer Online-Tutorial (DOT) Informationskompetenz\n", | ||
"10.1080/00048623.2006.10755322, Teaching Information Literacy with the Lerninformationssystem\n", | ||
"10.3389/frma.2021.694307, Enhancing Knowledge Graph Extraction and Validation From Scholarly Publications Using Bibliographic Metadata\n", | ||
"10.3897/rio.7.e66264, OPTIMETA – Strengthening the Open Access publishing system through open citations and spatiotemporal metadata\n", | ||
"10.1016/j.procs.2019.01.074, The Research Core Dataset (KDSF) in the Linked Data context\n" | ||
] | ||
} | ||
], | ||
"source": [ | ||
"# from the result pages, extract the data about each publication\n", | ||
"def extract_publications_from_page(page):\n", | ||
" return [pub for pub in benedict.from_json(page).get('response.results.result') or []]\n", | ||
"\n", | ||
"# extract DOI from publication\n", | ||
"def extract_doi(pub):\n", | ||
" oaf_result=benedict.from_json(pub).get('metadata.oaf:entity.oaf:result')\n", | ||
"\n", | ||
" # unfortunately the json data is inconsistently modeled:\n", | ||
" # if there is one pid/title for a publication, it is a json object\n", | ||
" # if there are multiple pids/titles for a publication, they form a json list\n", | ||
" pids=oaf_result.get('pid') or []\n", | ||
" is_doi = lambda pid: pid.get('@classid')==\"doi\"\n", | ||
" if isinstance(pids, list):\n", | ||
" dois=[pid['$'] for pid in pids if is_doi(pid)]\n", | ||
" else:\n", | ||
" dois= [pids['$']] if is_doi(pids) else []\n", | ||
" doi=dois[0] if dois else None # pick the first one\n", | ||
" \n", | ||
" titles=oaf_result.get('title') or []\n", | ||
" is_main_title = lambda title: title.get('@classid')==\"main title\"\n", | ||
" if isinstance(titles, list):\n", | ||
" main_titles=[title['$'] for title in titles if is_main_title(title)]\n", | ||
" else:\n", | ||
" main_titles=[titles['$']] if is_main_title(titles) else []\n", | ||
" title=main_titles[0] if main_titles else None # pick the first one\n", | ||
"\n", | ||
" return doi, title\n", | ||
"\n", | ||
"\n", | ||
"#--- example execution\n", | ||
"for page in list_of_pages or []:\n", | ||
" publications=extract_publications_from_page(page)\n", | ||
" print(f\"Number of publications found: {len(publications)}\\n\")\n", | ||
" for pub in publications:\n", | ||
" doi,title = extract_doi(pub)\n", | ||
" if doi:\n", | ||
" print(f\"{doi}, {title}\")" | ||
] | ||
} | ||
], | ||
"metadata": { | ||
"kernelspec": { | ||
"display_name": "Python 3 (ipykernel)", | ||
"language": "python", | ||
"name": "python3" | ||
}, | ||
"language_info": { | ||
"codemirror_mode": { | ||
"name": "ipython", | ||
"version": 3 | ||
}, | ||
"file_extension": ".py", | ||
"mimetype": "text/x-python", | ||
"name": "python", | ||
"nbconvert_exporter": "python", | ||
"pygments_lexer": "ipython3", | ||
"version": "3.9.6" | ||
} | ||
}, | ||
"nbformat": 4, | ||
"nbformat_minor": 1 | ||
} |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,6 @@ | ||
## work-projects | ||
|
||
A Jupyter notebook showing an example of using a persistent identifier for a publication (DOI) | ||
as input for retrieving the project a work was produced in (identified by its OpenAIRE project ID). | ||
|
||
* [OpenAIRE](https://www.openaire.eu/) |
Oops, something went wrong.