The scripts below were used in the case study to assess use of the Human Disease Ontology in the publication:
J. Allen Baron, Lynn M Schriml, Assessing resource use: a case study with the Human Disease Ontology, Database, Volume 2023, 2023, baad007. PMID:36856688, https://doi.org/10.1093/database/baad007.
Scripts are found in the scripts/ directory and are listed below under the section of the case study they apply to, each with a brief description. Later in this document, each script is described in more detail, including what it accomplished and its file inputs and outputs.
- Obtaining Records
  - citedby_full_procedure.R: used to obtain "cited by" and MyNCBI collection records.
  - pub_searches.R: used to obtain and analyze search records.
    - NOTE: Also produced the figures comparing different search phrases at Europe PMC (Fig. 3) and comparing searches across sources (Fig. 4) used in the "Obtaining Records" portion of the case study.
- Curation
  - No scripts were used in curation. Instead, refer to the DO_uses-published_2022 Google Sheet to review curated information.
- Evaluation
  - citedby_analysis.R: used to analyze curated information.
    - NOTE: The figure of publications citing the DO by year produced by this script (Fig. 1) was used in the introduction of the publication.
- citedby_manual_count_plot.R
  - Purpose: Generate plots of the total number of publications citing the official Disease Ontology publications, as reported by various providers.
  - Input:
    - data/citedby/counts/citedby_counts-manual.csv
  - Output:
    - data/citedby/counts/citedby_counts-manual_smry.csv
    - graphics/citedby/citedby_manual_comparison_2021.png: Figure 2
    - graphics/citedby/citedby_manual_comparison_2022.png
- citedby_full_procedure.R
  - Purpose: Obtain publication records from PubMed and Scopus that cite one or more of the official DO publications (referred to as "cited by"), load DO's MyNCBI collection, merge the records from all of these sources, and append the results to the existing DO_uses-published_2022 Google Sheet.
  - Input:
    - data/citedby/collection.txt: 2022-11-17 download of DO's MyNCBI collection.
  - Output:
    - data/citedby/do_cb_pm_summary_by_id.rda: full, untidied PubMed "cited by" results, obtained 2022-11-10.
    - data/citedby/do_cb_scop_by_id.rda: full, untidied Scopus "cited by" results, obtained 2022-11-10.
    - data/citedby/do_collection_pm_summary.rda: full, untidied MyNCBI collection publication summary results from PubMed, obtained 2022-11-10.
    - data/citedby/do_collection_pmc_summary.rda: full, untidied MyNCBI collection publication summary results from PubMed Central (PMC), obtained 2022-11-10.
    - data/citedby/DO_citedby.csv: merged and tidied citations from all sources.
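Merging "cited by" records means matching records that may describe the same publication under different identifiers. For example, one source might provide only a DOI while another provides a PMID and a DOI. The Python sketch below shows one way such a merge could work. It is not the logic of citedby_full_procedure.R, which is written in R. The record layout, the hypothetical "pmid"/"doi" field names, and the rule of matching on PMID and then on case-insensitive DOI are all assumptions.

```python
def merge_citedby(sources):
    """Merge publication records from several sources, one entry per publication.

    `sources` maps a source name to a list of record dicts with optional
    "pmid" and "doi" keys (hypothetical field names). Records are matched on
    PMID or, failing that, on case-insensitive DOI.
    """
    merged = []  # one dict per unique publication, in first-seen order
    by_pmid, by_doi = {}, {}
    for source, records in sources.items():
        for rec in records:
            pmid = rec.get("pmid") or None
            doi = (rec.get("doi") or "").lower() or None
            if pmid is None and doi is None:
                continue  # nothing to match on; skipped in this sketch
            # Reuse an existing entry if either identifier was seen before.
            entry = by_pmid.get(pmid) if pmid else None
            if entry is None and doi:
                entry = by_doi.get(doi)
            if entry is None:
                entry = {"pmid": None, "doi": None, "sources": set()}
                merged.append(entry)
            # Fill in identifiers that earlier sources lacked, then index them.
            entry["pmid"] = entry["pmid"] or pmid
            entry["doi"] = entry["doi"] or doi
            entry["sources"].add(source)
            if entry["pmid"]:
                by_pmid[entry["pmid"]] = entry
            if entry["doi"]:
                by_doi[entry["doi"]] = entry
    return merged


if __name__ == "__main__":
    records = merge_citedby({
        "pubmed": [{"pmid": "1", "doi": "10.1/A"}],
        "scopus": [{"doi": "10.1/a"}, {"doi": "10.1/b"}],
        "myncbi": [{"pmid": "1"}],
    })
    for rec in records:
        print(rec["pmid"], rec["doi"], sorted(rec["sources"]))
    # prints:
    # 1 10.1/a ['myncbi', 'pubmed', 'scopus']
    # None 10.1/b ['scopus']
```

Each merged entry keeps the set of sources that reported it, so per-source counts and overlaps can be computed afterwards. The sketch does not reconcile two existing entries that later turn out to be the same publication, so a real pipeline may need an extra deduplication pass.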
- pub_searches.R
  - Purpose: Search the PubMed, PubMed Central (PMC), and Europe PMC databases for records of publications that may have used the DO.
  - Input: None
  - Output:
    - Raw search results, obtained 2022-11-09:
      - Europe PMC: data/lit_search/epmc_search_raw.RData
      - PubMed: data/lit_search/pubmed_search_raw.RData
      - PMC: data/lit_search/pmc_search_raw.RData
    - Raw results for additional identifiers, obtained 2022-11-09:
      - PubMed: data/lit_search/pm_search_raw-IDs.RData
      - PMC: data/lit_search/pmc_search_raw-IDs.RData
    - Tidied search results:
      - Europe PMC: data/lit_search/epmc_search_results.csv
      - PubMed: data/lit_search/pubmed_search_results.csv
      - PMC: data/lit_search/pmc_search_results.csv
    - The actual search query each service converted each search phrase into:
      - data/lit_search/actual_search_terms.csv
    - Summary data:
      - data/lit_search/search_res_n.csv
      - data/lit_search/src_comparison.csv
    - Graphics produced:
      - graphics/lit_search/epmc_search_overlap.png
      - graphics/lit_search/epmc_search_overlap-min10.png: Figure 3
      - graphics/lit_search/pmc_search_overlap.png
      - graphics/lit_search/pm_search_overlap.png
      - graphics/lit_search/search_src_overlap-venn.png
      - graphics/lit_search/search_src_overlap-upset.png
      - graphics/lit_search/total_hits-graph.png
      - graphics/lit_search/total_hits-legend.png
      - graphics/lit_search/total_hits-complete.png: Figure 4
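The overlap plots produced by pub_searches.R (for example, search_src_overlap-venn.png and search_src_overlap-upset.png) rest on counting how many publications were returned by each exact combination of sources. The Python sketch below illustrates that counting. It is not taken from pub_searches.R. It assumes that each source's hits have already been normalized to a shared identifier such as PMID, and the source names in the example are used only as labels.

```python
from collections import Counter


def overlap_counts(hits):
    """Count publications by the exact combination of sources that returned them.

    `hits` maps a source name to an iterable of publication IDs, assumed to be
    normalized to a shared identifier (e.g. PMID). Each publication is counted
    once, in the region for its exact set of sources, as in an UpSet plot.
    """
    found_in = {}
    for source, ids in hits.items():
        for pub_id in set(ids):  # ignore duplicate hits within one source
            found_in.setdefault(pub_id, set()).add(source)
    return Counter(frozenset(srcs) for srcs in found_in.values())


if __name__ == "__main__":
    counts = overlap_counts({
        "Europe PMC": ["1", "2", "3"],
        "PubMed": ["1", "2"],
        "PMC": ["2", "4"],
    })
    for combo, n in counts.items():
        print(sorted(combo), n)
```

Because each publication falls in exactly one region, the region counts sum to the number of unique publications across all sources. This is the total that a combined "search results" count would report.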
- citedby_analysis.R
  - Purpose: Analyze curated information as part of the "Evaluation" step of the workflow.
  - Input:
    - DO_uses-published_2022 Google Sheet
    - data/citedby/analysis/time_to_review.csv (curation review time data)
  - Output:
    - Summary counts and statistics:
      - data/citedby/analysis/MyNCBI_collection_cites-mid2021.csv
      - data/citedby/analysis/citedby_source_count.csv
      - data/citedby/analysis/review_time_summary.csv
      - data/citedby/analysis/status_uses_count.csv: Table 2
      - data/citedby/analysis/use_type_count.csv: Table 3
      - data/citedby/analysis/tool_roles_count.csv: Supplementary Table 1
      - data/citedby/analysis/research_area_count.csv: Supplementary Table 2
      - data/citedby/analysis/disease_count.csv: Supplementary Table 3
      - data/citedby/analysis/use_case_count.csv: count of resources added to the "Use Case" page on disease-ontology.org.
    - Graphics:
      - graphics/citedby/analysis/DO_cited_by_count.png: Figure 1
      - graphics/citedby/analysis/citedy_source_overlap-venn.png
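Several of the summary files above are tallies of curated labels, such as use types and research areas. The layout of the DO_uses-published_2022 Google Sheet is not described here. The Python sketch below therefore assumes, hypothetically, that a record with more than one label stores them in a single cell separated by "|". Under that assumption, category totals can exceed the number of records. The sketch is not the method used in citedby_analysis.R, and the labels in the example are placeholders.

```python
from collections import Counter


def count_categories(values, sep="|"):
    """Tally curated category labels across records.

    Assumes (hypothetically) that multiple labels for one record share a cell,
    separated by `sep`. Blank or missing cells are ignored, and each label is
    counted at most once per record.
    """
    counts = Counter()
    for cell in values:
        labels = {label.strip() for label in (cell or "").split(sep) if label.strip()}
        counts.update(labels)
    return counts


if __name__ == "__main__":
    # Placeholder labels for four records, one blank and one missing.
    print(count_categories(["type_a", "type_a | type_b", "", None]))
```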
- citedby_RNA.R
  - Purpose: Estimate use of the DO in RNA research over time.
  - Input: DO_uses-published_2022 Google Sheet
  - Output:
    - graphics/citedby/RNA_pubs_over_time.png: Figure 5
    - graphics/citedby/RNA_pubs_over_time-by_type.png: Supplementary Figure 1
    - graphics/citedby/RNA_pubs_over_time-by_pub_type.png
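Estimating RNA research over time amounts to filtering curated records for RNA-related content and counting them by year. The Python sketch below shows that idea. How citedby_RNA.R actually identifies RNA publications is not stated here, so the matching rule is an assumption. The rule is a case-sensitive match on "RNA", which catches terms such as miRNA, lncRNA, and RNA-seq without matching ordinary words like "internal". The "year" and "research_area" field names are also hypothetical.

```python
from collections import Counter


def rna_pubs_per_year(records, field="research_area"):
    """Count records per year whose `field` mentions RNA.

    The case-sensitive "RNA" match is an assumed rule: it catches miRNA,
    lncRNA, RNA-seq, etc. while skipping words containing lowercase "rna".
    `records` are dicts with a "year" key; field names are hypothetical.
    """
    counts = Counter(
        rec["year"] for rec in records if "RNA" in (rec.get(field) or "")
    )
    return dict(sorted(counts.items()))


if __name__ == "__main__":
    print(rna_pubs_per_year([
        {"year": 2019, "research_area": "miRNA biomarkers"},
        {"year": 2019, "research_area": "internal medicine"},
        {"year": 2021, "research_area": "lncRNA"},
        {"year": 2020, "research_area": None},
    ]))
```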
- compare_citedby_search.R
  - Purpose: Compare "cited by" (including MyNCBI collection) and search results.
  - Input:
    - data/citedby/DO_citedby.csv
    - data/lit_search/src_comparison.csv
    - data/lit_search/epmc_search_results.csv
  - Output:
    - data/cb_search_compare/citedby_search_comparison.csv
    - data/cb_search_compare/search_uniq.csv
    - graphics/cb_search_compare/citedby_search_comparison.png
    - graphics/cb_search_compare/search_unique_pub_type.png
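The outputs search_uniq.csv and search_unique_pub_type.png suggest a comparison that finds the search hits absent from the "cited by" records and tabulates them by publication type. The Python sketch below shows that set difference. It is illustrative only and is not the code in compare_citedby_search.R. It assumes that both inputs share PMIDs as identifiers, and the "pmid" and "pub_type" field names are hypothetical.

```python
from collections import Counter


def search_unique_by_type(search_results, citedby_ids):
    """Return search hits absent from "cited by" and a count by publication type.

    `search_results` are dicts with hypothetical "pmid" and "pub_type" keys;
    `citedby_ids` is any iterable of PMIDs from the merged "cited by" records.
    Hits without a publication type are counted as "unknown".
    """
    known = set(citedby_ids)
    unique = [rec for rec in search_results if rec["pmid"] not in known]
    by_type = Counter(rec.get("pub_type") or "unknown" for rec in unique)
    return unique, by_type


if __name__ == "__main__":
    unique, by_type = search_unique_by_type(
        [
            {"pmid": "1", "pub_type": "journal article"},
            {"pmid": "2", "pub_type": "review"},
            {"pmid": "3", "pub_type": "journal article"},
            {"pmid": "4"},
        ],
        citedby_ids=["1"],
    )
    print(len(unique), dict(by_type))
```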
Three additional scripts in this repository were used to evaluate the impact of the DO outside of the published literature. These DO-specific impact measures are not included in the publication.
- alliance_disease_record_counts.R: counts of disease records in databases belonging to the Alliance of Genome Resources.
- bioconductor_stats.R: unique IP download statistics for the DO-dependent R packages 'DOSE' and 'DO.db'.
- cfde_disease_record_counts.R: counts of disease records in databases belonging to the Common Fund Data Ecosystem (CFDE).

One further script, poster_figures.R, contains the code used to make figures for the poster presented at the 2023 Biocuration conference hosted by the International Society for Biocuration.

The files associated with each of these scripts are listed below.
- alliance_disease_record_counts.R
  - data/alliance/disease_counts-full_by_obj.csv
  - data/alliance/disease_counts-disobj_by_obj.csv
  - data/alliance/disease_counts-disease_by_obj.csv
  - data/alliance/disease_counts-unique_diseases.csv
  - graphics/alliance_disobj_plot.png
  - graphics/alliance_full_record_plot.png
- bioconductor_stats.R
  - data/bioc_DO_stats.csv
  - data/bioc_DO_stats-1yr-distinctIP.csv
  - graphics/bioc_DO_stats-distinctIP.png
  - graphics/bioc_DO_stats-1yr-distinctIP.png
- cfde_disease_record_counts.R
  - data/cfde/CFDE_data-program_sample_disease.csv
  - data/cfde/cfde_disease_counts.csv
  - graphics/cfde_disease.png
- poster_figures.R
  - Input:
    - data/citedby/DO_citedby-2023_Apr.csv: update of "cited by" results collected through the beginning of April 2023, used for a new figure of publications citing the DO.
    - data/citedby/analysis/citedby_source_count.csv
    - data/lit_search/src_comparison.csv
    - data/poster/research_area.csv: same data as data/citedby/analysis/research_area_count.csv, with added groupings to show the research areas impacted by the DO more succinctly.
    - data/poster/status_use.csv: groupings added in the same way as for data/poster/research_area.csv.
    - data/poster/tool_role.csv: groupings added in the same way as for data/poster/research_area.csv.
    - data/citedby/analysis/review_time_summary.csv
  - Output:
    - graphics/poster/citedby.svg
    - graphics/poster/record_collection.svg
    - graphics/poster/record_pie.svg
    - graphics/poster/record_total.svg
    - graphics/poster/research_area-mod.svg
    - graphics/poster/research_area.svg
    - graphics/poster/status_use.svg
    - graphics/poster/tool_role.svg