R Package to support the onboarding process of new CDMs in the DARWIN EU Data Network
The DARWIN EU Coordination Center (CC) is responsible for building a data network that supports EMA and stakeholders in answering regulatory research questions. To support the onboarding of data sources, the CdmOnboarding R package generates an onboarding document that the CC and EMA use to assess the quality and readiness of the CDM for participating in regulatory studies.
The goal of the onboarding report is to provide insight into the completeness, transparency and quality of the performed Extract, Transform, and Load (ETL) process and the readiness of the data partner to be onboarded in the DARWIN EU® data network and participate in research studies.
An example of an onboarding report for an OMOP Synthea database can be found in extras/CdmOnboarding-Synthea.docx.
Main repository on DARWIN-EU/CdmOnboarding.
The CdmOnboarding R Package performs the following checks on top of the required Data Quality Dashboard step.
- Extraction of the CDM Source table
- The number of records and persons per OMOP table
- Achilles data density plots are inserted
- For each domain, the distinct concepts per person
- Observation Period length
- Type concepts
- Date ranges per domain
- For each domain, generate mapping completeness statistics with the number of unmapped codes and unmapped records
- For each domain, extract the top 25 mapped and unmapped codes (counts are rounded up to the nearest 100)
- Extract the number of records in all vocabulary tables
- Count of concepts per vocabulary by standard, classification and non-standard
- Mapping levels of drugs (Clinical Drug etc.)
- Extract the source_to_concept_map table
- Extract the timings of the Achilles queries (Achilles results need to be present in the database)
- Checks on the number of CPUs, memory available in R
- Extract the versions of all installed R packages, checks if core HADES packages are installed
- Check if ATLAS is installed and WebAPI is running
- Overview of number of passed/failed checks
-
Summary for set of 11 ingredients:

| Concept ID | Drug name (ATLAS) |
|---|---|
| 1125315 | acetaminophen |
| 1139042 | acetylcysteine |
| 1703687 | acyclovir |
| 1119119 | adalimumab |
| 1154343 | albuterol |
| 528323 | hepatitis B surface antigen vaccine |
| 954688 | latanoprost |
| 968426 | mesalamine |
| 1550557 | prednisolone |
| 1140643 | sumatriptan |
| 40225722 | ulipristal |

The package produces a Word document in a DARWIN EU® template that contains all the results and can be added as Annex 1 to the DARWIN EU® Onboarding document.
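The rounding of the top 25 code counts mentioned in the list of checks can be illustrated with a small sketch. It assumes that "rounded up to the nearest 100" means a ceiling to the next multiple of 100; the helper name `round_up_to_100` is hypothetical and not part of the package.

```r
# Hypothetical helper, not part of CdmOnboarding: round counts up to the
# next multiple of 100 (assumed reading of "rounded up to the nearest 100").
round_up_to_100 <- function(counts) {
  stopifnot(is.numeric(counts), all(counts >= 0))
  ceiling(counts / 100) * 100
}

round_up_to_100(c(1, 99, 100, 101, 2550))
# returns 100 100 100 200 2600
```

Under this reading, any non-zero count below 100 is reported as 100, so small counts are not shown exactly in the report.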
The CdmOnboarding package is an R package.
Requires R. Some of the packages used by CdmOnboarding require Java.
- See the instructions here for configuring your R environment, including Java.
- Use the following commands to download and install CdmOnboarding:
install.packages("remotes")
remotes::install_github("DARWIN-EU/CdmOnboarding")
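After installation, a quick look at the R environment can catch missing pieces early. The sketch below uses only base R; the helper `check_installed` is hypothetical, and the package names in `pkgs` are illustrative assumptions about a typical setup rather than an official list of requirements.

```r
# Hypothetical helper: report which packages are installed, without loading them.
check_installed <- function(pkgs) {
  vapply(pkgs, function(p) nzchar(system.file(package = p)), logical(1))
}

# Illustrative package names (assumption); adjust to your own setup.
pkgs <- c("remotes", "rJava", "DatabaseConnector", "Achilles", "CdmOnboarding")
status <- check_installed(pkgs)

cat("R version:", R.version.string, "\n")
cat("Logical CPU cores:", parallel::detectCores(), "\n")
print(status)

if (!all(status)) {
  message("Not installed: ", paste(names(status)[!status], collapse = ", "))
}
```

Using `system.file()` rather than loading each package keeps the check free of side effects such as starting a Java virtual machine.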
Performing the checks and exporting the CdmOnboarding results is done by executing the cdmOnboarding(...) function.
Ideally, run the CdmOnboarding package on the same machine on which you will perform the actual analyses, so that its performance can be tested.
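As a rough sketch of what such a call might look like, the wrapper below validates a few inputs and then forwards them to cdmOnboarding(...). The wrapper name `run_onboarding` is hypothetical, and the argument names passed to cdmOnboarding are assumptions based on common OHDSI conventions; extras/CodeToRun.R and the package help pages are the authoritative reference.

```r
# Sketch only: the argument names passed to cdmOnboarding() are assumptions;
# check extras/CodeToRun.R and ?cdmOnboarding before relying on them.
run_onboarding <- function(connectionDetails,
                           cdmDatabaseSchema,
                           resultsDatabaseSchema,
                           databaseId,
                           authors,
                           outputFolder = "output") {
  # Basic input validation before touching the database.
  stopifnot(
    is.character(cdmDatabaseSchema), length(cdmDatabaseSchema) == 1,
    is.character(resultsDatabaseSchema), length(resultsDatabaseSchema) == 1,
    is.character(databaseId), length(databaseId) == 1,
    is.character(authors), length(authors) >= 1
  )
  if (!requireNamespace("CdmOnboarding", quietly = TRUE)) {
    stop("CdmOnboarding is not installed.")
  }
  CdmOnboarding::cdmOnboarding(
    connectionDetails = connectionDetails,
    cdmDatabaseSchema = cdmDatabaseSchema,
    resultsDatabaseSchema = resultsDatabaseSchema,
    databaseId = databaseId,
    authors = authors,
    outputFolder = outputFolder
  )
}

# Example call with placeholder values (not real settings):
# connectionDetails <- DatabaseConnector::createConnectionDetails(
#   dbms = "postgresql", server = "localhost/cdm",
#   user = "user", password = "secret", port = 5432
# )
# results <- run_onboarding(connectionDetails, "cdm", "results",
#                           "SYNTHEA", c("Author One"))
```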
Make sure that Achilles has run in the results schema you select when calling the cdmOnboarding function.
Ideally, all Achilles analyses are run before running CdmOnboarding.
However, the following Achilles analyses are required for CdmOnboarding to create a complete report:
analysisIds <- c(0, 105, 110, 111, 113, 117, 213, 220, 420, 502, 620, 720, 820, 920, 1020, 1820, 2102, 2120, 203, 403, 603, 703, 803, 903, 920, 1003, 1020, 1313, 1320, 1411, 1803, 1820, 213, 1313)
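To see which of the required analyses are still missing from an existing Achilles run, the required IDs can be compared with the IDs already present. The sketch below uses only base R; `missing_analyses` is a hypothetical helper, and `completed` is made-up example data standing in for the analysis IDs found in your Achilles results.

```r
# Required analysis IDs as listed in this README; unique() drops any repeats.
required <- sort(unique(c(
  0, 105, 110, 111, 113, 117, 213, 220, 420, 502, 620, 720, 820, 920, 1020,
  1820, 2102, 2120, 203, 403, 603, 703, 803, 903, 920, 1003, 1020, 1313,
  1320, 1411, 1803, 1820, 213, 1313
)))

# Hypothetical helper: required analyses that have not been run yet.
missing_analyses <- function(completed, required) {
  setdiff(required, completed)
}

# Made-up example: pretend only these analyses are in the results schema.
completed <- c(0, 105, 110, 111)
missing_analyses(completed, required)
```

An empty result means every required analysis is available and the report can be generated in full.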
For a template execution script, see extras/CodeToRun.R.
PDF versions of the documentation are available:
This package is maintained by the DARWIN EU Coordination Centre as part of its quality control procedures. We use the GitHub issue tracker for all bugs, issues, enhancements, questions and feedback. Additions are welcome through pull requests. We suggest first creating an issue and discussing it with the maintainer before implementing additional functionality.
CdmOnboarding is licensed under Apache License 2.0.
CdmOnboarding is being developed in RStudio.
- The package is built upon the CdmInspection R package used and developed by the European Health Data & Evidence Network, which has received funding from the Innovative Medicines Initiative 2 Joint Undertaking (JU) under grant agreement No 806968. The JU receives support from the European Union’s Horizon 2020 research
- We would also like to thank the contributors of the OHDSI community for their fantastic work.