Data Management Systems
This is a rough overview of data management systems in use by various groups at MSKCC. It is not a complete list, but feel free to add/edit/remove.
This is the dev team for MSKCC's Molecular Diagnostics Service. They provide CLIA-approved MSK-IMPACT sequencing for eligible patients. Most dev work is done in Python, and database interfaces use SQLAlchemy. By Q4 2015, these DBs will move to a MySQL+MongoDB hybrid, to make some new use cases easier to implement.
DMS, Data Management System - Uses MySQL to track informatics pipeline runs, QC metrics, etc.
CVR, Clinical Variants Resource - Uses MySQL to manage lists of clinical variants, with a Flask frontend for manual curation and signing-out of reports. Has a REST API to securely send variant lists to cBioPortal, to the electronic health records (EHR), and to efforts like the Knowledge Engine.
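To make the DMS description concrete, here is a minimal sketch of how pipeline runs and their QC metrics could be laid out relationally. It is not the team's actual schema: the table names, column names, and helper functions are all hypothetical, and an in-memory SQLite database from the Python standard library stands in for MySQL and SQLAlchemy so the example runs without any setup.

```python
import sqlite3

# In-memory SQLite stands in for MySQL here.
# All table and column names below are hypothetical.
SCHEMA = """
CREATE TABLE pipeline_run (
    run_id     INTEGER PRIMARY KEY,
    sample_id  TEXT NOT NULL,
    pipeline   TEXT NOT NULL,
    status     TEXT NOT NULL
);
CREATE TABLE qc_metric (
    run_id  INTEGER NOT NULL REFERENCES pipeline_run(run_id),
    name    TEXT NOT NULL,
    value   REAL NOT NULL,
    PRIMARY KEY (run_id, name)
);
"""


def open_db():
    """Create a fresh in-memory database with the sketch schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def record_run(conn, sample_id, pipeline, status, metrics):
    """Insert one pipeline run plus its QC metrics; return the new run_id."""
    with conn:  # commits on success, rolls back on error
        cur = conn.execute(
            "INSERT INTO pipeline_run (sample_id, pipeline, status) "
            "VALUES (?, ?, ?)",
            (sample_id, pipeline, status),
        )
        run_id = cur.lastrowid
        conn.executemany(
            "INSERT INTO qc_metric (run_id, name, value) VALUES (?, ?, ?)",
            [(run_id, name, value) for name, value in metrics.items()],
        )
    return run_id


def runs_below(conn, metric, threshold):
    """Return sample IDs of runs whose named metric is below the threshold."""
    rows = conn.execute(
        "SELECT r.sample_id FROM pipeline_run r "
        "JOIN qc_metric m ON m.run_id = r.run_id "
        "WHERE m.name = ? AND m.value < ? ORDER BY r.sample_id",
        (metric, threshold),
    )
    return [row[0] for row in rows]
```

Storing metrics as name/value rows rather than fixed columns is one possible design choice; it lets new QC metrics be added without a schema change, at the cost of weaker typing per metric.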
This is the team behind the popular cBioPortal suite for cancer genomics analysis and visualization, now also responsible for scaling up research and clinical interpretation.
cBioPortal - github.com/cBioPortal/cbioportal - Uses a MySQL DB under the hood, which has grown large enough that the team is considering DB caching methods. Most backend code is Java, and the web frontend is a mix of JavaScript libraries - mostly jQuery and Backbone, with a little Bootstrap.
OncoKB - github.com/cBioPortal/oncokb - Uses Google Docs' Realtime API as the datastore for expert-curated content on variant actionability. Per-patient reviewing and reporting of somatic variants is managed by a MySQL server.
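The page does not say which caching method is under consideration for cBioPortal. As one illustration of the general idea, the sketch below shows a read-through cache with a time-to-live, written in Python rather than the project's Java for brevity. The class name `TTLCache` and the `loader` callback are hypothetical; `loader` stands for whatever function runs the expensive database query.

```python
import time


class TTLCache:
    """Read-through cache: serve a stored value until it is older than ttl seconds."""

    def __init__(self, loader, ttl, clock=time.monotonic):
        self._loader = loader  # hypothetical function that runs the expensive query
        self._ttl = ttl
        self._clock = clock    # injectable so expiry can be tested without sleeping
        self._store = {}       # key -> (expires_at, value)

    def get(self, key):
        """Return the cached value, calling the loader on a miss or after expiry."""
        now = self._clock()
        hit = self._store.get(key)
        if hit is not None and hit[0] > now:
            return hit[1]
        value = self._loader(key)
        self._store[key] = (now + self._ttl, value)
        return value

    def invalidate(self, key):
        """Drop one entry, for example after the underlying data is updated."""
        self._store.pop(key, None)
```

A time-to-live keeps stale results bounded without tracking every write, which suits data that is read far more often than it changes; data that must always be current would need explicit invalidation instead.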
These are some of the many ways in which clinical data is organized and managed at MSKCC. Excel sheets and MS Access databases are used by various DMTs (Disease Management Teams), but the following consolidated systems are in place.
IDB, Institutional DB - The center-wide database with complete EHRs per patient. Most data is free text, in the form of clinicians' notes.
Darwin - Provides de-identified views for researchers searching for patients eligible for a research study. Also provides the de-identified view of clinical data that cBioPortal serves up to internal researchers.
CRDB, Clinical Research Database - is for standardized data entry per patient, for outcomes tracking. Darwin grabs this data and de-identifies it for researchers across MSKCC.
Caisis - Was first adopted by the urology DMT, as the point of data entry, and is used by a few more DMTs. Serves up REDCap (Research Electronic Data Capture) forms for different clinical contexts. Makes it much easier for downstream tools like Darwin/cBioPortal to gather and serve up relevant patient info.
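The page does not describe how Darwin de-identifies records. Purely as an illustration of the concept, the sketch below drops direct identifiers from a structured record and replaces the medical record number with a keyed-hash pseudonym, so the same patient maps to the same research ID across datasets. The field names, the `P-` prefix, and both function names are hypothetical.

```python
import hashlib
import hmac

# Hypothetical field names; a real system would follow its
# institution's de-identification policy.
DIRECT_IDENTIFIERS = {"name", "mrn", "date_of_birth", "address"}


def pseudonym(mrn, secret_key):
    """Derive a stable research ID from an MRN using a keyed hash (HMAC-SHA256)."""
    digest = hmac.new(secret_key, mrn.encode("utf-8"), hashlib.sha256).hexdigest()
    return "P-" + digest[:12]


def deidentify(record, secret_key):
    """Return a copy of the record with direct identifiers dropped and a pseudonym added."""
    view = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
    view["patient_id"] = pseudonym(record["mrn"], secret_key)
    return view
```

This only covers structured fields. Free-text clinician notes, which the IDB entry says make up most of the data, can contain identifiers anywhere and would need a separate approach.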