Some additional works on Edward Choi's medGAN

In this repository, I share my own work based on Edward Choi's medGAN. Congratulations to Edward on his excellent work!

medGAN (for medical GAN) is a generative adversarial network (GAN) for electronic health records (EHR). medGAN implements the algorithm introduced in the following paper:

Generating Multi-label Discrete Patient Records using Generative Adversarial Networks
Edward Choi, Siddharth Biswal, Bradley Malin, Jon Duke, Walter F. Stewart, Jimeng Sun  
Machine Learning for Healthcare (MLHC) 2017

I opened a few pull requests on Edward Choi's medGAN repository:

Quick preview

  • Author: Sylvain Combettes
  • Dates: June 24th – Sept. 13th, 2019 (3 months)
  • Context: As part of my penultimate year at Mines Nancy, I did a 3-month research internship at Servier, the second-largest pharmaceutical company in France. In 2018, Servier had a €4.2 billion turnover, operated in 149 countries and had more than 22,000 employees.
  • Topic: Generating fictitious yet realistic patient data in order to boost the prediction score [synthesis, dataset augmentation].
  • Method: Combining GANs (generative adversarial networks) with autoencoders [implicit density estimation].
  • Programming: Python.
  • Result: The prediction score can be increased by more than 5% on binary values.
  • Links: [5-page summary report] [full 62-page report] [slides]

Abstract

In the first chapter, we give a general introduction to GANs and, in particular, explain how they work. GANs are a revolutionary class of generative models introduced by Ian Goodfellow in 2014. The key idea behind GANs is to have two neural networks compete against each other: the generator, which produces samples, and the discriminator, which tries to tell generated samples from real ones. GANs can synthesize impressively realistic samples.
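To make the competition between the two networks concrete, here is a minimal sketch of the two losses from the original GAN formulation (Goodfellow et al., 2014), written with NumPy. It is an illustration only and not the code used in the report: the function names, the `eps` smoothing term and the example probabilities are assumptions, and the generator loss shown is the common non-saturating variant.

```python
import numpy as np


def discriminator_loss(d_real, d_fake, eps=1e-12):
    """Binary cross-entropy for the discriminator.

    d_real: discriminator outputs D(x) on real samples, in (0, 1).
    d_fake: discriminator outputs D(G(z)) on generated samples, in (0, 1).
    The discriminator wants d_real close to 1 and d_fake close to 0.
    """
    return -np.mean(np.log(d_real + eps)) - np.mean(np.log(1.0 - d_fake + eps))


def generator_loss(d_fake, eps=1e-12):
    """Non-saturating generator loss: the generator wants d_fake close to 1."""
    return -np.mean(np.log(d_fake + eps))


# Hypothetical discriminator outputs, chosen by hand for illustration.
confident_real = np.array([0.9, 0.8, 0.95])   # D is confident these are real
confident_fake = np.array([0.1, 0.2, 0.05])   # D is confident these are fake
undecided = np.full(3, 0.5)                   # D cannot tell real from fake

print("D loss, confident D:", discriminator_loss(confident_real, confident_fake))
print("D loss, undecided D:", discriminator_loss(undecided, undecided))
print("G loss, confident D:", generator_loss(confident_fake))
print("G loss, undecided D:", generator_loss(undecided))
```

When the discriminator separates real from generated samples well, its own loss is low and the generator's loss is high. Training alternates updates of the two networks so that each one pushes against the other.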

In the second chapter, we apply GANs to patient data. The method is called medGAN (for medical GAN) and was developed by Edward Choi in 2017. medGAN can only synthesize binary or count values. There are two main applications of medGAN: privacy and dataset augmentation. We focus only on dataset augmentation: starting from a real-life dataset, we generate fictitious yet realistic samples that can then be concatenated with the real-life dataset into an augmented dataset with more samples. Training a predictive model on the augmented dataset rather than on the real-life dataset alone can boost the prediction score, provided the generated data is realistic enough.
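The augmentation step itself is a simple concatenation. The sketch below shows one way it could look for binary features, using NumPy. Everything in it is hypothetical: the `augment` helper, the array shapes and the random stand-in data are assumptions for illustration, and the 0.5 threshold assumes the generator emits values in [0, 1] that still need to be binarized.

```python
import numpy as np


def augment(real, synthetic_probs, threshold=0.5):
    """Binarize generator outputs and append them below the real samples.

    real: (n_real, n_features) binary matrix of real patient records.
    synthetic_probs: (n_synth, n_features) generator outputs in [0, 1].
    Returns an (n_real + n_synth, n_features) binary matrix.
    """
    synthetic = (synthetic_probs >= threshold).astype(real.dtype)
    return np.concatenate([real, synthetic], axis=0)


rng = np.random.default_rng(0)

# Random stand-ins: the real data is confidential and no trained generator
# is included here, so both arrays are placeholders.
real = rng.integers(0, 2, size=(100, 20))
synthetic_probs = rng.random(size=(50, 20))

augmented = augment(real, synthetic_probs)
print(augmented.shape)  # (150, 20)
```

A predictive model would then be trained on `augmented` and evaluated on held-out real samples only, so that any gain in the score reflects performance on real data.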

How to use this repository

Note: For confidentiality reasons, the data is not available in this repository. If you wish to access the data, please refer to my report for the process.