ulucsahin/MedSpecSearch-v2

MedSpecSearch-v2

Download the required data, embeddings, and models from https://drive.google.com/drive/folders/1RHj-AnlXyEIKRGa7WmNg1eE_8ezsGnvl?usp=sharing.

Some embeddings (listed in the txt file inside the zip that you download from the link) need to be downloaded separately, such as the GoogleNews Word2Vec embeddings.

Tutorial Jupyter Notebooks are explained below.

All methods are documented.

Scripts

Below is a list of the ipynb tutorial files that show how to use the scripts.

iCliniq Data Scraping.ipynb

This notebook shows how to use the icliniq_data_scraper module to collect data from the iCliniq website. Downloaded data is saved to the hard drive.

Shows usage of:

  • icliniq_data_scraper.py


Embedding Training.ipynb

This notebook shows how to train and save Word2Vec and FastText embeddings.

Uses:

  • Gensim
  • Pandas

Data process example with iCliniq.ipynb

This notebook shows how to process data from csv files so that it can be fed into the neural network during training.

Uses:

  • DataLoader
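As a rough illustration of the first step described above (pulling the text column out of a csv file), here is a minimal sketch that uses only the Python standard library. It does not reproduce the DataLoader API, which is documented in the notebook itself. The `read_column` helper, the `specialty` column, and the sample rows are assumptions made up for this example; only the `question` column name comes from the config shown later in this README.

```python
import csv
import io


def read_column(csv_file, column="question"):
    """Return the non-empty values of one column from an open CSV file.

    Hypothetical helper for illustration; not part of DataLoader.
    """
    reader = csv.DictReader(csv_file)
    if reader.fieldnames is None or column not in reader.fieldnames:
        raise KeyError(f"column {column!r} not found in CSV header")
    # Skip rows where the chosen column is missing or blank.
    return [
        row[column].strip()
        for row in reader
        if row[column] and row[column].strip()
    ]


# Tiny in-memory CSV standing in for a real data file (contents are made up).
sample = io.StringIO(
    "question,specialty\n"
    "I have had a headache for three days,neurology\n"
    ",cardiology\n"
    "My knee hurts when I run,orthopedics\n"
)
questions = read_column(sample, column="question")
```

With a real file, the same helper can be called as `read_column(open(path, newline="", encoding="utf-8"))`, passing whichever column name the csv file uses.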

Training and Saving Model.ipynb

This notebook shows how to train a TensorFlow model and save it to the hard drive.

Uses:

  • helper.py
  • EmbedHelper.py
  • Models.py

Restoring Model.ipynb

This notebook shows how to restore a pre-trained model from the hard drive.

Uses:

  • helper.py
  • EmbedHelper.py
  • Models.py

Getting Predictions.ipynb

This notebook shows how to get predictions from a trained model.

Uses:

  • helper.py
  • EmbedHelper.py
  • DataLoader.py
  • Models.py

Hospital Data.ipynb

Shows how to use the hospitals.py script to get hospital results for a specified medical specialty and region.

Shows usage of:

  • hospitals.py

Phrases.ipynb

Shows how to use newsgroups or iCliniq data (named fold0 in the script) to find frequently used phrases in the data. Also shows how to train and save embeddings on the phrase-replaced data.

Shows usage of:

  • tfidf_mesh.py
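To make the idea of phrase-replaced data concrete, the sketch below counts adjacent word pairs and joins the frequent ones into single tokens. This is a simplified stand-in, not the method implemented in tfidf_mesh.py; the function names, the `min_count` threshold, and the toy corpus are all assumptions for illustration.

```python
from collections import Counter


def frequent_bigrams(sentences, min_count=2):
    """Count adjacent word pairs and keep those seen at least min_count times."""
    counts = Counter()
    for tokens in sentences:
        counts.update(zip(tokens, tokens[1:]))
    return {bigram: n for bigram, n in counts.items() if n >= min_count}


def replace_phrases(tokens, phrases):
    """Join each known bigram into one token, scanning left to right."""
    out, i = [], 0
    while i < len(tokens):
        pair = tuple(tokens[i:i + 2])
        if len(pair) == 2 and pair in phrases:
            out.append("_".join(pair))
            i += 2
        else:
            out.append(tokens[i])
            i += 1
    return out


# Toy corpus (made up); real input would be tokenized newsgroups or iCliniq text.
corpus = [
    "i have high blood pressure".split(),
    "my blood pressure is high".split(),
    "blood test results".split(),
]
phrases = frequent_bigrams(corpus, min_count=2)
replaced = [replace_phrases(sentence, phrases) for sentence in corpus]
```

The phrase-replaced sentences can then be passed to embedding training in place of the original token lists, so that a phrase such as `blood_pressure` receives its own vector.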

Preprocessing Input and Text Translation.ipynb

This notebook shows how to use text preprocessing and text translation. Text can be translated from any language to English; the input language is detected automatically, and Google does not charge extra for automatic language detection.

Uses:

  • DataLoader.py

Shows usage of:

  • DataHandler

An authentication token for the Google Translation API is required to use Turkish translation in this project. The steps below explain how to get a token and use it.

How to get the aut.json file for the Google API

  • Log in to https://console.developers.google.com
  • Click "Credentials" on the left side under "APIs & Services".
  • If no previously created project is available, create a new project.
  • Click "Create Credentials" and select "Service account key".
  • Select a service account, choose JSON as the key type, and click Create.
  • Rename the downloaded file to "aut.json" and put it in the data folder of this project.
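Once the key file is in place, one common way to make it visible to Google's client libraries is the `GOOGLE_APPLICATION_CREDENTIALS` environment variable, which those libraries consult when looking for default credentials. The sketch below assumes that approach; this README does not say how the project itself loads aut.json, and the `use_service_account_key` helper is hypothetical.

```python
import os
from pathlib import Path


def use_service_account_key(data_dir="data", filename="aut.json"):
    """Point GOOGLE_APPLICATION_CREDENTIALS at the key file and return its path.

    Hypothetical helper: the project may load the key differently.
    """
    key_path = Path(data_dir) / filename
    if not key_path.is_file():
        raise FileNotFoundError(
            f"{key_path} not found; download the service account key first"
        )
    resolved = key_path.resolve()
    # Google client libraries read this variable for default credentials.
    os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = str(resolved)
    return resolved
```

Calling `use_service_account_key()` from the project root before creating a translation client fails early with a clear message if the file is missing, rather than surfacing later as an authentication error.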

Config File properties

configs = {
    "vectorSize":300,
    "trainNewModel":True,
    "dataColumn":"question",
    "maxLength":128,
    "batchSize":64,
    "embeddingType":embedDict[2],
    "PreEmbed":True,
    "restore":True,
    "model_type":"CNN_3Layer" # Options are : "CNN" (1 layer) , "CNN_3Layer", "RNN_LSTM"
}
  • vectorSize: Dimension of the embedding vectors. We have used 300 for our embeddings, as in the config above.
  • trainNewModel: Specifies whether a new model should be trained.
  • dataColumn: Specifies which column of the csv files should be used as data. Can be different for each csv file.
  • maxLength: Maximum sentence length in words. Data instances longer than maxLength are truncated to maxLength words (128 in the config above).
  • batchSize: Batch size during training.
  • embeddingType: Specifies which embedding should be used.
  • PreEmbed: Specifies whether loaded embeddings should be used. Should be "True" in most cases.
  • restore: Specifies whether the model should be restored.

Some of these properties are not currently used but are kept for backwards compatibility.
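The effect of maxLength and batchSize can be sketched in a few lines of plain Python. This is an illustration of the general technique, not the project's own preprocessing code: the helper names and the `<PAD>` token are assumptions, and padding short inputs is an assumed complement to the truncation described above.

```python
def pad_or_truncate(tokens, max_length=128, pad_token="<PAD>"):
    """Cut a token list to max_length, or pad it up to max_length."""
    clipped = tokens[:max_length]
    return clipped + [pad_token] * (max_length - len(clipped))


def batches(items, batch_size=64):
    """Yield consecutive slices of at most batch_size items."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


# Made-up inputs: one question longer than maxLength, one shorter.
long_question = ["word"] * 200
short_question = "my knee hurts".split()
fixed = [pad_or_truncate(q) for q in (long_question, short_question)]
```

After this step every instance has exactly maxLength tokens, which lets a batch be stacked into a single rectangular array of shape (batchSize, maxLength) before the embedding lookup.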
