MedSpecSearch-v2

Download the required data, embeddings, and models from https://drive.google.com/drive/folders/1RHj-AnlXyEIKRGa7WmNg1eE_8ezsGnvl?usp=sharing.

Some embeddings, such as the GoogleNews Word2Vec embeddings, need to be downloaded separately. They are listed in the txt file inside the zip archive that you download from the link above.

Tutorial Jupyter Notebooks are explained below.

All methods are documented.

Scripts

Below is a list of the ipynb tutorial files that show how to use the scripts.

iCliniq Data Scraping.ipynb

This notebook shows how to use the icliniq_data_scraper module to collect data from the iCliniq website. Downloaded data is saved to the hard drive.

Shows usage of:

icliniq_data_scraper.py

Embedding Training.ipynb

This notebook shows how to train and save Word2Vec and FastText embeddings.

Uses:

Gensim
Pandas

Data process example with iCliniq.ipynb

This notebook shows how to process data from csv files so that it can be fed into a neural network during training.

Uses:

DataLoader
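As a rough illustration of the kind of processing this notebook covers, the sketch below reads one text column from csv data and splits each row into word tokens. It uses only the Python standard library and does not reproduce the project's DataLoader; the function name `load_column` and the sample rows are hypothetical.

```python
import csv
import io


def load_column(csv_text, column):
    """Return the whitespace-tokenized values of one csv column.

    Hypothetical helper for illustration; not part of DataLoader.py.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = []
    for record in reader:
        text = (record.get(column) or "").strip()
        if text:  # skip rows whose cell is empty
            rows.append(text.split())
    return rows


# Made-up sample data with a "question" column, matching dataColumn in the config.
sample = (
    "question,specialty\n"
    "I have a sore throat,ENT\n"
    ",Dermatology\n"
    "My knee hurts,Orthopedics\n"
)
tokens = load_column(sample, "question")
print(tokens)
```

Passing the column name as an argument mirrors the dataColumn setting, which can differ from one csv file to the next.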

Training and Saving Model.ipynb

This notebook shows how to train a TensorFlow model and save it to the hard drive.

Uses:

helper.py
EmbedHelper.py
Models.py

Restoring Model.ipynb

This notebook shows how to restore a pre-trained model from the hard drive.

Uses:

helper.py
EmbedHelper.py
Models.py

Getting Predictions.ipynb

This notebook shows how to get predictions from a trained model.

Uses:

helper.py
EmbedHelper.py
DataLoader.py
Models.py
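To make the prediction step more concrete, here is a minimal sketch of turning raw model scores into ranked specialty predictions. It assumes the model returns one unnormalized score per class, which this README does not confirm; `softmax`, `top_predictions`, the class names, and the scores are all hypothetical and not taken from Models.py or helper.py.

```python
import math


def softmax(scores):
    """Convert raw scores to probabilities that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def top_predictions(scores, class_names, k=3):
    """Return the k most probable (class name, probability) pairs."""
    probs = softmax(scores)
    ranked = sorted(zip(class_names, probs), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


# Made-up labels and scores, for illustration only.
class_names = ["Cardiology", "Dermatology", "Neurology"]
scores = [0.5, 2.0, 1.0]
best = top_predictions(scores, class_names, k=2)
print(best)
```

Returning several ranked candidates rather than a single label can be useful when a question plausibly belongs to more than one specialty.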

Hospital Data.ipynb

Shows how to use the hospitals.py script to get hospital results for a specified medical specialty and region.

Shows usage of:

hospitals.py

Phrases.ipynb

Shows how to use newsgroups or iCliniq data (named fold0 in the script) to find frequently used phrases in the data. Also shows how to train and save embeddings on phrase-replaced data.

Shows usage of:

tfidf_mesh.py
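The idea of finding frequent phrases and replacing them in the data can be sketched with a plain bigram count. This is a simplified stand-in, not the method used in the notebook or in tfidf_mesh.py; the function names, the `min_count` threshold, and the sample sentences are assumptions made for illustration.

```python
from collections import Counter


def frequent_bigrams(sentences, min_count=2):
    """Count adjacent word pairs and keep those seen at least min_count times."""
    counts = Counter()
    for words in sentences:
        counts.update(zip(words, words[1:]))
    return {pair: n for pair, n in counts.items() if n >= min_count}


def replace_phrases(words, phrases):
    """Join known bigrams into single tokens such as 'blood_pressure'."""
    merged, i = [], 0
    while i < len(words):
        if i + 1 < len(words) and (words[i], words[i + 1]) in phrases:
            merged.append(words[i] + "_" + words[i + 1])
            i += 2  # skip both words of the matched phrase
        else:
            merged.append(words[i])
            i += 1
    return merged


# Made-up sentences, already tokenized.
sentences = [
    "my blood pressure is high".split(),
    "high blood pressure runs in my family".split(),
    "is my pressure normal".split(),
]
phrases = frequent_bigrams(sentences)
print(phrases)
print(replace_phrases(sentences[0], phrases))
```

Embeddings trained on the phrase-replaced token lists then learn one vector for the joined token instead of separate vectors for its parts.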

Preprocessing Input and Text Translation.ipynb

This notebook shows how to use text preprocessing and text translation. Translation works from any language to English; the input language is detected automatically, and Google does not charge extra for automatic language detection.

Uses:

DataLoader.py

Shows usage of:

DataHandler
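For orientation, a minimal text preprocessing step might look like the sketch below. It assumes the text is already in English (that is, after translation) and is not the project's DataHandler; the `preprocess` function and its cleaning rules are hypothetical.

```python
import re


def preprocess(text):
    """Lowercase, replace everything except letters, digits and whitespace, then tokenize."""
    text = text.lower()
    text = re.sub(r"[^a-z0-9\s]", " ", text)  # punctuation becomes a space
    return text.split()


print(preprocess("I've had a HEADACHE for 3 days!!"))
```

Replacing punctuation with a space rather than deleting it keeps words on either side of a comma or slash from being glued together.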

An authentication token for the Google Translation API is required to use Turkish translation in this project. The steps below explain how to get a token and use it.

How to get aut.json file for Google API

Log in to https://console.developers.google.com
Click "Credentials" on the left side under "APIs & Services".
If there is no previously created project available, create a new project.
Click "Create Credentials" and select "Service account key".
Select Service Account, choose JSON format as key type and click Create.
Rename the downloaded file to "aut.json" and put it in the data folder of this project.
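After completing the steps above, a quick sanity check of the key file can catch a wrong download early. The sketch below verifies that a JSON file looks like a service account key. The field names are the ones Google service account key files normally contain; the `check_key_file` helper is hypothetical, and the demonstration writes a dummy file to a temporary folder instead of touching data/aut.json.

```python
import json
import os
import tempfile

# Fields normally present in a Google service account key file.
REQUIRED_FIELDS = ("type", "project_id", "private_key", "client_email")


def check_key_file(path):
    """Return a list of problems found in a service account key file."""
    with open(path, encoding="utf-8") as handle:
        key = json.load(handle)
    problems = [f"missing field: {name}" for name in REQUIRED_FIELDS if name not in key]
    if key.get("type") != "service_account":
        problems.append("type is not 'service_account'")
    return problems


# Demonstration with an incomplete dummy key; a real key would live at data/aut.json.
with tempfile.TemporaryDirectory() as folder:
    demo_path = os.path.join(folder, "aut.json")
    with open(demo_path, "w", encoding="utf-8") as handle:
        json.dump({"type": "service_account", "project_id": "demo"}, handle)
    problems = check_key_file(demo_path)

print(problems)
```

An empty list means the file has the expected shape; it does not prove that the key is valid or that the Translation API is enabled for the project.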

Config File properties

configs = {
    "vectorSize": 300,
    "trainNewModel": True,
    "dataColumn": "question",
    "maxLength": 128,
    "batchSize": 64,
    "embeddingType": embedDict[2],  # embedDict is defined elsewhere in the project
    "PreEmbed": True,
    "restore": True,
    "model_type": "CNN_3Layer",  # Options are: "CNN" (1 layer), "CNN_3Layer", "RNN_LSTM"
}

vectorSize: Dimension of the embedding vectors. The example configuration above uses 300.
trainNewModel: Specifies if a new model should be trained or not.
dataColumn: Specifies which column should be used as data in csv files. Can be different for each csv file.
maxLength: Maximum sentence length in words. Data instances longer than maxLength words are truncated to maxLength words (128 in the example above).
batchSize: Batch size during training.
embeddingType: Specifies which embedding should be used.
PreEmbed: Specifies if loaded embeddings should be used or not. Should be "True" in most cases.
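restore: Specifies if a saved model should be restored.
model_type: Specifies which model architecture to use. Options are "CNN" (1 layer), "CNN_3Layer", and "RNN_LSTM".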

Some of these settings are not currently used but are kept for backwards compatibility.
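To show how maxLength and batchSize interact, the sketch below truncates tokenized sentences and groups them into batches. It is an illustration under assumed behavior, not the project's training loop; the `truncate` and `batches` helpers and the generated data are hypothetical.

```python
def truncate(tokens, max_length):
    """Keep at most max_length tokens."""
    return tokens[:max_length]


def batches(items, batch_size):
    """Yield consecutive slices of at most batch_size items."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


# Only the two settings used here; values match the example configuration.
configs = {"maxLength": 128, "batchSize": 64}

# 150 made-up sentences of 200 tokens each.
data = [["word"] * 200 for _ in range(150)]
data = [truncate(sentence, configs["maxLength"]) for sentence in data]
sizes = [len(batch) for batch in batches(data, configs["batchSize"])]
print(len(data[0]), sizes)
```

With these values every sentence is cut to 128 tokens, and the final batch is smaller than batchSize because 150 is not a multiple of 64.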
