This is a project for the CUHK ESTR2018/ENGG2760A course.
A peek into word embeddings using word2vec
- To explore the use of probability in word2vec models.
- To find out the relationship between the conditional probability of a word appearing given other words and the similarity between words (see the sketch below).
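
For context, in a skip-gram model the conditional probability of an outside word o given a center word c is a softmax over dot products of their embedding vectors, so more similar vectors yield higher probability. Below is a minimal NumPy sketch of this relationship; the vectors here are random placeholders, not trained embeddings:

```python
import numpy as np

rng = np.random.default_rng(0)
V, d = 5, 8                                # toy vocabulary size and embedding dimension
center_vecs = rng.normal(size=(V, d))      # "center" embeddings v_c
context_vecs = rng.normal(size=(V, d))     # "context" embeddings u_o

def p_outside_given_center(c):
    # Skip-gram softmax: P(o | c) = exp(u_o . v_c) / sum_w exp(u_w . v_c)
    scores = context_vecs @ center_vecs[c]
    exp = np.exp(scores - scores.max())    # shift by max for numerical stability
    return exp / exp.sum()

# A higher dot product (i.e. more similar vectors) gives a higher conditional probability
print(p_outside_given_center(0))
```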
- Install the required libraries.
- Install NumPy:
pip install numpy
- Install Gensim:
pip install gensim
- Install NLTK:
pip install nltk
- Install scikit-learn:
pip install scikit-learn
- Install Matplotlib:
pip install matplotlib
- Clone this GitHub repository or download the files directly.
git clone https://github.com/yueagar/ESTR2018-project.git
- Modify and run the scripts.
- Testing the pre-trained Google News Word2Vec model:
- Download the model and modify modelPath in the script so the model loads correctly (see the loading sketch below).
- Run the script:
python word2vec-google-news-pre-trained.py
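
As a reference, here is a minimal sketch of what the loading step might look like, using gensim's KeyedVectors API; the path assigned to modelPath is a placeholder for your local copy of the downloaded model, and the actual script may differ:

```python
from gensim.models import KeyedVectors

# Placeholder path; point this at your downloaded copy of the model
modelPath = "GoogleNews-vectors-negative300.bin"
model = KeyedVectors.load_word2vec_format(modelPath, binary=True)

print(model.similarity("king", "queen"))   # cosine similarity between two words
print(model.most_similar("king", topn=5))  # nearest neighbours by cosine similarity
```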
- Training a Skip-Gram model:
- Modify the filename of the training data and the target word for testing (see the sketch below).
- Run the script:
python word2vec-sg.py
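
As a rough sketch of what such a training script might contain, assuming the training data is a plain-text file and using gensim's Word2Vec class with sg=1; the filename train.txt and the target word are placeholders:

```python
from gensim.models import Word2Vec

# Placeholder filename; replace with the actual training data file
with open("train.txt", encoding="utf-8") as f:
    # Simple whitespace tokenization; the actual script may use NLTK's word_tokenize instead
    sentences = [line.lower().split() for line in f if line.strip()]

# sg=1 selects the Skip-Gram architecture (sg=0 would be CBOW)
model = Word2Vec(sentences, vector_size=100, window=5, min_count=1, sg=1)

target = "word"  # placeholder target word for testing; must appear in the training data
print(model.wv.most_similar(target, topn=5))

# Ties back to the probability objective: predict_output_word returns the model's
# estimate of P(center word | context words) from the trained output layer
print(model.predict_output_word([target], topn=5))
```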
- Google Code - Word2Vec: https://code.google.com/archive/p/word2vec/
- GeeksforGeeks - Implement your own word2vec (skip-gram) model in Python: https://www.geeksforgeeks.org/implement-your-own-word2vecskip-gram-model-in-python/
- Proposal
- Project subject, description and activities
- Presentation PowerPoint slides
- Brief introduction to word embeddings and word2vec
- Probability in word2vec models
- Demonstration of the code implementation
- Code implementation
- Use of the pre-trained Google News word2vec model
- Training of a Skip-Gram model
- Final report
- Draft
- Final LaTeX or Word file