This project develops a Large Language Model (LLM) using TensorFlow, Keras, NumPy, Pandas, Scikit-learn, and NLTK. The model is trained on a large English corpus, and the data is cleaned and preprocessed in batches to handle datasets too large to process at once.
- Trains an LLM on a large English corpus
- Cleans and preprocesses data in batches to handle large datasets
- Loads data and trains on it in batches
- Visualizes training progress and word embeddings in TensorBoard
- Automatically saves cleaned data, preprocessed data, and trained models
- Resumes training from where it left off
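The batched cleaning listed above can be sketched roughly as follows. This is a minimal illustration using only the standard library, not the project's actual code: the names `clean_text` and `iter_clean_batches`, the cleaning rules, and the default batch size are all assumptions made for the example.

```python
import re
from itertools import islice
from typing import Iterable, Iterator, List


def clean_text(line: str) -> str:
    """Hypothetical cleaning rule: lowercase the text, replace anything
    that is not a letter, digit, apostrophe, or whitespace with a space,
    then collapse runs of whitespace."""
    line = line.lower()
    line = re.sub(r"[^a-z0-9'\s]", " ", line)
    return re.sub(r"\s+", " ", line).strip()


def iter_clean_batches(
    lines: Iterable[str], batch_size: int = 1000
) -> Iterator[List[str]]:
    """Yield lists of cleaned lines, reading at most `batch_size` raw
    lines at a time so the whole corpus never has to sit in memory."""
    it = iter(lines)
    while True:
        raw = list(islice(it, batch_size))
        if not raw:
            return  # input exhausted
        # Drop lines that are empty after cleaning.
        cleaned = [c for c in map(clean_text, raw) if c]
        if cleaned:
            yield cleaned


if __name__ == "__main__":
    sample = ["Hello, World!", "  ", "It's   a TEST...", "Batch #2 here"]
    for batch in iter_clean_batches(sample, batch_size=2):
        print(batch)
```

Because `iter_clean_batches` accepts any iterable of strings, an open file object can be passed in directly, and each yielded batch can be written to disk before the next one is read.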
To run the project, you need to install the following libraries:
- TensorFlow
- Keras
- NumPy
- Pandas
- Scikit-learn
- NLTK
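Before opening the notebook, it may help to confirm that the libraries above are importable. The snippet below is a small sketch of such a check; the helper name `missing_libraries` is hypothetical, and the mapping uses the usual Python import names (note that Scikit-learn is imported as `sklearn`).

```python
from importlib.util import find_spec

# Display name -> import name for each library listed above.
REQUIRED = {
    "TensorFlow": "tensorflow",
    "Keras": "keras",
    "NumPy": "numpy",
    "Pandas": "pandas",
    "Scikit-learn": "sklearn",
    "NLTK": "nltk",
}


def missing_libraries(required=REQUIRED):
    """Return the display names of libraries that cannot be found
    in the current Python environment."""
    return [name for name, module in required.items() if find_spec(module) is None]


if __name__ == "__main__":
    missing = missing_libraries()
    if missing:
        print("Missing libraries:", ", ".join(missing))
    else:
        print("All required libraries are installed.")
```

`find_spec` only looks the module up and does not import it, so the check stays fast even for heavy packages such as TensorFlow.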
Then open the 'main.ipynb' notebook and run it to start the project.