Large Language Model Using TensorFlow

This project develops a Large Language Model (LLM) using TensorFlow, Keras, NumPy, Pandas, scikit-learn, and NLTK. The model is trained on a large English corpus. The data is cleaned and preprocessed in batches so that the corpus does not have to be processed all at once.

Features

Trains an LLM on a large English corpus
Cleans and preprocesses data in batches to handle large datasets
Loads data and trains on it in batches
Visualizes training progress and word embeddings in TensorBoard
Automatically saves cleaned data, preprocessed data, and trained models
Resumes training from where it left off
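
The batch-wise cleaning listed above can be sketched roughly as follows. This is a minimal illustration using only the Python standard library, not the project's actual pipeline: the function names (clean_line, iter_batches, clean_in_batches) and the cleaning rules (lowercasing, dropping non-letter characters) are assumptions made for the example.

```python
import re
from itertools import islice
from typing import Iterable, Iterator, List


def clean_line(line: str) -> str:
    """Hypothetical cleaning step: lowercase, drop non-letters, collapse spaces."""
    line = line.lower()
    line = re.sub(r"[^a-z\s]", " ", line)
    return re.sub(r"\s+", " ", line).strip()


def iter_batches(lines: Iterable, batch_size: int) -> Iterator[List]:
    """Yield lists of at most batch_size items, reading the input lazily."""
    iterator = iter(lines)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


def clean_in_batches(lines: Iterable[str], batch_size: int = 2) -> Iterator[List[str]]:
    """Clean one batch at a time so only batch_size lines are held in memory."""
    for batch in iter_batches(lines, batch_size):
        cleaned = [clean_line(line) for line in batch]
        # Skip lines that became empty after cleaning.
        yield [line for line in cleaned if line]


if __name__ == "__main__":
    sample = [
        "Hello, World!",
        "  TensorFlow 2.x  ",
        "123",
        "Batches keep memory use bounded.",
    ]
    for batch in clean_in_batches(sample, batch_size=2):
        print(batch)
```

Because iter_batches accepts any iterable, an open file handle can be passed in place of the sample list, so a large corpus file would be read a few lines at a time rather than loaded whole.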

Usage

To run the project, you need to install the following libraries:

TensorFlow
Keras
NumPy
Pandas
scikit-learn
NLTK
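
Before opening the notebook, a quick check like the one below can report which of these libraries are not yet installed. It is a small convenience sketch and not part of the project itself; the helper name missing_packages is hypothetical, and the mapping assumes the usual import and pip names (scikit-learn is imported as sklearn).

```python
from importlib.util import find_spec

# Import name -> pip package name (assumed to be the standard PyPI names).
REQUIRED = {
    "tensorflow": "tensorflow",
    "keras": "keras",
    "numpy": "numpy",
    "pandas": "pandas",
    "sklearn": "scikit-learn",
    "nltk": "nltk",
}


def missing_packages(required=REQUIRED):
    """Return pip names of packages whose module cannot be found."""
    return [
        pip_name
        for module, pip_name in required.items()
        if find_spec(module) is None
    ]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing libraries; install with: pip install " + " ".join(missing))
    else:
        print("All required libraries are importable.")
```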

Then open the 'main.ipynb' notebook and run it.