Fixed seasonality, torch.compile and added "resume training" #87
victorbjorsvik started this conversation in Show and tell
For anyone interested: I looked through the Errata and several of the PRs in this repo, implemented most of the suggestions in my own repo, and trained the model for 1 epoch. You can find the repo here. These are the most pertinent changes I made:
Addressed several of the issues Andrej ran into throughout the video, including:
Seasonality in training data
Inability to use torch.compile while doing HellaSwag evals in the training loop
More aggressive learning rate and schedule
Enabled resuming training based on last model checkpoint
All the issues mentioned in Andrej's Errata in his repo
Several other improvements suggested in PRs in Andrej's repo
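The seasonality fix boils down to not feeding the data shards in the same fixed order every epoch. As a minimal sketch (not the actual code from my repo; `shard_order` and the seed value are hypothetical), a deterministic per-epoch permutation keeps runs reproducible while breaking the fixed ordering:

```python
import random

def shard_order(num_shards, epoch, seed=1337):
    # Deterministic per-epoch permutation of shard indices:
    # seeding with seed + epoch means the same run (or a resumed
    # run) always replays the exact shuffle for that epoch, while
    # different epochs see the shards in different orders.
    rng = random.Random(seed + epoch)
    order = list(range(num_shards))
    rng.shuffle(order)
    return order
```

A data loader can then walk `shard_order(num_shards, epoch)` instead of `range(num_shards)`; tokens within a shard can be shuffled the same way if finer-grained mixing is needed.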
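The learning-rate schedule follows the usual linear-warmup-then-cosine-decay shape; only the constants change when making it more aggressive. A sketch with hypothetical values (the specific `max_lr`, `warmup_steps`, and `max_steps` here are placeholders, not the ones from my run):

```python
import math

max_lr = 18e-4          # hypothetical peak LR (3x the video's 6e-4)
min_lr = max_lr * 0.1   # decay floor
warmup_steps = 100
max_steps = 19073       # ~1 epoch of 10B tokens at 524288 tokens/step

def get_lr(step):
    # Linear warmup to max_lr, then cosine decay down to min_lr.
    if step < warmup_steps:
        return max_lr * (step + 1) / warmup_steps
    if step > max_steps:
        return min_lr
    ratio = (step - warmup_steps) / (max_steps - warmup_steps)
    coeff = 0.5 * (1.0 + math.cos(math.pi * ratio))  # 1 -> 0
    return min_lr + coeff * (max_lr - min_lr)
```

"More aggressive" here means a higher peak and a shorter warmup than the conservative GPT-3 settings used in the video.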
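Resuming training mostly comes down to bookkeeping: periodically persisting the step counter and the data-loader position, and restoring them at startup. The real thing also saves model and optimizer state via `torch.save`; the sketch below (function names and the JSON layout are my own, for illustration) shows just the stdlib bookkeeping part:

```python
import json
import os

def save_checkpoint(path, step, shard_idx, pos_in_shard):
    # Write to a temp file, then atomically rename, so a crash
    # mid-write never leaves a corrupt checkpoint behind.
    state = {"step": step, "shard_idx": shard_idx, "pos": pos_in_shard}
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump(state, f)
    os.replace(tmp, path)

def load_checkpoint(path):
    # Returns None for a fresh run, otherwise the saved state.
    if not os.path.exists(path):
        return None
    with open(path) as f:
        return json.load(f)
```

On startup the training loop checks `load_checkpoint`, and if it finds state it fast-forwards the data loader to `(shard_idx, pos)` and continues from `step` instead of starting over.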
During training I managed to get a dt of ~0.34 s per step and processed ~1.5M tokens per second. I trained the model for 1 epoch (~10B tokens) in under 2 hours on 8 A100 (80 GB SXM4) GPUs. This gave me a minimum training loss of 2.84857, a minimum validation loss of 3.0383, and a maximum HellaSwag eval of 0.3101. Model checkpoints can be provided upon request.
All credit goes to the good people suggesting edits in the PRs.