[v0.2.0] Merge branch 'preview'
kyo-takano committed Jan 27, 2025
2 parents 3db6ab5 + cf858ea commit 6e34bdd
Showing 27 changed files with 1,617 additions and 340 deletions.
158 changes: 121 additions & 37 deletions README.md
100644 → 100755
@@ -1,6 +1,15 @@
# `chinchilla`

![Parametric fit on LLM training runs](docs/imgs/parametric_fit.png)

`chinchilla` is a research toolkit designed to estimate scaling laws & train compute-optimal models for various deep learning tasks.

## Features

- **Scaling Law Estimation**: Fit a loss predictor based on multiple training runs.
- **Compute-Optimal Allocation**: Train the best possible model within a given compute budget.
- **Progressive Scaling**: Iteratively update the scaling law estimation and scale up the compute.
- **Simulation Mode**: Test scaling law estimations in hypothetical scenarios.

<table>
<tr>
@@ -11,7 +20,6 @@
</td>
<td>

- Scaling compute for
- Large Language Models (LLM)
- Vision Transformers (ViT)
@@ -20,76 +28,142 @@
- Knowledge distillation
- Evaluating compute efficiencies of new algorithms & architectures
- Researching the neural scaling law itself

</td>
<tr>
<td>

Probably **NOT** For...
</td>
<td>

- Fine-tuning tasks
- Data-scarce domains
- etc.

</td>

</tr>
</table>

> [!IMPORTANT]
> This work builds upon the scaling law formulation proposed in [the original Chinchilla paper](https://deepmind.google/discover/blog/an-empirical-analysis-of-compute-optimal-large-language-model-training/) by DeepMind (2022),
> with some modifications detailed in [./docs/changes.md](https://github.com/kyo-takano/chinchilla/tree/master/docs/changes.md).
## Installation

**From PyPI**

```bash
pip install -U chinchilla
```

**From Source**

```bash
git clone https://github.com/kyo-takano/chinchilla.git
cd chinchilla
pip install -e .
```

## Prerequisite: Chinchilla formulation

Just in case you are not familiar, here is the formulation of the scaling law estimation:

<details>

<summary style="font-weight: bold;">Variables</summary>

- $N$: The number of parameters
- $D$: The number of data samples
- $C$: Total compute in FLOPs ($C\approx 6\ ND$)
- $L(N,\ D) = E + A / N ^ \alpha + B / D ^ \beta$: A loss predictor parameterized by $\{E, A, B, \alpha, \beta\}$

---

**Intuition**:
- $E$ corresponds to the **irreducible loss**, which can only be attained by an ideal model with infinite compute;
- $A / N ^ \alpha$ accounts for the additional loss due to an insufficient model size;
- $B / D ^ \beta$, for an insufficient amount of data.

</details>

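To make these variables concrete, here is a tiny, self-contained sketch (not part of the library) that evaluates the loss predictor and the compute approximation, using roughly the parameter values reported in the original Chinchilla paper purely as an example:

```python
# Standalone illustration of the formulas above; the constants are roughly the
# values fitted in the original Chinchilla paper and serve only as an example.
def predicted_loss(N: float, D: float, E: float, A: float, B: float, alpha: float, beta: float) -> float:
    """L(N, D) = E + A / N**alpha + B / D**beta"""
    return E + A / N**alpha + B / D**beta

N, D = 70e9, 1.4e12  # e.g., a Chinchilla-scale run: 70B parameters, 1.4T tokens
C = 6 * N * D        # C ~ 6ND ~ 5.9e23 FLOPs
print(f"C ~ {C:.2e} FLOPs")
print(f"predicted loss ~ {predicted_loss(N, D, E=1.69, A=406.4, B=410.7, alpha=0.34, beta=0.28):.3f}")
```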
<details>

<summary style="font-weight: bold;">Objective</summary>

1. Optimize the parameters $\{E, A, B, \alpha, \beta\}$ to better predict losses $L_i$ from $(N_i, D_i)$
2. Solve $\underset{N,\ D}{\operatorname{argmin}}\ L(N,\ D\ |\ C)$, whose solution can be derived in closed form from $\{A, B, \alpha, \beta\}$ (shown right after this section)

</details>

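For reference, the second objective above (compute-optimal allocation) has a closed-form solution under the approximation $C = 6ND$, as derived in the original Chinchilla paper:

```math
N_{opt}(C) = G \left( \frac{C}{6} \right)^{\frac{\beta}{\alpha + \beta}}, \qquad
D_{opt}(C) = G^{-1} \left( \frac{C}{6} \right)^{\frac{\alpha}{\alpha + \beta}}, \qquad
\text{where } G = \left( \frac{\alpha A}{\beta B} \right)^{\frac{1}{\alpha + \beta}}
```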
## Usage

### 1. Fitting the scaling law on an existing dataset

> [!WARNING]
>
> `chinchilla` requires Python >= 3.8

> [!NOTE]
> An example of this usage can be found [here](https://github.com/kyo-takano/chinchilla/blob/master/examples/llm/main.ipynb).

First, prepare a CSV like the following and save it as `df.csv`:

```csv
C,N,D,loss
1.3972367362937152e+18,73824672,3154403320,3.405928
1.7656304230443515e+18,89818214,3276303602,3.325255
2.0558971596900728e+18,105811837,3238291053,3.300442
...
```

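The columns mirror the variables defined above. As a quick sanity check (a sketch assuming you have `pandas` installed), you can verify that $C \approx 6ND$ holds for each row:

```python
import pandas as pd

df = pd.read_csv("df.csv")
# Each ratio should be close to 1.0 if C was logged as ~6 * N * D
print((df["C"] / (6 * df["N"] * df["D"])).describe())
```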
Second, define a grid of initial parameters for the fit, like so:

```python
import numpy as np
from chinchilla import Chinchilla

cc = Chinchilla(
    "./",  # Assuming `df.csv` is under ./
    param_grid=dict(
        E=np.linspace(1, 2, 5),
        a=np.linspace(1, 10, 5),  # a: log(A)
        b=np.linspace(1, 10, 5),  # b: log(B)
        alpha=np.linspace(0.1, 0.7, 5),
        beta=np.linspace(0.1, 0.7, 5),
    ),
)
```

Finally, call `cc.fit()` and you'll get the parameters fitted to your dataset, which you can access as `cc.params`:

```python
>>> cc.fit()
>>> cc.params
{'E': 1.7004437920205586,
'A': 185.388090185727,
'B': 1627.0012474587165,
'alpha': 0.28923265350161337,
'beta': 0.3556020928031086}
```

By calling `cc.allocate_compute` with a compute budget in FLOPs, like

```python
cc.allocate_compute(C=1e24)
```

you can get an estimated compute-optimal allocation of compute to $N$ and $D$.

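As a cross-check (a sketch using only the fitted values, not the library's internals), the same allocation can also be computed directly from `cc.params` via the closed form shown earlier:

```python
# Hypothetical cross-check: closed-form allocation from the fitted parameters.
p = cc.params
G = (p["alpha"] * p["A"] / (p["beta"] * p["B"])) ** (1 / (p["alpha"] + p["beta"]))
C = 1e24
N_opt = G * (C / 6) ** (p["beta"] / (p["alpha"] + p["beta"]))
D_opt = (1 / G) * (C / 6) ** (p["alpha"] / (p["alpha"] + p["beta"]))
print(f"N ~ {N_opt:.3e} parameters, D ~ {D_opt:.3e} samples")
```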
### 2. Scaling from scratch

> [!NOTE]
> An example of this usage can be found [here](https://github.com/kyo-takano/chinchilla/blob/master/examples/efficientcube.ipynb)
> **Procedure** (sketched in code below):
>
> - `seed`: Sample X training runs $(N_i, D_i, L_i)$, referred to as **seeds**
> - For i = 0 to K:
> - `fit`: Optimize the scaling law parameters to fit $L(N,\ D)$ on the training runs
> - `scale`: Configure a new model with a **scaled** compute
> - Evaluate the allocation by training a model
> - `append`: Add the result to the database of training runs
Below is an example to get started with `chinchilla`.

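Before that full example, here is a rough sketch of the loop above. The method names mirror the step names (`seed`, `fit`, `scale`, `append`), but the exact signatures and the `train_and_evaluate` stub are assumptions, not the verbatim API — see the example below and the API reference for the actual interface.

```python
# Hypothetical sketch of the seed -> (fit -> scale -> train -> append) loop.
# Signatures below are assumptions; consult the API reference for the real ones.
import numpy as np
from chinchilla import Chinchilla

cc = Chinchilla(
    "./",
    param_grid=dict(
        E=np.linspace(1, 2, 5),
        a=np.linspace(1, 10, 5),
        b=np.linspace(1, 10, 5),
        alpha=np.linspace(0.1, 0.7, 5),
        beta=np.linspace(0.1, 0.7, 5),
    ),
)

def train_and_evaluate(N: int, D: int) -> float:
    """Placeholder: build a model with ~N parameters, train it on D samples,
    and return the evaluation loss."""
    raise NotImplementedError

cc.seed(10)                          # assumed: sample & record a batch of small seed runs
for _ in range(5):                   # K scaling steps
    cc.fit()                         # re-estimate {E, A, B, alpha, beta} on all runs so far
    N, D = cc.scale()                # assumed: allocate a scaled-up compute budget to N and D
    loss = train_and_evaluate(N, D)  # your training routine
    cc.append(N=N, D=D, loss=loss)   # assumed: add the new run to the database
```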
@@ -143,7 +217,9 @@ Ensure you define functionally equivalent versions of:
- `YourModelClass`: Your model class definition.
- `train_and_evaluate`: Function to train and evaluate your model.

<details>

<summary style="font-size: 1.5rem; font-weight: bold;"> Simulation Mode</summary>

You can also visualize how `chinchilla` would perform under the given setup and a hypothetical scaling law, optionally with a **_noise term_**:

@@ -166,17 +242,25 @@ cc.simulate(
)
```

</details>

## Examples

Find practical applications/examples of `chinchilla` in the [`examples`](https://github.com/kyo-takano/chinchilla/tree/master/examples) directory (more to come):

- [Allocating $10^{24}$ FLOPs to a single LLM](https://github.com/kyo-takano/chinchilla/blob/master/examples/llm) [NEW]

- [Scaling Rubik's Cube Solvers from Scratch](https://github.com/kyo-takano/chinchilla/blob/master/examples/efficientcube.ipynb)

## Documentation

- [API Reference](https://github.com/kyo-takano/chinchilla/tree/master/docs/api-reference.md)

- [Tips](https://github.com/kyo-takano/chinchilla/tree/master/docs/TIPS.md)

- [Math](https://github.com/kyo-takano/chinchilla/tree/master/docs/math.md)

- [Differences from the original Chinchilla](https://github.com/kyo-takano/chinchilla/tree/master/docs/changes.md)

## Contributing

2 changes: 1 addition & 1 deletion chinchilla/_logger.py
@@ -1,5 +1,5 @@
"""
Contains a utility function `get_logger`. This module also filters out noisy debug messages
from `matplotlib` and suppresses redundant warnings from `numpy` and `matplotlib`.
"""

1 change: 1 addition & 0 deletions chinchilla/_metrics.py
@@ -1,4 +1,5 @@
"""A few loss & weight functions you can use on demand."""

from __future__ import annotations # PEP 604 backport

import numpy as np
1 change: 1 addition & 0 deletions chinchilla/_utils.py
@@ -1,4 +1,5 @@
"""Utility functions."""

from __future__ import annotations # PEP 604 backport

import itertools
