[minor] Fix minor presentation & utility issues
kyo-takano committed Jan 27, 2025
1 parent e1847b3 commit cf858ea
Showing 8 changed files with 30 additions and 27 deletions.
README.md (9 changes: 3 additions & 6 deletions)
@@ -70,7 +70,6 @@ pip install -e .

Just in case you are not familiar, here is the formulation of the scaling law estimation:

-<!-- ### Definitions -->
<details>

<summary style="font-weight: bold;">Variables</summary>
@@ -85,14 +84,12 @@ Just in case you are not familiar, here is the formulation of the scaling law es
**Intuition**:
- $E$ corresponds to the **irreducible loss** that can only be attained by an ideal model with infinite compute;
- $A / N ^ \alpha$ accounts for the additional loss coming from insufficiency of model size;
-- $ B / D ^ \beta$, insufficiency of data amount.
+- $B / D ^ \beta$, insufficiency of data amount.

</details>
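
For reference, the bullets above combine into the parametric loss that gets fitted:

$$L(N, D) = E + \frac{A}{N^{\alpha}} + \frac{B}{D^{\beta}}$$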

<details>

-<!-- ### Compute-Optimal Allocation -->
-
<summary style="font-weight: bold;">Objective</summary>

1. Optimize the parameters $\{E, A, B, \alpha, \beta\}$ to better predict losses $L_i$ from $(N_i, D_i)$
@@ -105,7 +102,7 @@ Just in case you are not familiar, here is the formulation of the scaling law es
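
For the allocation objective, the closed form implied by the fitted law under the common $C \approx 6ND$ approximation (Hoffmann et al., Approach 3) is worth keeping at hand:

$$N^{*}(C) = G\left(\frac{C}{6}\right)^{\frac{\beta}{\alpha+\beta}}, \qquad D^{*}(C) = G^{-1}\left(\frac{C}{6}\right)^{\frac{\alpha}{\alpha+\beta}}, \qquad G = \left(\frac{\alpha A}{\beta B}\right)^{\frac{1}{\alpha+\beta}}$$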
### 1. Fitting the scaling law on existing dataset

> [!NOTE]
-> An example of this usage can be found [here](examples/llm/)
+> An example of this usage can be found [here](https://github.com/kyo-takano/chinchilla/blob/master/examples/llm/main.ipynb)
First, prepare a CSV looking like this and save it as `df.csv`:

@@ -157,7 +154,7 @@ You can get an estimated compute-optimal allocation of compute to $N$ and $D$.
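
To make the fitting step concrete, here is a minimal, library-independent sketch of what step 1 does under the hood. This is not chinchilla's actual API, and the column names `N`, `D`, `loss` in `df.csv` are assumptions; it follows the Hoffmann et al. recipe of a Huber loss between log-predictions and log-losses, optimized with L-BFGS:

```python
import numpy as np
import pandas as pd
from scipy.optimize import minimize
from scipy.special import logsumexp

df = pd.read_csv("df.csv")
N, D, L = df["N"].to_numpy(float), df["D"].to_numpy(float), df["loss"].to_numpy(float)

def huber(r, delta=1e-3):
    # Quadratic near zero, linear in the tails (robust to outlier runs)
    return np.where(np.abs(r) <= delta, 0.5 * r**2, delta * (np.abs(r) - 0.5 * delta))

def objective(theta):
    e, a, b, alpha, beta = theta  # e, a, b are log(E), log(A), log(B)
    # log(E + A/N^alpha + B/D^beta), computed stably in log-space
    pred = logsumexp([e * np.ones_like(N), a - alpha * np.log(N), b - beta * np.log(D)], axis=0)
    return huber(pred - np.log(L)).sum()

x0 = np.array([np.log(1.7), np.log(400.0), np.log(400.0), 0.3, 0.3])  # crude seed
res = minimize(objective, x0, method="L-BFGS-B")
E, A, B = np.exp(res.x[:3])
alpha, beta = res.x[3], res.x[4]
print(f"L(N, D) = {E:.3f} + {A:.1f}/N^{alpha:.3f} + {B:.1f}/D^{beta:.3f}")
```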
### 2. Scaling from scratch

> [!NOTE]
-> An example of this usage can be found [here](examples/llm)
+> An example of this usage can be found [here](https://github.com/kyo-takano/chinchilla/blob/master/examples/efficientcube.ipynb)
> **Procedure**:
>
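For orientation, here is a hedged sketch of the loop this section describes, assuming the fitting sketch above is wrapped as `fit_scaling_law`; `seed_runs` and `train_and_evaluate` are hypothetical stand-ins for your own code, not chinchilla's API:

```python
def allocate_compute(A, B, alpha, beta, C):
    """Compute-optimal (N, D) for budget C, under C ~= 6*N*D (Hoffmann et al.)."""
    G = (alpha * A / (beta * B)) ** (1 / (alpha + beta))
    N = G * (C / 6) ** (beta / (alpha + beta))
    return N, (C / 6) / N

records = seed_runs()                          # hypothetical: small (N, D, loss) seed runs
C = 6 * max(r["N"] * r["D"] for r in records)  # largest compute spent so far
scaling_factor = 2.0                           # see docs/TIPS.md
while C < 1e24:                                # target budget
    E, A, B, alpha, beta = fit_scaling_law(records)  # refit on all runs so far
    C *= scaling_factor                        # one modest extrapolation step
    N, D = allocate_compute(A, B, alpha, beta, C)
    records.append(train_and_evaluate(N, D))   # hypothetical training run
```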
docs/TIPS.md (34 changes: 13 additions & 21 deletions)
@@ -65,28 +65,9 @@ The minima are smoother and more stable, allowing for easier convergence during
As a matter of fact, this technique is so effective that even a naive grid search can work almost as well as L-BFGS:
-<div style="display: flex; justify-content: center; gap: 1.5rem; align-items: center; font-size: 1.5rem;">
-  <div>
-    <img src="./imgs/algorithm.init-original.png" alt="Original Algorithm">
-  </div>
-  ➡️
-  <div>
-    <img src="./imgs/algorithm.init-improved.png" alt="Improved Algorithm">
-  </div>
-</div>
-
-## 2. Keep `scaling_factor` moderate
+![Algorithms' performance by initialization quality](imgs/algorithm.comparison.png)
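
A minimal sketch of that grid-search seeding, reusing `objective` from the fitting sketch earlier on this page; the grid bounds are illustrative assumptions:

```python
from itertools import product

import numpy as np
from scipy.optimize import minimize

grid = [
    np.log([1.0, 1.5, 2.0]),  # e = log E
    np.log([1e1, 1e2, 1e3]),  # a = log A
    np.log([1e1, 1e2, 1e3]),  # b = log B
    [0.1, 0.3, 0.5],          # alpha
    [0.1, 0.3, 0.5],          # beta
]
# Evaluate all 3^5 = 243 grid cells, then refine the best one with L-BFGS
x0 = min(product(*grid), key=lambda t: objective(np.array(t)))
res = minimize(objective, np.array(x0), method="L-BFGS-B")
```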
-Scaling compute according to the loss predictor involves ***extrapolation*** beyond the FLOPs regime used for fitting the predictor.
-
-To avoid overstepping, it's advisable to:
-
-- **Incrementally scale compute** rather than making large jumps.
-- ***Continuously update*** the scaling law as a new data point becomes available.
-
-As a rule of thumb, I would suggest using `scaling_factor=2.0` as a good starting point.
-This approach balances the compute budget by dedicating roughly half of it to scaling-law estimation and the other half to final model training.
-## 3. Beware of "failure modes"
+## 2. Beware of "failure modes"
When fitting the loss predictor, several common failure modes may arise. These are often tied to poor configurations, including:
@@ -98,6 +79,17 @@ When fitting the loss predictor, several common failure modes may arise. These are often tied to poor configurations, including:
![Underfitting failure](imgs/optim--underfit.jpg)
+## 3. Keep `scaling_factor` moderate
+Scaling compute according to the loss predictor involves ***extrapolation*** beyond the FLOPs regime used for fitting the predictor.
+
+To avoid overstepping, it's advisable to:
+
+- **Incrementally scale compute** rather than making large jumps.
+- ***Continuously update*** the scaling law as a new data point becomes available.
+
+As a rule of thumb, I would suggest using `scaling_factor=2.0` as a good starting point.
+This approach balances the compute budget by dedicating roughly half of it to scaling-law estimation and the other half to final model training.
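
(The "roughly half" split is a geometric series: if the final run costs $C$ and each earlier run costs half of the next, the earlier runs together cost about $C/2 + C/4 + \cdots \approx C$, so estimation and the final training run each take about half of the $\approx 2C$ total.)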
---
> [!NOTE]
Binary file added docs/imgs/algorithm.comparison.png
Binary file removed docs/imgs/algorithm.init-improved.png
Binary file removed docs/imgs/algorithm.init-original.png
examples/llm/main.ipynb (14 changes: 14 additions & 0 deletions)
@@ -6,6 +6,9 @@
"id": "XT3xW5kr3dT2"
},
"source": [
"[![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/kyo-takano/chinchilla/blob/master/examples/llm/main.ipynb)\n",
"[![GitHub Repository](https://img.shields.io/badge/-chinchilla-2dba4e?logo=github)](https://github.com/kyo-takano/chinchilla)\n",
"\n",
"# Allocating $10^{24}$ FLOPs to a single LLM\n",
"\n",
"This notebook guides you through **estimating the scaling law for LLMs** (with `vocab_size=32000`) using a subset of Chinchilla training runs (filter: $10^{18} < C \\wedge N < D$).\n",
@@ -18,6 +21,17 @@
"- How the \"20 tokens per parameter\" heuristic compares"
]
},
+{
+  "cell_type": "code",
+  "execution_count": null,
+  "metadata": {},
+  "outputs": [],
+  "source": [
+    "\"\"\"Uncomment these lines if not cloning\"\"\"\n",
+    "# %pip install -U chinchilla\n",
+    "# !wget -nc https://github.com/kyo-takano/chinchilla/raw/refs/heads/preview/examples/llm/df.csv"
+  ]
+},
{
"cell_type": "code",
"execution_count": 1,
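As an aside, the subset filter mentioned in the notebook's intro cell ($10^{18} < C \wedge N < D$) can be written in pandas roughly as follows; the column names `N` and `D` are assumptions about `df.csv`:

```python
import pandas as pd

df = pd.read_csv("df.csv")
C = 6 * df["N"] * df["D"]                  # standard FLOPs approximation
df = df[(C > 1e18) & (df["N"] < df["D"])]  # keep runs with 1e18 < C and N < D
```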
Binary file removed examples/llm/simulation--optim.png
Binary file removed examples/llm/simulation--parametric_fit.png
