Setup

Environment

git clone https://github.com/automl/venv_templates.git
cd venv_templates
# git checkout meta OR git checkout juwels
bash setup.sh my_scaling ../envs/
source activate.sh my_scaling ../envs/

Supporting repos

NePS:

# from base directory
git clone https://github.com/automl/neps.git
cd neps/
pip install -e .

LitGPT:

# from base directory
git clone https://github.com/Lightning-AI/litgpt.git
cd litgpt/
pip install -e .

Install this repo

Setup Poetry:

# installs Poetry for the current user, outside any virtual environment
curl -sSL https://install.python-poetry.org | python3.10 -

Install this repo from the base directory:

poetry install

Run pre-commit on all files:

pre-commit run --all-files



TODO: Update this

scales-n-arpeggios

My tentative folder structure:

├── neps (built from source with .toml changes)
├── litgpt (built from source, with changes)
├── lm-evaluation-harness (built from source)
├── scales-n-arpeggios
│   ├── scales
│   │   ├── ...
│   ├── tests
│   │   ├── ...
│   ├── conda-env.yml

Installation

This installs pre-commit and the other maintenance tools needed for development (some subset of these tools):

git clone https://github.com/automl/scales-n-arpeggios.git
cd scales-n-arpeggios
conda create -n scales python=3.11
conda activate scales

# Install for development
make install-dev

Then install neps and litgpt for development

Note for PyCharm

When building the dependencies (litgpt, lm-evaluation-harness), use:

pip install -e ... --config-settings editable_mode=compat

instead of:

pip install -e ...

Refer here

Tasks (in no particular order)

  • Parametrize the dataloader module (start with TinyLlama)
  • Add support for different data modules
  • Add a dataset_size parameter to the dataloaders
  • Auto-assign an "optimal" dataset size based on model size
  • Fix the tokenizer
  • Add an evaluation interface with lm-evaluation-harness (through litgpt evaluate)
  • Test with a simple run on a cluster
  • Add a YAML config interface for neps.api.run calls
  • Add a CLI for easily running scripts with YAML configs
  • Add a pre-train <---> neps interface
  • Return validation perplexity from litgpt for neps
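
For the last task in the list, validation perplexity is the exponential of the mean per-token cross-entropy loss (in nats). The snippet below is only a minimal sketch of that conversion, assuming the per-token losses are already available as plain floats. `perplexity_from_losses` is a hypothetical helper name, not part of litgpt or neps, and how the losses are actually collected from litgpt is left open.

```python
import math


def perplexity_from_losses(token_losses):
    """Convert per-token cross-entropy losses (in nats) to perplexity.

    Hypothetical helper for illustration; not an existing litgpt or neps API.
    """
    if not token_losses:
        raise ValueError("need at least one loss value")
    # Perplexity is exp of the average negative log-likelihood per token.
    mean_nll = sum(token_losses) / len(token_losses)
    return math.exp(mean_nll)


if __name__ == "__main__":
    # A model that assigns probability 1/4 to every target token
    # has a per-token loss of log(4).
    print(perplexity_from_losses([math.log(4)] * 3))
```

A scalar like this could serve as the objective value reported back to neps, though the exact return format neps expects is not specified here.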

Minimal Example

# Your code here