Update references to new GitHub org (deepspeedai) #461

loadams · 2025-02-07T17:33:55Z

No description provided.

This PR adds a Llama universal checkpointing example to examples_deepspeed/universal_checkpointing. It also updates the README and the TensorBoard analysis script, along with some minor changes.

Signed-off-by: Logan Adams <[email protected]>

* Fix the TFLOPs calculation when GQA is used; the reported TFLOPs are correct after this fix. When GQA is not used, the large difference in TFLOPs is resolved with selective recompute. Some other minor differences will also be observed because logits MACs are now included.
* Add copyrights.

Signed-off-by: Logan Adams <[email protected]>

* pass batch_dim_idx to deepspeed sequence parallel distributed attention for supporting batch size larger than 1
* add FPDT support; add Ulysses rotary position embedding support
* remove unnecessary files
* set the warmup length to be FPDT chunk size if enabled

---------

Co-authored-by: Jinghan Yao <[email protected]>
Co-authored-by: Jinghan Yao <[email protected]>
Signed-off-by: Logan Adams <[email protected]>

Previously, `deepspeed_to_megatron.py` raised an import error because it used a relative import. This commit fixes the issue by switching to an absolute import, as in `deepspeed_to_transformers.py`.

Signed-off-by: Logan Adams <[email protected]>
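For readers unfamiliar with this failure mode, here is a minimal sketch of why a relative import can fail when a file is run directly as a script, and why the absolute form works. The file names `convert_script.py` and `helper_module.py` are hypothetical stand-ins, not the actual modules touched by the commit, and the sketch assumes the script and its helper sit in the same directory.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def run_script(source: str) -> subprocess.CompletedProcess:
    """Write `source` next to a helper module and run it as a top-level script."""
    with tempfile.TemporaryDirectory() as tmp:
        tmp_path = Path(tmp)
        # Hypothetical helper module in the same directory as the script.
        (tmp_path / "helper_module.py").write_text("VALUE = 42\n")
        script = tmp_path / "convert_script.py"
        script.write_text(source)
        # Running the file directly means it has no parent package,
        # so a leading-dot import cannot be resolved.
        return subprocess.run(
            [sys.executable, str(script)],
            capture_output=True,
            text=True,
        )


# Relative import: fails when the file is executed as a script.
relative = run_script("from .helper_module import VALUE\nprint(VALUE)\n")

# Absolute import: works because the script's directory is on sys.path.
absolute = run_script("from helper_module import VALUE\nprint(VALUE)\n")

if __name__ == "__main__":
    print("relative import exit code:", relative.returncode)
    print("absolute import exit code:", absolute.returncode)
```

The relative form only resolves when the file is imported as part of a package (for example via `python -m package.module`), which is likely why a standalone conversion script is better served by the absolute form.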

loadams requested review from jeffra, tjruwase and GuanhuaWang as code owners February 7, 2025 17:33

lekurile and others added 21 commits February 7, 2025 09:34

fixing the bug of flash_attn import and the wrong gather index when u…

2d3e970

…sing flash_attn_cuda in sequence parallel (#406) Co-authored-by: Jinghan Yao <[email protected]> Signed-off-by: Logan Adams <[email protected]>

add fused_rms_norm support on XPU device (#431)

ba95c75

Signed-off-by: Logan Adams <[email protected]>

pass batch_dim_idx to deepspeed sequence parallel distributed attenti…

ea296df

…on for supporting batch size larger than 1 (#433) Co-authored-by: Jinghan Yao <[email protected]> Signed-off-by: Logan Adams <[email protected]>

[LLaMa] Adding support converting checkpoint from mds to hf (#432)

270e275

* add support converting checkpoint from hf to mds
* Fix PP issue
* update

Signed-off-by: Logan Adams <[email protected]>

add device check when import ipex (#436)

54125d2

Signed-off-by: Logan Adams <[email protected]>

fix nan issue when running megatron-deepspeed (#434)

990106b

Signed-off-by: Logan Adams <[email protected]>

enable empty cache on XPU device (#438)

52eede5

Signed-off-by: Logan Adams <[email protected]>

[wandb] disable wandb more gracefully (#422)

b3e5c39

Co-authored-by: Logan Adams <[email protected]> Signed-off-by: Logan Adams <[email protected]>

[Bug] Fix crash when logging optimizer state to tb (#417)

3e3ac63

Signed-off-by: Logan Adams <[email protected]>

Enable Sequence Parallelism (#429)

c124896

Signed-off-by: Logan Adams <[email protected]>

grad_wei can't be NoneType when running with DeepSpeed, for zero3 wil…

d6ccdae

…l divided the gradient (#428) Signed-off-by: Logan Adams <[email protected]>

fix init issue for rms_norm in squence_parallel (#448)

3a05011

Signed-off-by: Logan Adams <[email protected]>

enable profiler for specific ranks (#451)

acb2ab2

Signed-off-by: Logan Adams <[email protected]>

fix init issue for silently ignoring the deepspeed config (#452)

5afa02c

Signed-off-by: Logan Adams <[email protected]>

fix moe tflops (#445)

5efe2fc

Signed-off-by: Logan Adams <[email protected]>

[tool]GQA convert support (#454)

50ec44d

* [tools]GQA convert support
* fix readme

Signed-off-by: Logan Adams <[email protected]>

Update GH org

6f50508

Signed-off-by: Logan Adams <[email protected]>

loadams force-pushed the loadams/update-gh-org branch from 73a7b40 to 6f50508 on February 7, 2025 17:36

loadams closed this Feb 7, 2025
