Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bug Fix for offload_states API #7050

Merged
merged 6 commits into from
Feb 21, 2025

Conversation

U-rara
Copy link
Contributor

@U-rara U-rara commented Feb 18, 2025

@fukun07 and I discovered a bug when using the offload_states and reload_states APIs of the Zero3 optimizer. When using grouped parameters (for example, in weight decay or grouped lr scenarios), the order of the parameters mapping in reload_states (here) does not correspond with the initialization of self.lp_param_buffer (here), which leads to misaligned parameter loading. This issue was overlooked by the corresponding unit tests (here), so we fixed the bug in our PR and added the corresponding unit tests.

bugfix for offload_states

Signed-off-by: Wei Wu <[email protected]>
@U-rara U-rara force-pushed the bugfix_reload_states branch from 00fa6a4 to 836b55b Compare February 18, 2025 16:04
Signed-off-by: Wei Wu <[email protected]>
@U-rara
Copy link
Contributor Author

U-rara commented Feb 20, 2025

@tjruwase Good suggestion! My previous fix seemed a bit hasty, so I optimized it according to your advice.

@U-rara U-rara requested a review from tjruwase February 20, 2025 15:30
@loadams loadams enabled auto-merge February 20, 2025 17:21
Copy link
Contributor

@tohtana tohtana left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great catch, thank you @U-rara!

@loadams loadams added this pull request to the merge queue Feb 20, 2025
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to no response for status checks Feb 21, 2025
@tjruwase tjruwase added this pull request to the merge queue Feb 21, 2025
Merged via the queue into deepspeedai:master with commit 38327e0 Feb 21, 2025
12 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants