
calculate_fisher doesn't sync gradients for DDP model. #27

Open · Cecilwang opened this issue Mar 9, 2022 · 0 comments
  1. We want ASDL to support distributed training with DistributedDataParallel, but it cannot handle a DDP-wrapped model at the moment.
model = resnet50()
kfac = asdl.KFAC(model, 'fisher_emp')  # KFAC must be created before wrapping
model = DistributedDataParallel(model, device_ids=[args.gpu])
  2. DDP registers a sync hook on every parameter so that all gradients are reduced across ranks when loss.backward() is called. However, gradients are not synced after kfac.accumulate_curvature (a possible manual workaround is sketched after this list).
criterion(model(inputs), targets).backward()  # gradients will be synced
kfac.accumulate_curvature(inputs, targets, calc_emp_loss_grad=True)  # gradients will NOT be synced
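A possible workaround (our untested sketch, not part of asdl): after accumulate_curvature returns, average the gradients across ranks by hand, replicating what DDP's reducer would normally do during loss.backward().

import torch.distributed as dist

kfac.accumulate_curvature(inputs, targets, calc_emp_loss_grad=True)

# Manually average gradients across ranks, since DDP's reducer hooks
# do not appear to fire on this path.
world_size = dist.get_world_size()
for p in model.parameters():
    if p.grad is not None:
        dist.all_reduce(p.grad, op=dist.ReduceOp.SUM)  # sum grads across ranks
        p.grad.div_(world_size)                        # then average, as DDP does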

We don't know why this happens, even though backward() is called inside accumulate_curvature.
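One hypothesis worth checking (not verified): DistributedDataParallel only arms its gradient-sync hooks inside its own forward(). Since KFAC was constructed from the unwrapped module, accumulate_curvature presumably runs the forward pass on that inner module, so the backward that follows never triggers the reducer. The same symptom can be reproduced without asdl (inner is just a name for the unwrapped module here):

inner = model.module  # the raw module that KFAC captured before wrapping

# Path 1: forward through the DDP wrapper -> reducer is armed, grads are synced.
criterion(model(inputs), targets).backward()

# Path 2: forward through the inner module -> DDP.forward never runs,
# so the sync hooks never fire and grads stay local to each rank.
model.zero_grad()
criterion(inner(inputs), targets).backward()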
