
calculate_fisher doesn't sync gradients for DDP model. #27

Open · Cecilwang opened this issue Mar 9, 2022 · 0 comments
  1. We want ASDL to support distributed training with DistributedDataParallel, but it cannot handle a DDP-wrapped model at the moment.
model = resnet50()
kfac = asdl.KFAC(model, 'fisher_emp')  # KFAC must be created before wrapping
model = DistributedDataParallel(model, device_ids=[args.gpu])
  2. DDP registers a sync hook on every parameter so that all gradients are reduced across ranks when loss.backward() is called. However, gradients are not synced after kfac.accumulate_curvature (a possible manual workaround is sketched after this list).
criterion(model(inputs), targets).backward()  # gradients will be synced
kfac.accumulate_curvature(inputs, targets, calc_emp_loss_grad=True)  # gradients will NOT be synced
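A possible workaround (our untested sketch, not part of asdl): after accumulate_curvature returns, average the gradients across ranks by hand, replicating what DDP's reducer would normally do during loss.backward().

import torch.distributed as dist

kfac.accumulate_curvature(inputs, targets, calc_emp_loss_grad=True)

# Manually average gradients across ranks, since DDP's reducer hooks
# do not appear to fire on this path.
world_size = dist.get_world_size()
for p in model.parameters():
    if p.grad is not None:
        dist.all_reduce(p.grad, op=dist.ReduceOp.SUM)  # sum grads across ranks
        p.grad.div_(world_size)                        # then average, as DDP does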

We don't know why this happens, even though backward() is called inside accumulate_curvature.
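One hypothesis worth checking (not verified): DistributedDataParallel only arms its gradient-sync hooks inside its own forward(). Since KFAC was constructed from the unwrapped module, accumulate_curvature presumably runs the forward pass on that inner module, so the backward that follows never triggers the reducer. The same symptom can be reproduced without asdl (inner is just a name for the unwrapped module here):

inner = model.module  # the raw module that KFAC captured before wrapping

# Path 1: forward through the DDP wrapper -> reducer is armed, grads are synced.
criterion(model(inputs), targets).backward()

# Path 2: forward through the inner module -> DDP.forward never runs,
# so the sync hooks never fire and grads stay local to each rank.
model.zero_grad()
criterion(inner(inputs), targets).backward()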
