We want ASDL to support distributed training with DistributedDataParallel (DDP), but it currently cannot handle a DDP-wrapped model.
model = resnet50()
kfac = asdl.KFAC(model, 'fisher_emp')  # KFAC must be created before wrapping with DDP
model = DistributedDataParallel(model, device_ids=[args.gpu])
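For reference, this is a minimal sketch of the full setup we have in mind; the torchrun-style process-group initialization and the LOCAL_RANK handling are assumptions on our side, not part of ASDL:

import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel
from torchvision.models import resnet50
import asdl

dist.init_process_group(backend='nccl')  # assumes a torchrun launch, which sets LOCAL_RANK
local_rank = int(os.environ['LOCAL_RANK'])
torch.cuda.set_device(local_rank)

model = resnet50().cuda(local_rank)
kfac = asdl.KFAC(model, 'fisher_emp')  # KFAC sees the unwrapped model
model = DistributedDataParallel(model, device_ids=[local_rank])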
DDP registers a sync hook on every parameter so that all gradients are reduced across ranks when loss.backward() is called. However, gradients are not synced after kfac.accumulate_curvature:
criterion(model(inputs), targets).backward()  # gradients are synced
kfac.accumulate_curvature(inputs, targets, calc_emp_loss_grad=True)  # gradients are not synced
We don't know why this happens, even though backward() is called inside accumulate_curvature().
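As a possible workaround (a sketch only; we have not verified it against ASDL's internals, and the helper all_reduce_gradients below is ours, not part of ASDL or DDP), the gradients could be averaged across ranks manually after accumulate_curvature, mimicking what DDP's backward hooks normally do:

import torch.distributed as dist

def all_reduce_gradients(model):
    # Average .grad of every parameter across all ranks in the default process group.
    world_size = dist.get_world_size()
    for param in model.parameters():
        if param.grad is not None:
            dist.all_reduce(param.grad, op=dist.ReduceOp.SUM)
            param.grad.div_(world_size)

kfac.accumulate_curvature(inputs, targets, calc_emp_loss_grad=True)
all_reduce_gradients(model)  # sync the gradients that DDP's hooks did not reduce

Whether the accumulated curvature itself also needs to be reduced across ranks is a separate question we have not looked into.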