Gradients dropped by `adapt` #131

mcabbott · 2022-08-22T00:37:44Z

Moving `y` to the "GPU" (here a `JLArray`) inside the loss function causes its gradient to be lost. Tracker returns zeros for `y`:

julia> using Tracker, JLArrays

julia> JLArrays.allowscalar(false)

julia> Tracker.withgradient((x,y) -> sum(x[1:2] + jl(y))^2, jl([1,2,3.0]), [4,5.0])
(val = 144.0, grad = ([24.0, 24.0, 0.0], [0.0, 0.0])) 

julia> ans.grad[1] isa JLArray
true

Zygote, by contrast, returns the expected gradient for `y`:

julia> using Zygote

julia> Zygote.withgradient((x,y) -> sum(x[1:2] + jl(y))^2, jl([1,2,3.0]), [4,5.0])
(val = 144.0, grad = ([24.0, 24.0, 0.0], [24.0, 24.0]))
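
As a sanity check on which answer is right, the gradient with respect to `y` can be estimated by central finite differences on plain CPU arrays. This is only a sketch: it drops the `jl` call, so it checks the maths rather than the device transfer, and `loss` and `fd_grad` are hypothetical helper names that do not appear in the issue. Since the loss is the square of a sum equal to 12, each entry of the `y` gradient should be 2 * 12 = 24.

```julia
# Same loss as in the report, but without moving y to a JLArray (assumption: CPU only).
loss(x, y) = sum(x[1:2] + y)^2

# Central finite-difference gradient of f at v (hypothetical helper).
function fd_grad(f, v; h = 1e-6)
    g = similar(v)
    for i in eachindex(v)
        e = zeros(length(v))
        e[i] = h                      # perturb only the i-th coordinate
        g[i] = (f(v + e) - f(v - e)) / (2h)
    end
    return g
end

x = [1, 2, 3.0]
y = [4, 5.0]

gy = fd_grad(v -> loss(x, v), y)      # numerical gradient with respect to y
gx = fd_grad(v -> loss(v, y), x)      # numerical gradient with respect to x
```

Both entries of `gy` come out at approximately 24, matching Zygote's result rather than the zeros reported by Tracker.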

ToucheSir · 2022-08-22T02:05:52Z

Riffing on a possible Tracker rule:

@grad function Adapt.adapt_storage(adaptor, x::AT) where {AT <: Array}
  # Pullback: no gradient for the adaptor; move Δ back to the original array type AT.
  adapt_storage_pullback(Δ) = (nothing, Adapt.adapt_storage(AT, Δ))
  return Adapt.adapt_storage(adaptor, x), adapt_storage_pullback
end
