RFC: add a supertype to layers #2028
Conversation
I like this (after arguing for a macro in #2044 😄). One comment: it might be helpful to put the subtype declarations in the docstring for each function, e.g.

```julia
"""
    Chain(layers...) <: ContainerLayer
```

and to document the abstract types in https://github.com/FluxML/Flux.jl/blob/master/docs/src/models/layers.md.
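For concreteness, here is a hedged sketch of what such a docstring could look like. The `AbstractLayer`/`SimpleLayer`/`ContainerLayer` names follow the hierarchy discussed in this PR; the `Chain` definition and docstring wording are illustrative, not Flux's actual source:

```julia
# Illustrative stand-alone sketch of the proposed hierarchy (names from this PR):
abstract type AbstractLayer end
abstract type SimpleLayer <: AbstractLayer end     # Dense-like layers
abstract type ContainerLayer <: AbstractLayer end  # Chain-like layers

"""
    Chain(layers...) <: ContainerLayer

Collects layers and calls them in sequence; the supertype shown in
the signature line documents the layer's role, as suggested above.
"""
struct Chain{T<:Tuple} <: ContainerLayer
    layers::T
end
Chain(layers...) = Chain(layers)
```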
```julia
        :((NamedTuple{$F}(($(args...),)), $recon))
    else
        # Getting this parameterless type takes about 2μs, every time:
        namedtuple(x), Base.splat(Base.typename(T).wrapper)
```
I guess one option here is to use `ConstructionBase.constructorof`, so that this can be customized in problematic cases (if there are any with layers... not really sure, since the discussion in JuliaLang/julia#46213 is kinda hard to follow). But I guess we can't do that in the generated path, and we don't want diverging behavior.
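For reference, `ConstructionBase.constructorof(T)` returns a callable that rebuilds a value from its field values in order, defaulting to the type itself; overloading it is how a "problematic" type would customise rebuilding. A minimal sketch, unrelated to Flux's actual layers (`Point` and `MyWeirdLayer` are hypothetical):

```julia
using ConstructionBase

struct Point
    x::Float64
    y::Float64
end

p = Point(1.0, 2.0)
# Rebuild from raw field values; for a plain struct this just calls Point:
p2 = constructorof(typeof(p))(3.0, 4.0)   # Point(3.0, 4.0)

# A type whose public constructor does not simply store its fields could
# overload constructorof to provide a raw rebuilding path, e.g.:
# ConstructionBase.constructorof(::Type{<:MyWeirdLayer}) = my_raw_builder
```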
My argument against ConstructionBase is this:

- At present the path for "easy" structs is `@functor MyStruct`, which makes assumptions about the default constructor.
- If you have a weird struct, you must write a method for `functor` which understands it. But this is discouraged.

With the supertype, you should automatically get the same behaviour as `@functor MyStruct`.

But Functors is still handling recursion and re-building, and you can still supply a more specific method for `functor`. It is in this case that a supertype method using ConstructionBase would provide an alternative way to customise behaviour. But now you have to know about two different recurse-and-rebuild libraries, instead of one. And because we own the supertype, it's fair to assume that you are writing this for Flux; it isn't some large pre-existing thing which already has ConstructionBase methods written... I mean that's fine, you can still use Flux, but not via the supertype.
(I would actually probably favour ditching Functors, and simply demanding that all Flux models use structs with default constructors. Unlike Zygote it has no ambition to apply to arbitrary code. But that's for another day, this PR adds a friendly path without changing how other paths work.)
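For context, the two existing paths described above look roughly like this. This is a hedged sketch: the real `@functor` macro also supports selecting a subset of fields, and the struct names here (`Affine`, `Rotation`) are illustrative:

```julia
using Functors

# Easy path: @functor assumes the default constructor Affine(W, b) works.
struct Affine
    W
    b
end
Functors.@functor Affine

# Hard (discouraged) path: a hand-written functor method, needed when the
# constructor does not simply store its arguments as fields:
struct Rotation
    angle
    cache   # derived from angle; should not be treated as an independent child
    Rotation(angle) = new(angle, (sin(angle), cos(angle)))
end
Functors.functor(::Type{<:Rotation}, r) =
    (; angle = r.angle), nt -> Rotation(nt.angle)
```

The custom `functor` method returns the children as a NamedTuple plus a reconstruction function, which is exactly the contract the generated code earlier in this diff implements for the default case.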
My argument is for ConstructionBase, but not in Flux. Rather, Functors should use it instead of rolling its own thing. The fallback behaviour will be identical (call `T`), but it's more likely a library will need to depend on ConstructionBase anyhow for compat with Accessors etc. than on Functors (which is why we see piracy).

That, or nix Functors as Michael said, but then the argument is even stronger IMO. You can still get away with a default constructor, but if you want something fancy there is but one game left in town.
Yes to putting the description of these into the docs.

I'm not sure whether they should be in layer docstrings. IMO those should aim at people calling the layer, for whom it doesn't matter at all. Whereas to write a new one you should first do `@less Dense(1 => 1)`.

Or look at the right bit of the manual... I think we had some examples of defining new structs, and probably those should also grow to mention supertypes.
```julia
abstract type PartialTrainLayer{which} <: SimpleLayer end

function Optimisers.trainable(layer::PartialTrainLayer{which}) where {which}
    NamedTuple{which}(map(sy -> getfield(layer, sy), which))
end
```
This type is the odd one out; I'm not entirely sure we should have it.

One, if we wish to have other traits which identify some subset of fields, the way `trainable` does, then if they are independent they cannot share the same type hierarchy, and so are a bit second-class. Maybe that's OK, trainability is pretty important.

Two, it doesn't cover all existing cases of `trainable`, e.g. we want `mutable struct Recur{T,S} <: ContainerLayer` for printing, but also have `trainable(a::Recur) = (; cell = a.cell)`.
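To see the mechanics being discussed, the `trainable` definition from the diff can be exercised on a toy struct in plain Julia (no Flux or Optimisers required; the `Affine` struct and its fields are illustrative):

```julia
# Stand-alone restatement of the diff's idea:
abstract type PartialTrainLayer{which} end

trainable(layer::PartialTrainLayer{which}) where {which} =
    NamedTuple{which}(map(sy -> getfield(layer, sy), which))

struct Affine <: PartialTrainLayer{(:W, :b)}
    W
    b
    frozen   # not named in the type parameter, hence not trainable
end

a = Affine([1.0 2.0], [0.0], :metadata)
trainable(a)   # (W = [1.0 2.0], b = [0.0])
```

The trainable field names are baked into the type parameter, which is precisely why the comment above asks whether cases like `Recur`, whose trainable subset is declared separately from its (container) supertype, can fit this pattern.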
```diff
-struct RNNCell{F,A,V,S}
+struct RNNCell{F,A,V,S} <: SimpleLayer # or should it be PartialTrainLayer{(:Wi, :Wh, :b)}?
   σ::F
   Wi::A
```
Here I'm also not sure. Should `state0` be trainable?
```julia
@functor BatchNorm
trainable(bn::BatchNorm) = hasaffine(bn) ? (β = bn.β, γ = bn.γ) : (;)
```
Here `PartialTrainLayer{(:β, :γ)}` fixes which fields are trainable, permanently. That's OK, since `β = nothing` when it's not trainable, so it'll be ignored. It improves type-stability, although I doubt this matters at all.
```julia
    NamedTuple{F}(map(sy -> getfield(x, sy), F))
end

Adapt.adapt_structure(to, layer::AbstractLayer) = fmap(x -> adapt(to, x), layer)
```
Can be just this:

```diff
-Adapt.adapt_structure(to, layer::AbstractLayer) = fmap(x -> adapt(to, x), layer)
+Adapt.adapt_structure(to, layer::AbstractLayer) = fmap(adapt(to), layer)
```

but check the version bound on Adapt.
Since this has now been put on the next milestone, I should chime in. I personally do not like having a supertype. My reasons:

It seems like the main motivation for this PR originally (though it has grown) is to make [...]

I agree the present state of this PR is a sort-of "maximal" look at what's possible, to see what clashes result, and as noted I'm not sure that including [...]

Figuring out whether things clash with packages is one goal here. It looks at first glance that InvertibleNetworks.jl has a type hierarchy which could subtype Flux.AbstractLayer, but perhaps no finer, not sure, anyone know more details? Mill.jl I'm less sure -- anyone know? Note that both packages mentioned do load Flux.

I also agree that some kind of [...] Unlike a supertype, the new macro could be obligatory. The idea of translating [...]

Macros are always a bit opaque. You could argue that FluxML has probably gone a little too far in avoiding types, e.g. [...]

(Don't read too much into the milestone, just trying to separate future ideas from bugfixes a bit.)
I don't think a macro that defines a `show` method should be obligatory, because then you will get method overwrites if you define your own `show` method (which is one nice thing about the abstract type: it's a fallback then).
Good point. The messy thing about macros will be the order in which stuff is called. Something like

```julia
struct MyLayer
    foo
end

Base.show(..., ::MyLayer) = ...

@layer MyLayer
```

seems like a simple error to run into. The current [...]

I think with something like FluxML/Functors.jl#41, we remove [...]
I was going to prototype the Functors issue anyways, so I can also test out the [...]

Another option is to make [...]

Alternatively, introducing the hierarchy with just [...]
Yes, the macro needs at least a way to choose Chain-like vs. Dense-like printing. It is not only methods for [...]

Macros which act on structs seem a step more opaque. And also more complicated to write, e.g. how should it interact with [...]
```diff
@@ -55,16 +54,11 @@ _show_children(m::Maxout) = m.layers
 _show_children(p::Parallel) = (p.connection, p.layers...)
 _show_children(f::PairwiseFusion) = (f.connection, f.layers...)
```
These methods are a further complication to any attempt to make `show` easier to opt into. If you claim (through a macro or a type) that your layer is a Chain-like container, do you imply that you want any Tuple within it to be splatted, i.e. that you have a constructor like `Parallel(f, layers...) = Parallel(f, layers)`? Or is that still an obscure function you ought to overload?
BTW the `@layer` PR automates this:
https://github.com/FluxML/Flux.jl/pull/1932/files#diff-08261c6ab5a9b75d3f9bca6a6f93b32fefe4e22e91ffcfdfd8d88dad08c3a498R61-R68
This proposes to give Flux's layer types various supertypes.

One reason to like this is that it simplifies the use of `show`. If you have the same supertype as `Chain`, you will be unfolded at top level like it is. No mystery functions to overload. Closes #1932, closes #2044.

Edit: 568af9b goes further: We can define `functor` for this abstract type, eliminating the need to call `@functor`. (It's a pretty mysterious macro, and you can't `@macroexpand` it to have a look.) We can also define `trainable` for some abstract types; maybe that's also less mysterious.

Another is this: `Flux.gpu` and `CUDA.cu` both recursively move things, the latter via Adapt not Functors. Which means `cu` does not preserve tied weights. But if we can rely on the shared arrays both living within a struct whose type we own (like Chain), then we can convert `cu` to something Functors-based at that boundary. (It's enough for one outer layer to do this -- using weird Dense-like layers marked only with `@functor` within a Chain is fine.)

Note that this supertype is entirely optional. The PR does not change the fact that `functor` is how Flux walks models, etc., and so it does not restrict how you can customise things. It only aims to make it easy to opt in to the standard behaviours, without a zoo of weird macros.
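The tied-weights point can be demonstrated with Functors alone: `fmap` caches by object identity, so a shared array is transformed once and stays shared, which is exactly what a plain Adapt-style recursive walk loses. A minimal sketch, with `Float32.` standing in for moving arrays to the GPU and the `Embed`/`Head` structs invented for illustration:

```julia
using Functors

struct Embed; W; end
struct Head;  W; end
Functors.@functor Embed
Functors.@functor Head

W = rand(3, 3)
model = (Embed(W), Head(W))        # tied weights: the very same array twice

moved = fmap(x -> Float32.(x), model)
moved[1].W === moved[2].W          # true: fmap's identity cache preserves the tie
```

A naive walk that maps each leaf independently would instead produce two distinct `Float32` arrays, silently untying the weights.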