Compatibility with ProtoStruct.jl, and LayerFactory ideas for custom layers #2107
Comments
This is because:
and Functors isn't careful: it checks fieldnames but calls getproperty: https://github.com/FluxML/Functors.jl/blob/v0.3.0/src/functor.jl#L11-L16 One fix is: Another fix would be for ProtoStructs to make |
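The mismatch described above can be reproduced without Flux or Functors. The following is a minimal sketch, assuming a hypothetical `ProtoLike` type that imitates what a ProtoStructs-style macro might generate (one real field, `properties`, with property access forwarded to it). It is not the actual Functors.jl code, only an illustration of what goes wrong when names come from `fieldnames` but values are read with `getproperty`.

```julia
# Hypothetical stand-in for a ProtoStructs-style generated type (an assumption,
# not the real macro output): a single field wrapping a NamedTuple.
struct ProtoLike{NT<:NamedTuple}
    properties::NT
end

# Forward `x.name` to the wrapped NamedTuple. This also intercepts
# `x.properties`, so the real field is only reachable through `getfield`.
Base.getproperty(x::ProtoLike, name::Symbol) =
    getproperty(getfield(x, :properties), name)
Base.propertynames(x::ProtoLike) = propertynames(getfield(x, :properties))

layer = ProtoLike((w = [1.0, 2.0], b = 0.5))

# Mixing the two views fails: names from `fieldnames`, values from `getproperty`.
mixed_ok = try
    [getproperty(layer, f) for f in fieldnames(typeof(layer))]
    true
catch
    false   # the NamedTuple has no field called `properties`
end

# Staying consistent works: names from `fieldnames`, values from `getfield`.
children = [getfield(layer, f) for f in fieldnames(typeof(layer))]
```

Either side can restore consistency: the traversal can use `getfield`, or the wrapper type can avoid intercepting its own field name.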
Nice! Thanks for figuring this out. Aside: I wonder if something like Here's a terrible idea I had, just to see whether it spurs any useful ideas for others: @layer struct ResBlock
w1::Dense
act::Function
forward = (self, x, y) -> begin
self.act(self.w1(x))
end
end where So, from this, the macro would generate the following code: struct ResBlock{NT<:NamedTuple}
properties::NT
end
function getproperty
...
end
# etc., etc.
function (self::ResBlock)(x, y)
self.forward(self, x, y)
end |
(Maybe I am thinking of trying this out: https://github.com/Suzhou-Tongyuan/ObjectOriented.jl - it's definitely useful for DL-type models) |
I don't know if we want that as default. Supporting ProtoStruct layers is good, because sometimes writing your own layer is unavoidable. But Flux has some features that make using different types (classes) for each sub-layer not the norm. For example, Metalhead.jl, the vision model library, can build almost all types of residual networks without defining a
(Btw this is not to shoot anything down; if what we have doesn't really do what you want, then we want to know. Love the enthusiasm!) |
To build on Kyle's point: Flux's philosophy towards layer types is very much "come as you are". We'd like to make it easier to define/register existing structs as layers and to do so without depending on Flux itself. So while we should absolutely try to support libraries like ProtoStruct and ObjectOriented.jl if we can, we also want to keep the barrier of entry for working with the module system as low as possible (if you know how to define a callable struct, you can make a layer). |
Thanks for all the comments on my admittedly half-thought-out ideas! Did not know about
Good point, agreed! I wonder if you would want to mention Revise.jl+ProtoStructs.jl on the custom layer page (or even For a lot of users, my sense is that Flux.jl may be the very first Julia package they try out.
The compatibility makes a lot of sense. I am just trying to think if there's any way to simplify the current custom layer declaration method for new users. Right now you need to call Not only is it four separate calls, it's four different types of calls. I need to create (1) a struct, then (2) a method, (3) call a macro, then separately (4) construct the layer's components. And I need to do that for every single custom layer. In Haiku and PyTorch, you would only create a (1) class and (2) method – sometimes you can even create a custom layer with just a method. Using four different ideas to make a simple NN layer just seems a bit heavy, and the self methods in Julia (like Even something like struct MyLayer
x::Dense
y::Dense
end
@layermethod (m::MyLayer)(z) -> relu(m.x(z) + m.y(z)) might make for a slightly cleaner convenience API. Though the standard methods would of course also be available. Thoughts? I am aware I could simply be too comfortable with the Python DL ecosystem, though, and this isn't Julia-esque enough. No worries if that is the case. I think my dream layer creation method would really be something like @layerfactory function my_layer(n_in, n_out)
w1 = Dense(n_in, 128)
w2 = Dense(128, n_out)
return (x) -> begin
y = relu(w1(x)) + x
w2(y)
end
end which would let you construct the layer struct, the layer's components, and the forward pass, all in one go. (But that would obviously be tricky to implement). Best, |
What's nice about 1,2,4 is that there is nothing Flux-specific about them. They are completely ordinary Julia code. Making a special DSL isn't impossible, but it's one more thing you have to learn, and it will have limitations. This is a little bit like the Most things for which it's worth defining a new layer will want something extra. If I'm reading correctly the example here is an easy combination of existing layers: Chain(SkipConnection(Dense(n_in => 128, relu), +), Dense(128 => n_out)) Some of us would like to remove I agree we should mention Revise.jl & maybe ProtoStructs.jl. On the ecosystem page for sure. Maybe ProtoStructs.jl ought to be on the advanced layer building page too? (Which could use some cleaning up.) |
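To make the point about combining existing layers concrete, here is a rough sketch in plain Julia. The helpers `skipconnect` and `chain` are hypothetical stand-ins written for this illustration (they are not Flux's `SkipConnection` or `Chain`), and `w1`, `w2`, and `relu` are toy functions rather than `Dense` layers. The sketch only shows that a handwritten forward pass of the form `w2(relu(w1(x)) + x)` matches the composed version.

```julia
# Hypothetical stand-ins for the two combinators (assumptions, not Flux code).
skipconnect(layer, connection) = x -> connection(layer(x), x)
chain(fs...) = x -> foldl((acc, f) -> f(acc), fs; init = x)

# Toy "layers" so the example has no dependencies.
relu(x) = max.(x, 0)
w1(x) = 2 .* x .- 1
w2(x) = sum(x)

# Forward pass written out by hand, as in the layer factory example.
handwritten(x) = w2(relu(w1(x)) + x)

# The same computation assembled from generic building blocks.
composed = chain(skipconnect(relu ∘ w1, +), w2)

x = [-1.0, 0.5, 2.0]
same = handwritten(x) == composed(x)
```

The trade-off discussed in the thread still applies: the composed form reuses named building blocks, while the handwritten form is easier to extend once the forward pass stops fitting those blocks.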
Thanks, I see your point. I just wonder if there's a way to still use ordinary Julia code but require less manual labor to build custom layers, at the expense of being slightly less generic. Recursing by default would definitely be an improvement!
This was just a MWE; in practice there would probably not be an existing layer for my applications. In some ways the layers like |
Here's one idea for a layer factory: struct LayerFactory{F<:Function,NT<:NamedTuple}
forward::F
layers::NT
end
LayerFactory(f; layers...) = LayerFactory(f, NamedTuple(layers))
function (f::LayerFactory)(args...)
return f.forward(f.layers, args...)
end
@functor LayerFactory This makes it super easy to construct custom layers. Watch this: my_layer = LayerFactory(; w1=Dense(5, 128), act=relu) do self, x
self.act(self.w1(x))
end That's literally all you need! I can construct custom layers in one line, without even changing the underlying structure of Flux.jl. And it works for training and everything. What do you think about this? |
This should work fine. What's a nice example of a nontrivial use? |
The use-case I had in mind is graph networks, where you have a set of (nodes, edges, globals) that you need to do scatter operations on - it seems tricky to get that working with a I am really happy about this e.g., here's another example: model = LayerFactory(;
w1=Dense(1, 128), w2=Dense(128, 128), w3=Dense(128, 1), act=relu
) do self, x
x = self.act(self.w1(x))
x = self.act(self.w2(x))
self.w3(x)
end
p = params(model) # works! |
Here's a simple implementation of a graph network in PyTorch: https://pytorch-geometric.readthedocs.io/en/latest/modules/nn.html#torch_geometric.nn.meta.MetaLayer |
It even works for compositions of LayerFactory! function MLP(n_in, n_out, nlayers)
LayerFactory(;
w1=Dense(n_in, 128), w2=[Dense(128, 128) for i=1:nlayers], w3=Dense(128, n_out), act=relu
) do self, x
embed = self.act(self.w1(x))
for w in self.w2
embed = self.act(w(embed))
end
self.w3(embed)
end
end
model = LayerFactory(; mlp1=MLP(1, 128, 2), mlp2=MLP(128, 1, 3)) do self, x
self.mlp2(self.mlp1(x))
end |
I am not super familiar with GNNs, but you might want to check out GraphNeuralNetworks.jl to see how they handle working with Flux. They do seem to have a custom Okay, I think I understand what you are saying. If you have a sufficiently complex forward function that involves sub-layers, then writing it from scratch with "base" Julia + Flux is a bunch of text. As would be the case with "base" PyTorch or Jax, but those libraries have utilities built on top like your While I have no problem with mlp(n_in, n_out, nlayers) = let w1 = Dense(n_in, 128), w2 = [Dense(128, 128) for i in 1:nlayers], w3 = Dense(128, n_out)
return function(x)
act = relu
embed = act(w1(x))
for w in w2
embed = act(w(embed))
end
w3(embed)
end
end
model = let mlp1 = mlp(1, 128, 2), mlp2 = mlp(128, 1, 3)
x -> mlp2(mlp1(x))
end
p = params(model) # works too! Below is just for your reference. Looking at the link you shared, this is what I would write in Flux: # this is one way avoiding structs completely
EdgeModel(edge_mlp = Chain(...)) = Chain(
(src, dest, edge_attr, u, batch) -> vcat(src, dest, edge_attr, u[batch]),
edge_mlp
)
# admittedly, structs seem nice here
Base.@kwdef struct NodeModel{T, S}
node_mlp_1::T = Chain(...)
node_mlp_2::S = Chain(...)
end
@functor NodeModel
function (m::NodeModel)((x, edge_index, edge_attr, u, batch))
row, col = edge_index
out = vcat(x[row], edge_attr)
out = m.node_mlp_1(out)
# not sure what this is doing but we have a NNlib.scatter
out = scatter_mean(out, col, ...)
out = vcat(x, out, u[batch])
return m.node_mlp_2(out)
end And so on. Modulo the |
Or another way of putting it: |
I don't understand how your example works. In your example, |
Oops brain fart on my part, but see the correction using a |
Okay but note that the anonymous function returned by the |
LayerFactory(f; layers...) = Base.Fix1(f, NamedTuple(layers)) We don't currently |
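For readers unfamiliar with `Base.Fix1`, here is a small sketch of what that one-liner relies on. `Base.Fix1(f, a)` is a callable struct storing `f` and `a` in its fields `f` and `x`, and calling it with `y` computes `f(a, y)`. The `forward` function and the `layers` NamedTuple below are made-up placeholders (plain numbers and functions, not trainable Flux layers).

```julia
# Placeholder forward pass: receives the stored NamedTuple as `self`.
forward(self, x) = self.act.(self.scale .* x)

# Placeholder "layers": plain values standing in for Dense layers etc.
layers = (scale = 2.0, act = abs)

# Fix the first argument of `forward` to `layers`.
layer = Base.Fix1(forward, layers)

y = layer([-1.0, 3.0])     # same as forward(layers, [-1.0, 3.0])

# The stored pieces remain accessible as ordinary fields.
stored_function = layer.f
stored_layers = layer.x
```

One caveat, as far as I am aware: `Base.Fix1` has traditionally accepted exactly one further argument, whereas the `LayerFactory` struct above forwards `args...`, so the two are not interchangeable for multi-input layers on every Julia version.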
I see, thanks. The
Don't forget the model instantiation! Compare these two: model = LayerFactory(; w1=Dense(5, 128), w2=Dense(128, 1), act=relu) do self, x
x = self.act(self.w1(x))
self.w2(x)
end (or the struct MyLayer
w1::Dense
w2::Dense
act::Function
end
@functor MyLayer
function (self::MyLayer)(x)
x = self.act(self.w1(x))
self.w2(x)
end
model = MyLayer(Dense(5, 128), Dense(128, 1), relu) The latter example would discourage me from using it. Note also that the second example will break if I use Revise.jl and change the inputs, whereas
Up to you but I don't see a problem with including this in the code alongside |
Btw, in the |
The LayerFactory thing seems cute. Maybe see how it goes for building some models in real life and figure out what the unexpected warts are? One refinement which could be added is a macro which would put the
Yes. With Kyle's code:
|
Does this mean Python frameworks don't even meet the bar then? :P My impression is that Flux is already offering more layer helpers like |
Will do! 👍
Not quite... Say what you will about Python, but the DL frameworks are very polished. Here's how you would do a zero layer MLP in Haiku: @hk.transform
def forward(x):
w1 = hk.Linear(100)
w2 = hk.Linear(10)
return w2(jax.nn.relu(w1(x)))
params = forward.init(rng, x) PyTorch: class Net(nn.Module):
def __init__(self):
super().__init__()
self.w1 = nn.Linear(10, 100)
self.w2 = nn.Linear(100, 1)
def forward(self, x):
return self.w2(F.relu(self.w1(x)))
model = Net()
PyTorch actively discourages users from using |
I agree, the factory is much shorter even keeping aside the instantiation. But the factory isn't the default way to make layers in other frameworks either. From your most recent example (in Julia): Base.@kwdef struct Net{T, S}
w1::T = Dense(10 => 100)
w2::S = Dense(100 => 1)
end
(self::Net)(x) = self.w2(relu(self.w1(x)))
@functor Net # boo we also don't like this
model = Net() What I was trying to figure out is if you wanted the default mechanism to change or a utility built on top. But I think we settled this Q! We're talking about a convenience method here.
The problem here is that if I make a new This being said, I like the declarative nature and syntactic clarity of what you are proposing. I think the broader point here is that:
So, like Michael, I would be happy to include it...after some thought and maybe seeing if it has unforeseen limitations. |
Kyle beat me to it, but just to add this:
This is a good idea in some circumstances and a bad one in others. For example, torchvision models are full of |
Sounds good!
I'm not sure I completely understand. Is your goal to make all types of neural network models possible with But maybe user could always refactor their model into separate
You seem to be bringing up Julia v Python... I want to be clear I am really not trying to go there (I'm on the Julia side, for the record; I've just had experience with both!). I'm purely talking about the syntax itself. If you consider PyTorch by itself as a software and ecosystem, there is an obscene amount of code re-use. I can take someone's custom torch model with an extremely complex
Making it easier to construct custom layers seems precisely aligned with your goals, no? Then users can go build these layers themselves, rather than you having to worry about building a massive library of modules. And you only need to maintain the most commonly-re-used layers.
Right - you might use |
I think we're all on the same page here, just that the devil is in the details 🙂. Looking at the original PR which ended up spawning This ties into the code reuse discussion. What I think Kyle is trying to get at is that while a framework shouldn't try to create a DSL for every possible use case, it should try to provide affordances so that users aren't unnecessarily having to roll their own code for trivial features. I can't count how many times I've seen research PyTorch code which defines a number of layer types just so that they can have a skip connection. You can tell because those layers are often awkwardly named—they kind of have to be because they really only represent some intermediate chunk of a larger model which wouldn't otherwise be considered standalone (second hardest problem in computer science, etc).
Again, I think we are of roughly the same mind about this. There's a reason Metalhead.jl, torchvision, Timm, etc. use this pattern. It's also one reason we're hesitant to loudly advertise layer building functionality which always returns a (semi-)anonymous type: you lose that important semantic information from the layer name that you get by using a named class in Python or struct in Julia. |
Let me start by saying I don't fundamentally disagree with the feature proposal. I'm just trying to shine light on the design decisions we made in FluxML. Hopefully, this is useful and not unwanted.
We have a slight miscommunication, which is my fault for using "Python" as a catch-all when I really meant "X where X is one of TF/Jax/PyTorch" (i.e. considering each framework independently). I certainly wasn't referring to NumPy/Torch/SciPy/etc...I also don't want to go there, and it seems irrelevant to our discussion. In fact, for what we're discussing (syntax for building complex models), the host language (Julia or Python) seems irrelevant. The point of bringing up Python-based frameworks at all is because I agree with you---they are great DL frameworks. There's a lot to learn from, and so we can make useful comparisons to understand what we do wrong/right.
This isn't exactly the type of re-use I am referring to, and I don't think the various options we are discussing would limit this kind of re-use. Let's take a concrete example from Now, PyTorch folks could absolutely have written
Definitely not all types of models, for two reasons: (a) it's not possible, and (b) even if it were, it would make writing some models unnecessarily cumbersome. But I will say you can get really far without writing a massive DSL. Layers fall into two categories:
(1) is unavoidable in every framework unless you take an explicitly functional view and make users pass in the weights, state, etc. (2) is where the possible DSL size explosion could happen. But if you take a feedforward NN, then there is a limited set of structures you can see in the DAG---namely
I don't know...GoogLeNet's diagram is a pretty complex network but I think you can understand the flow of arguments just by looking at the figure. Even something like CLIP. Of course, DL isn't restricted to FF DAGs, nor should it be. And I get the feeling these are the kinds of models you work with. So then you need to define a custom (2). We absolutely want users to go ahead and do this whenever they feel like they should. Or maybe even for a simple CNN, you subjectively prefer to write out the forward pass. Go for it! If you do get to the point of writing a custom (2), then your layer factory makes the syntax really short. This is why I like it, and I am in favor of adding it. Sometimes it is better to "just write the forward pass," and sometimes it is better to use existing layers + builders to create a complex model. Both are "first class" in Flux. I don't want to leave you with the impression that we want everyone to build everything using only [1]: ResNet https://arxiv.org/pdf/1512.03385v1.pdf |
Oops as I was writing and editing my saga, Brian beat me to it by 40 minutes, but my browser didn't refresh :(. |
Here is a macro version, which should let you write """
@Magic(forward::Function; construct...)
Creates a layer by specifying some code to construct the layer, run immediately,
and (usually as a `do` block) a function for the forward pass.
You may think of `construct` as keywords, or better as a `let` block creating local variables.
Their names may be used within the body of the `forward` function.
```
r = @Magic(w = rand(3)) do x
w .* x
end
r([1,1,1])
r([10,10,10]) # same random numbers
d = @Magic(in=5, out=7, W=randn(out,in), b=zeros(out), act=relu) do x
y = W * x
act.(y .+ b)
end
d(ones(5, 10)) # 7×10 Matrix
```
"""
macro Magic(fex, kwexs...)
# check input
Meta.isexpr(fex, :(->)) || error("expects a do block")
isempty(kwexs) && error("expects keyword arguments")
all(ex -> Meta.isexpr(ex, :kw), kwexs) || error("expects only keyword arguments")
# make strings
layer = "@Magic"
setup = join(map(ex -> string(ex.args[1], " = ", ex.args[2]), kwexs), ", ")
input = join(fex.args[1].args, ", ")
block = string(Base.remove_linenums!(fex).args[2])
# edit expressions
vars = map(ex -> ex.args[1], kwexs)
assigns = map(ex -> Expr(:(=), ex.args...), kwexs)
@gensym self
pushfirst!(fex.args[1].args, self)
addprefix!(fex, self, vars)
# assemble
quote
let
$(assigns...)
$MagicLayer($fex, ($layer, $setup, $input, $block); $(vars...))
end
end |> esc
end
function addprefix!(ex::Expr, self, vars)
for i in 1:length(ex.args)
if ex.args[i] in vars
ex.args[i] = :($self.$(ex.args[i]))
else
addprefix!(ex.args[i], self, vars)
end
end
end
addprefix!(not_ex, self, vars) = nothing
struct MagicLayer{F,NT<:NamedTuple}
fun::F
strings::NTuple{4,String}
variables::NT
end
MagicLayer(f::Function, str::Tuple; kw...) = MagicLayer(f, str, NamedTuple(kw))
(m::MagicLayer)(x...) = m.fun(m.variables, x...)
MagicLayer(args...) = error("MagicLayer is meant to be constructed by the macro")
Flux.@functor MagicLayer
function Base.show(io::IO, m::MagicLayer)
layer, setup, input, block = m.strings
print(io, layer, "(", setup, ") do ", input)
print(io, block[6:end])
end |
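The core trick in the macro is the expression rewrite done by `addprefix!`, which turns bare variable names in the forward body into property accesses on a hidden `self` argument. Below is a standalone sketch of that step. It is a lightly adapted variant written for illustration, not the code above verbatim: it builds the `self.name` expression explicitly with `Expr(:., ...)` and returns the rewritten expression, and the sample `body` is made up.

```julia
# Recursively replace each symbol listed in `vars` with `self.symbol`.
function addprefix!(ex::Expr, self::Symbol, vars)
    for i in eachindex(ex.args)
        arg = ex.args[i]
        if arg isa Symbol && arg in vars
            # Build the expression `self.arg` explicitly.
            ex.args[i] = Expr(:., self, QuoteNode(arg))
        else
            addprefix!(arg, self, vars)
        end
    end
    return ex
end

# Anything that is not an expression (symbols, literals, line numbers) is left alone.
addprefix!(other, self::Symbol, vars) = other

# Made-up forward body that refers to two "layer" variables, W and b.
body = :(W * x .+ b)
rewritten = addprefix!(body, :self, [:W, :b])
println(rewritten)
```

Because the rewrite is purely by name, a forward-pass argument that happens to share a name with one of the constructed variables would also be prefixed, which is one of the rough edges a more careful implementation would need to handle.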
Thanks for sharing these answers, I completely agree and I think we are all on the same page! 🙂
This is AWESOME, nice job!! I am 👍👍 (two thumbs up) for the support of this feature as a convenient custom layer constructor. |
No doubt that has all sorts of bugs! But fun to write. Once you make a macro, it need not be tied to the LayerFactory keyword notation like this, of course. And whether this is likely to create pretty code or monstrosities, I don't know yet. |
Could this be added to Flux.jl, with “Very experimental.” stated in bold in the docstring? I can create a PR and add a couple tests and maybe a paragraph to the docs. |
Let me know if I could add it and I can make a PR? Would love to have a feature like this. The |
We've created a Fluxperimental.jl package for this purpose. Once it is set up and made public, we can ping you for a PR there (which would be appreciated!). |
Cool, sounds good to me! |
Ok, https://github.com/FluxML/Fluxperimental.jl is live |
Closing in favor of FluxML/Fluxperimental.jl#2 for layer factory and FluxML/Functors.jl#46 for ProtoStruct.jl issue. |
- Contributed by @mcabbot in FluxML/Flux.jl#2107
For people finding this issue, the discussion above has now resulted in the PR here: FluxML/Fluxperimental.jl#4 |
- See discussion in FluxML/Flux.jl#2107 Co-authored-by: Michael Abbott <[email protected]>
There's this really nice package ProtoStruct.jl that lets you create structs which can be revised. I think this is extremely useful for developing custom models in Flux.jl using Revise.jl, since otherwise I would need to restart every time I want to add a new property in my model.
Essentially the way it works is to transform:
into (regardless of the properties)
and, inside the macro, set up constructors based on your current defined properties.
However, right now it doesn't work with Flux.jl. When I try to get the parameters from a model, I see the error: NamedTuple has no field properties.
Here is a MWE:
and here is the error:
How hard would it be to make this compatible? I think it would be extremely useful to be able to quickly revise model definitions!
(Sorry for the spam today, by the way)