
Investigate why u-net performs poorly with style transfer #5

Open
lgvaz opened this issue Mar 12, 2020 · 4 comments
Labels
help wanted Extra attention is needed question Further information is requested

Comments

@lgvaz
Owner

lgvaz commented Mar 12, 2020

Theoretically it should be way better than TransformerNet.

It performs really well for superres (which is almost the same thing), and it's a more appropriate architecture for image-to-image problems overall.

@lgvaz lgvaz added help wanted Extra attention is needed question Further information is requested labels Mar 12, 2020
@lgvaz lgvaz changed the title Investigate why u-net performs poorly Investigate why u-net performs poorly with style transfer Mar 13, 2020
@lgvaz
Owner Author

lgvaz commented Apr 2, 2020

This paper gives a great explanation of why a U-net might fail in some cases. Quoting from the paper:

The U-net is "lazy". That is to say, if the U-net finds itself
able to handle a problem in low-level layers, the high-level
layers will not bother to learn anything. If we train a U-net
to do a very simple job of "copying an image" as in fig. 4, where
the inputs and outputs are the same, the loss value will drop to
0 immediately, because the first layer of the encoder discovers
that it can simply transmit all features directly to the last
layer of the decoder through the skip connection to minimize the
loss. In this case, no matter how many times we train the
U-net, the mid-level layers will not get any gradient.
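The effect the paper describes can be seen on a toy model. Here is a minimal PyTorch sketch (all layer names and sizes are illustrative, not the paper's architecture): a one-level "U-net" with an input-to-output skip connection, trained on the copy task. Once the skip path learns the identity, the main loss is satisfied without the middle layers, which is the "laziness" in question.

```python
import torch
import torch.nn as nn

class TinyUnet(nn.Module):
    """Toy one-level 'U-net': encoder -> middle -> decoder, plus a
    shortcut (skip) from the input straight to the output."""
    def __init__(self, use_skip=True):
        super().__init__()
        self.use_skip = use_skip
        self.enc = nn.Conv2d(3, 8, 3, padding=1)
        self.mid = nn.Conv2d(8, 8, 3, padding=1)
        self.dec = nn.Conv2d(8, 3, 3, padding=1)
        self.skip = nn.Conv2d(3, 3, 1)  # shortest path: input -> output

    def forward(self, x):
        e = torch.relu(self.enc(x))
        m = torch.relu(self.mid(e))
        out = self.dec(m)
        if self.use_skip:
            out = out + self.skip(x)  # the "lazy" shortcut
        return out

# Copy task: target == input. The skip path alone can drive this loss
# to zero, after which the gradient reaching `mid` vanishes.
net = TinyUnet(use_skip=True)
x = torch.randn(1, 3, 16, 16)
out = net(x)
loss = ((out - x) ** 2).mean()
loss.backward()
# Inspect how much gradient reaches the middle layer:
grad_to_mid = net.mid.weight.grad.abs().mean()
```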

@lgvaz
Owner Author

lgvaz commented Apr 2, 2020

A strategy for solving the issue could be:

  • Freeze the skip connections and train the network
  • After some time, unfreeze the skip connections and see what happens
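One cheap way to implement that strategy (a sketch under assumptions, not fastai's API) is to multiply the skip path by a gate: set it to 0 for the first training phase so all gradient is forced through the middle layers, then set it to 1 and keep training.

```python
import torch
import torch.nn as nn

class GatedSkipUnet(nn.Module):
    """Toy U-net whose skip connection is scaled by a gate, so the skip
    can be 'frozen' (gate=0.0) early on and re-enabled (gate=1.0) later."""
    def __init__(self):
        super().__init__()
        self.enc = nn.Conv2d(3, 8, 3, padding=1)
        self.mid = nn.Conv2d(8, 8, 3, padding=1)
        self.dec = nn.Conv2d(8, 3, 3, padding=1)
        self.skip = nn.Conv2d(3, 3, 1)
        self.gate = 0.0  # plain float; 0.0 disables the skip path

    def forward(self, x):
        e = torch.relu(self.enc(x))
        out = self.dec(torch.relu(self.mid(e)))
        return out + self.gate * self.skip(x)

net = GatedSkipUnet()
x = torch.randn(1, 3, 16, 16)

# Phase 1: gate = 0 -> gradient can only flow through the middle layers.
y0 = net(x)
# ... train for a while ...

# Phase 2: unfreeze the skip and continue training.
net.gate = 1.0
y1 = net(x)
```

With unchanged weights, the difference between the two phases is exactly the skip contribution, which makes the gate easy to sanity-check.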

@lgvaz
Owner Author

lgvaz commented Apr 2, 2020

The paper talks about "guide decoders", although it doesn't explain in depth what they mean.

What I can try is generating the image without the skip connections at each middle layer (basically repeating the following layers, but without skip connections). This would generate an image for each middle layer, so the gradient is always present.
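The idea above can be sketched as an auxiliary, skip-free head on the middle features, trained alongside the main output. This is only a guess at what "guide decoders" mean; the `guide` head and the loss weighting below are assumptions for illustration.

```python
import torch
import torch.nn as nn

class UnetWithGuide(nn.Module):
    """Toy U-net plus a 'guide decoder': an auxiliary head that produces
    an image from the middle features WITHOUT any skip connection, so the
    middle layers always receive gradient from the guide loss."""
    def __init__(self):
        super().__init__()
        self.enc = nn.Conv2d(3, 8, 3, padding=1)
        self.mid = nn.Conv2d(8, 8, 3, padding=1)
        self.dec = nn.Conv2d(8, 3, 3, padding=1)
        self.skip = nn.Conv2d(3, 3, 1)
        self.guide = nn.Conv2d(8, 3, 3, padding=1)  # skip-free auxiliary head

    def forward(self, x):
        m = torch.relu(self.mid(torch.relu(self.enc(x))))
        main = self.dec(m) + self.skip(x)  # main output (with skip)
        aux = self.guide(m)                # guide output (no skip)
        return main, aux

net = UnetWithGuide()
x = torch.randn(1, 3, 16, 16)
main, aux = net(x)
target = x  # e.g. the copy task
loss = ((main - target) ** 2).mean() + 0.5 * ((aux - target) ** 2).mean()
loss.backward()
# Even if the main loss is satisfied through the skip, the guide loss
# still sends gradient into `mid`.
```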

@lgvaz
Owner Author

lgvaz commented Apr 2, 2020

The first attempt to modify DynamicUnet failed miserably. I need to find a way to get the output of each UnetBlock with and without skip connections, and then use that to create multiple outputs from DynamicUnet.
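One possible starting point for grabbing per-block outputs is PyTorch forward hooks, which is also the mechanism fastai uses internally for its skip connections. The sketch below uses a plain `nn.Sequential` stand-in rather than a real DynamicUnet, so the model and block indexing are placeholders, not fastai's actual attributes.

```python
import torch
import torch.nn as nn

# Placeholder "U-net body": in practice this would be the list of
# UnetBlocks inside a fastai DynamicUnet.
model = nn.Sequential(
    nn.Conv2d(3, 8, 3, padding=1),
    nn.Conv2d(8, 8, 3, padding=1),
    nn.Conv2d(8, 3, 3, padding=1),
)

captured = {}

def make_hook(name):
    def hook(module, inputs, output):
        captured[name] = output  # stash this block's output for later reuse
    return hook

# Register one forward hook per block, run a forward pass, then clean up.
handles = [m.register_forward_hook(make_hook(i)) for i, m in enumerate(model)]
final = model(torch.randn(1, 3, 16, 16))
for h in handles:
    h.remove()

# `captured` now holds every intermediate output; extra heads could decode
# each one into an image, turning the model into a multi-output network.
```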
