Confused about linear segmentation #47
Yes, that is my understanding as well.
I think it is a bit strange to upsample directly.
Why? If you don't do it right away, what would you do instead: upsample the segmentation map? You would get a blockier result with nearest-neighbor interpolation than with bilinear interpolation of the logit map followed by argmax.
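A minimal NumPy sketch of the two options being compared, assuming a `(h, w, C)` logit map at patch resolution and an integer upsampling factor. `bilinear_resize` is a hand-written helper that aims to follow the half-pixel convention of `align_corners=False`; it is an illustration, not the repo's code, and all function names are hypothetical.

```python
import numpy as np


def bilinear_resize(x, out_h, out_w):
    """Bilinearly resize an (H, W, C) array to (out_h, out_w, C).

    Half-pixel-centre convention (like align_corners=False): each output
    pixel centre is mapped back into input coordinates.
    """
    in_h, in_w = x.shape[:2]

    def source_coords(out_n, in_n):
        src = (np.arange(out_n) + 0.5) * (in_n / out_n) - 0.5
        src = np.clip(src, 0, in_n - 1)        # clamp at the borders
        i0 = np.floor(src).astype(int)         # top/left neighbour
        i1 = np.minimum(i0 + 1, in_n - 1)      # bottom/right neighbour
        return i0, i1, src - i0                # fractional weight

    y0, y1, fy = source_coords(out_h, in_h)
    x0, x1, fx = source_coords(out_w, in_w)
    fy, fx = fy[:, None, None], fx[None, :, None]
    top = x[y0][:, x0] * (1 - fx) + x[y0][:, x1] * fx
    bottom = x[y1][:, x0] * (1 - fx) + x[y1][:, x1] * fx
    return top * (1 - fy) + bottom * fy


def upsample_logits_then_argmax(logits, factor):
    """Option 1: bilinearly upsample the (h, w, C) logits, then argmax."""
    h, w, _ = logits.shape
    return bilinear_resize(logits, h * factor, w * factor).argmax(axis=-1)


def argmax_then_upsample_labels(logits, factor):
    """Option 2: argmax at low resolution, then nearest-neighbour upsample."""
    labels = logits.argmax(axis=-1)
    # Repeating each label factor x factor times is nearest-neighbour upsampling.
    return np.repeat(np.repeat(labels, factor, axis=0), factor, axis=1)


rng = np.random.default_rng(0)
low_res_logits = rng.normal(size=(4, 4, 3))   # 4x4 patch grid, 3 classes
smooth = upsample_logits_then_argmax(low_res_logits, 8)
blocky = argmax_then_upsample_labels(low_res_logits, 8)
print(smooth.shape, blocky.shape)  # (32, 32) (32, 32)
print("pixels that differ:", int((smooth != blocky).sum()))
```

Option 2 can only change label at patch borders, so every 8x8 block is a single class; option 1 lets class boundaries fall between patch centres.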
The downside is that upsampling can create logit maps that the classification head hasn't been trained to classify correctly with argmax, at least as I understand it.
I don't mean that we should use nearest-neighbor interpolation or skip upsampling. Thanks for your reply; I found that SegFormer does the same.
I know upsampling is required to reach the resolution of the mask labels, but computing logits on lower-resolution maps doesn't seem very precise.
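One point that may help here: bilinear interpolation writes each output pixel as a weighted average of nearby input pixels, with nonnegative weights that sum to 1, and an affine head `f(x) = xW + b` commutes with such averages. So bilinearly upsampling the logits gives exactly the logits the same linear head would produce on bilinearly upsampled features; the coarseness comes from the features being at patch resolution, not from where the linear layer is applied. This stops holding if a nonlinearity (softmax, argmax, or a nonlinear head) sits between the classifier and the upsampling. A small NumPy check, with hypothetical shapes and a hand-written `bilinear_resize` helper (half-pixel convention, like `align_corners=False`):

```python
import numpy as np


def bilinear_resize(x, out_h, out_w):
    """Bilinearly resize an (H, W, C) array to (out_h, out_w, C)."""
    in_h, in_w = x.shape[:2]

    def source_coords(out_n, in_n):
        src = (np.arange(out_n) + 0.5) * (in_n / out_n) - 0.5
        src = np.clip(src, 0, in_n - 1)
        i0 = np.floor(src).astype(int)
        i1 = np.minimum(i0 + 1, in_n - 1)
        return i0, i1, src - i0

    y0, y1, fy = source_coords(out_h, in_h)
    x0, x1, fx = source_coords(out_w, in_w)
    fy, fx = fy[:, None, None], fx[None, :, None]
    # The four weights per output pixel are nonnegative and sum to 1.
    top = x[y0][:, x0] * (1 - fx) + x[y0][:, x1] * fx
    bottom = x[y1][:, x0] * (1 - fx) + x[y1][:, x1] * fx
    return top * (1 - fy) + bottom * fy


def linear_head(features, weight, bias):
    """Per-pixel linear classifier: (H, W, D) features -> (H, W, C) logits."""
    return features @ weight + bias


rng = np.random.default_rng(0)
feats = rng.normal(size=(5, 7, 16))   # hypothetical 5x7 patch grid, 16-dim features
weight = rng.normal(size=(16, 4))     # 4 hypothetical classes
bias = rng.normal(size=4)

# Route 1: classify at patch resolution, then upsample the logits.
logits_then_up = bilinear_resize(linear_head(feats, weight, bias), 20, 28)
# Route 2: upsample the features, then classify every pixel.
up_then_logits = linear_head(bilinear_resize(feats, 20, 28), weight, bias)

print(np.allclose(logits_then_up, up_then_logits))  # True
```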
Hi, as you noted, linear segmentation at patch resolution followed by upsampling is quite coarse and will not produce the best scores (we get up to 53 mIoU on ADE20k with it, which is good but definitely not SOTA). We only used it as a probe of feature quality. If you intend to use the model for segmentation, I'd advise using a slightly bigger head that incorporates upsampling, such as UperNet or DPT. You can also go for the full ViT-Adapter + M2F pipeline, with which we get 60.2 mIoU on ADE20k.
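To make "linear segmentation at patch resolution followed by upsampling" concrete, here is a minimal NumPy sketch of such a probe. It is not the repo's actual head: the function names are hypothetical, the patch tokens are assumed to arrive in row-major order without the class token, and the 14-pixel patch size and 150 classes (ADE20k's class count) are illustrative values with random weights.

```python
import numpy as np


def bilinear_resize(x, out_h, out_w):
    """Bilinearly resize an (H, W, C) array (half-pixel convention)."""
    in_h, in_w = x.shape[:2]

    def source_coords(out_n, in_n):
        src = (np.arange(out_n) + 0.5) * (in_n / out_n) - 0.5
        src = np.clip(src, 0, in_n - 1)
        i0 = np.floor(src).astype(int)
        i1 = np.minimum(i0 + 1, in_n - 1)
        return i0, i1, src - i0

    y0, y1, fy = source_coords(out_h, in_h)
    x0, x1, fx = source_coords(out_w, in_w)
    fy, fx = fy[:, None, None], fx[None, :, None]
    top = x[y0][:, x0] * (1 - fx) + x[y0][:, x1] * fx
    bottom = x[y1][:, x0] * (1 - fx) + x[y1][:, x1] * fx
    return top * (1 - fy) + bottom * fy


def linear_probe_segmentation(patch_tokens, grid_h, grid_w, weight, bias, patch_size):
    """Hypothetical linear segmentation probe on ViT patch tokens.

    patch_tokens: (grid_h * grid_w, D) patch features, row-major, no class token.
    weight, bias: (D, C) and (C,) parameters of the shared per-patch linear layer.
    Returns a (grid_h * patch_size, grid_w * patch_size) label map.
    """
    dim = patch_tokens.shape[-1]
    # Restore the 2-D layout of the patches.
    feature_map = patch_tokens.reshape(grid_h, grid_w, dim)
    # One linear classifier shared by every patch -> coarse logits.
    coarse_logits = feature_map @ weight + bias
    # Upsample the logits to pixel resolution, then pick the best class.
    fine_logits = bilinear_resize(coarse_logits, grid_h * patch_size, grid_w * patch_size)
    return fine_logits.argmax(axis=-1)


rng = np.random.default_rng(0)
grid_h, grid_w, dim, num_classes, patch = 6, 8, 32, 150, 14  # illustrative values
tokens = rng.normal(size=(grid_h * grid_w, dim))
W = rng.normal(size=(dim, num_classes)) * 0.01
b = np.zeros(num_classes)
seg = linear_probe_segmentation(tokens, grid_h, grid_w, W, b, patch)
print(seg.shape)  # (84, 112)
```

Heads such as UperNet or DPT instead learn part of the upsampling, which is where the extra mIoU comes from; that is outside the scope of this sketch.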
Thanks for your reply and explanation!
Closing as answered; using #55 to keep track of feedback on documentation, training code, or model weights for segmentation.
Your work is impressive, and thanks for releasing the code.
I have a question about linear semantic segmentation. In your paper, an upsampling operation follows the linear layer. Is it just an interpolation like `F.interpolate()`, as also mentioned in #25? If so, my understanding is that it interpolates the class probabilities computed by the linear layer. Is that right?