Confused of linear segmentation #47

Closed
fangyixiao18 opened this issue Apr 23, 2023 · 9 comments

@fangyixiao18

fangyixiao18 commented Apr 23, 2023

Your work is impressive and thanks for your code release.

I have a question about linear semantic segmentation. In your paper, an upsampling operation follows the linear layer. Is it just an interpolation such as F.interpolate(), as also mentioned in #25? If so, my understanding is that it interpolates the class probabilities computed by the linear layer. Is that right?

@woctezuma

woctezuma commented Apr 23, 2023

> from my understanding, it is an interpolation of the class probability computed by linear layer, is it right?

Yes, that is my understanding as well.
The segmentation map is obtained by upsampling the low-resolution logit map, then taking the argmax.
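The step described above can be sketched in PyTorch as follows. This is only an illustration of "upsample the logits, then argmax": the function name `upsample_then_argmax`, the tensor sizes, and the choice of `mode="bilinear"` with `align_corners=False` are assumptions made for the example, and nothing in this thread confirms that the repository's evaluation code is written this way.

```python
import torch
import torch.nn.functional as F


def upsample_then_argmax(logits: torch.Tensor, out_size: tuple) -> torch.Tensor:
    """Hypothetical helper: (B, C, h, w) patch-level logits -> (B, H, W) label map."""
    # Interpolate the per-class logit maps to the target resolution.
    up = F.interpolate(logits, size=out_size, mode="bilinear", align_corners=False)
    # Pick the highest-scoring class independently at every output pixel.
    return up.argmax(dim=1)


torch.manual_seed(0)
# Illustrative sizes only: 150 classes on a 37x37 patch grid, 518x518 output.
logits = torch.randn(1, 150, 37, 37)
seg = upsample_then_argmax(logits, (518, 518))
print(seg.shape)
```

The result is an integer label map at the resolution of the ground-truth mask, while the classifier itself only ever runs at patch resolution.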

@fangyixiao18
Author

> > from my understanding, it is an interpolation of the class probability computed by linear layer, is it right?
>
> Yes, that is my understanding as well. The segmentation map is obtained by upsampling the low-resolution logit map, then taking the argmax.

I think it is kind of strange to do upsampling directly.

@woctezuma

woctezuma commented Apr 23, 2023

Why? If you don't do it right away, what would you do instead: upsample the segmentation map? You would get a blockier result using nearest-neighbor interpolation compared to the bilinear interpolation of the logit map followed by the argmax.
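To make the comparison concrete, here is a small sketch of both orderings on a hand-made 2-class, 2x2 logit map. The logit values and the 8x8 output size are invented for illustration, and labels are cast to float before `F.interpolate` because nearest-neighbor interpolation of integer tensors is not assumed to be supported.

```python
import torch
import torch.nn.functional as F

# Toy 2-class logits on a 2x2 patch grid (values chosen by hand for illustration).
logits = torch.tensor([[[[3.0, 0.0], [0.0, 0.0]],     # class 0
                        [[0.0, 1.0], [1.0, 1.0]]]])   # class 1 -> shape (1, 2, 2, 2)
out_size = (8, 8)

# Option A: bilinear upsampling of the logits, then argmax.
seg_a = F.interpolate(logits, size=out_size, mode="bilinear",
                      align_corners=False).argmax(dim=1)

# Option B: argmax at patch resolution, then nearest-neighbor upsampling of the labels.
coarse = logits.argmax(dim=1)                          # (1, 2, 2)
seg_b = F.interpolate(coarse.unsqueeze(1).float(), size=out_size,
                      mode="nearest").squeeze(1).long()

print(seg_a[0])
print(seg_b[0])
```

In this toy case option B is constant on 4x4 blocks, while option A moves the class boundary to a sub-patch position, so the two label maps differ near the boundary.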

@ccharest93

ccharest93 commented Apr 23, 2023

The downside is that upsampling could create logit maps that the classification head hasn't been trained to classify correctly with argmax. In that sense:
upsampling followed by argmax gives a higher-resolution segmentation map;
argmax followed by upsampling generalizes better for the upsampled (possibly out-of-distribution) points.

At least that is my understanding.

@fangyixiao18
Author

> Why? If you don't do it right away, what would you do instead: upsample the segmentation map? You would get a blockier result using nearest-neighbor interpolation compared to the bilinear interpolation of the logit map followed by the argmax.

I don't mean that we should use nearest-neighbor interpolation or skip the upsampling. Thanks for your reply. I found that SegFormer does the same thing.

@fangyixiao18
Author

> The downside is that upsampling could create logit maps that the classification head hasn't been trained to classify correctly with argmax. In that sense, upsampling followed by argmax gives a higher-resolution segmentation map, while argmax followed by upsampling generalizes better for the upsampled (possibly out-of-distribution) points.
>
> At least that is my understanding.

I know that a higher resolution is required to match the mask labels, but computing logits on lower-resolution maps does not seem that precise.
I am not familiar with segmentation, so I am not sure: is computing logits on lower-resolution maps the most appropriate way to balance performance and computing cost?

@TimDarcet

Hi

As you noted, linear segmentation at patch resolution followed by upsampling is quite coarse, and will not produce the best scores (we get up to 53 mIoU on ADE20k with that, which is good, but definitely not SOTA). We only used it as a probe of the quality of the features.

If you intend to use the model for segmentation, I'd advise using a slightly bigger head that incorporates upsampling, such as UperNet or DPT. You can also go for the full ViT-Adapter + M2F (Mask2Former) pipeline, with which we get 60.2 mIoU on ADE20k.
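For readers who want to see what such a probe could look like, here is a minimal sketch of a linear head applied to frozen patch tokens and then upsampled. The class name `LinearSegProbe`, the embedding size, the grid size, and the class count are all hypothetical; this is not the repository's head, only an instance of "linear layer at patch resolution, then interpolation".

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class LinearSegProbe(nn.Module):
    """Hypothetical probe: one linear classifier shared across all patch tokens."""

    def __init__(self, embed_dim: int, num_classes: int):
        super().__init__()
        self.classifier = nn.Linear(embed_dim, num_classes)

    def forward(self, patch_tokens, grid_hw, out_hw):
        b, n, _ = patch_tokens.shape
        h, w = grid_hw
        assert n == h * w, "token count must match the patch grid"
        logits = self.classifier(patch_tokens)                # (B, N, C)
        logits = logits.transpose(1, 2).reshape(b, -1, h, w)  # (B, C, h, w)
        # Coarse logits are brought to label resolution by interpolation only.
        return F.interpolate(logits, size=out_hw, mode="bilinear", align_corners=False)


# Illustrative sizes: 16x16 patch grid, 384-dim tokens, 150 classes.
tokens = torch.randn(2, 16 * 16, 384)  # stands in for frozen backbone features
probe = LinearSegProbe(embed_dim=384, num_classes=150)
out = probe(tokens, grid_hw=(16, 16), out_hw=(224, 224))
print(out.shape)
```

Only the linear layer carries trainable parameters here, which is why the score mostly reflects the quality of the features rather than the capacity of the head.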

@fangyixiao18
Author

> Hi
>
> As you noted, linear segmentation at patch resolution followed by upsampling is quite coarse, and will not produce the best scores (we get up to 53 mIoU on ADE20k with that, which is good, but definitely not sota). We only used it as a probe of the quality of the features.
>
> If you intend to use the model for segmentation, I'd advise using a slightly bigger head that incorporates upsampling, such as UperNet or DPT. You can also go for the full ViT-Adapter + M2F pipeline, with which we get 60.2 mIoU on ADE20k

Thanks for your reply and explanation!

@patricklabatut
Contributor

Closing as answered and using #55 to keep track of feedback on documentation, training code or model weights for segmentation.
