Confused of linear segmentation #47

fangyixiao18 · 2023-04-23T13:08:00Z

Your work is impressive and thanks for your code release.

I have a question about linear semantic segmentation. In your paper, an upsampling operation follows the linear layer. Is it just an interpolation such as F.interpolate(), as also mentioned in #25? If so, my understanding is that it interpolates the class probabilities computed by the linear layer. Is that right?


woctezuma · 2023-04-23T13:26:04Z

from my understanding, it is an interpolation of the class probability computed by linear layer, is it right?

Yes, that is my understanding as well.
The segmentation map is obtained by upsampling the low-resolution logit map and then taking the argmax.
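
The order of operations described above can be sketched as follows. This is a minimal, dependency-free illustration in plain Python, not the repository's code: the helper names (`bilinear_upsample`, `linear_logits`, `segment`), the toy weights, and the corner-aligned interpolation convention are all assumptions made for the example. In PyTorch, `F.interpolate(..., mode="bilinear")` would play the role of the hand-written upsampling helper.

```python
def bilinear_upsample(grid, out_h, out_w):
    """Resize a 2D list of floats with bilinear interpolation (corner-aligned)."""
    in_h, in_w = len(grid), len(grid[0])
    out = []
    for i in range(out_h):
        # Fractional source row for output row i.
        y = i * (in_h - 1) / (out_h - 1) if out_h > 1 else 0.0
        y0 = int(y)
        y1 = min(y0 + 1, in_h - 1)
        wy = y - y0
        row = []
        for j in range(out_w):
            # Fractional source column for output column j.
            x = j * (in_w - 1) / (out_w - 1) if out_w > 1 else 0.0
            x0 = int(x)
            x1 = min(x0 + 1, in_w - 1)
            wx = x - x0
            top = (1 - wx) * grid[y0][x0] + wx * grid[y0][x1]
            bottom = (1 - wx) * grid[y1][x0] + wx * grid[y1][x1]
            row.append((1 - wy) * top + wy * bottom)
        out.append(row)
    return out


def linear_logits(feature, weight, bias):
    """Apply a linear layer to one patch feature: one logit per class."""
    return [sum(w * f for w, f in zip(w_row, feature)) + b
            for w_row, b in zip(weight, bias)]


def segment(features, weight, bias, out_h, out_w):
    """Linear head at patch resolution, upsample the logits, then argmax."""
    num_classes = len(bias)
    # 1. Per-patch logits: an H x W grid of C-dimensional logit vectors.
    logits = [[linear_logits(f, weight, bias) for f in row] for row in features]
    # 2. Upsample each class's logit map separately.
    upsampled = [
        bilinear_upsample([[cell[c] for cell in row] for row in logits],
                          out_h, out_w)
        for c in range(num_classes)
    ]
    # 3. Per-pixel argmax over the class axis.
    return [[max(range(num_classes), key=lambda c: upsampled[c][i][j])
             for j in range(out_w)] for i in range(out_h)]


# Toy input: a 2 x 2 grid of 2-dimensional patch features (hypothetical values).
features = [[[1.0, 0.0], [0.0, 1.0]],
            [[1.0, 0.0], [0.0, 1.0]]]
weight = [[1.0, 0.0], [0.0, 1.0]]  # identity head, for illustration only
bias = [0.0, 0.0]

seg = segment(features, weight, bias, out_h=2, out_w=4)
print(seg)  # [[0, 0, 1, 1], [0, 0, 1, 1]]
```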

fangyixiao18 · 2023-04-23T13:28:47Z

from my understanding, it is an interpolation of the class probability computed by linear layer, is it right?

Yes, that is my understanding as well. The segmentation map is obtained by upsampling the low-resolution logit map and then taking the argmax.

I think it is kind of strange to do upsampling directly.

woctezuma · 2023-04-23T13:50:22Z

Why? If you don't do it right away, what would you do instead: upsample the segmentation map? You would get a blockier result using nearest-neighbor interpolation compared to the bilinear interpolation of the logit map followed by the argmax.
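
To make the comparison concrete, here is a small plain-Python sketch of both orders on a toy 2 x 2 logit map with two classes. The helper names, the logit values, and the interpolation conventions (corner-aligned bilinear, floor-based nearest-neighbor) are assumptions chosen for illustration, not the code used in the repository.

```python
def bilinear_upsample(grid, out_h, out_w):
    """Resize a 2D list of floats with bilinear interpolation (corner-aligned)."""
    in_h, in_w = len(grid), len(grid[0])
    out = []
    for i in range(out_h):
        y = i * (in_h - 1) / (out_h - 1) if out_h > 1 else 0.0
        y0 = int(y)
        y1 = min(y0 + 1, in_h - 1)
        wy = y - y0
        row = []
        for j in range(out_w):
            x = j * (in_w - 1) / (out_w - 1) if out_w > 1 else 0.0
            x0 = int(x)
            x1 = min(x0 + 1, in_w - 1)
            wx = x - x0
            top = (1 - wx) * grid[y0][x0] + wx * grid[y0][x1]
            bottom = (1 - wx) * grid[y1][x0] + wx * grid[y1][x1]
            row.append((1 - wy) * top + wy * bottom)
        out.append(row)
    return out


def nearest_upsample(grid, out_h, out_w):
    """Resize a 2D list by repeating the nearest source cell."""
    in_h, in_w = len(grid), len(grid[0])
    return [[grid[i * in_h // out_h][j * in_w // out_w]
             for j in range(out_w)] for i in range(out_h)]


def argmax_over_classes(class_maps):
    """Per-pixel argmax over a list of same-sized per-class maps."""
    h, w = len(class_maps[0]), len(class_maps[0][0])
    return [[max(range(len(class_maps)), key=lambda c: class_maps[c][i][j])
             for j in range(w)] for i in range(h)]


# Hypothetical patch-resolution logits: one 2 x 2 map per class.
class_maps = [
    [[0.0, 1.0], [1.0, 1.0]],  # class 0
    [[3.0, 0.0], [0.0, 0.0]],  # class 1 (confident only in the top-left patch)
]

# Option A: bilinear upsampling of the logits, then argmax.
logits_first = argmax_over_classes(
    [bilinear_upsample(m, 4, 4) for m in class_maps])

# Option B: argmax at patch resolution, then nearest-neighbor upsampling.
labels_first = nearest_upsample(argmax_over_classes(class_maps), 4, 4)

for row_a, row_b in zip(logits_first, labels_first):
    print(row_a, row_b)
```

In this toy case option B reproduces the patch grid as a square 2 x 2 block of class 1, while option A yields a staircase-shaped boundary whose position depends on how confident each patch is.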

ccharest93 · 2023-04-23T14:09:39Z

The downside is that upsampling could create logit maps that the classification head has not been trained to classify correctly with argmax. In that sense:
upsampling followed by argmax gives a higher-resolution segmentation map;
argmax followed by upsampling generalizes better for the upsampled points, which may be out of distribution.

at least that is my understanding
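
One way to read this concern, sketched in plain Python with made-up logit values: an interpolated logit vector is a new combination that no patch produced, and its argmax can even be a class that neither neighboring patch predicted. The helper names and numbers below are hypothetical.

```python
def lerp_logits(a, b, t):
    """Linearly interpolate two logit vectors; t=0 gives a, t=1 gives b."""
    return [(1 - t) * x + t * y for x, y in zip(a, b)]


def argmax(values):
    """Index of the largest entry."""
    return max(range(len(values)), key=values.__getitem__)


# Hypothetical logits for two adjacent patches over three classes.
left = [3.0, 0.0, 2.5]   # predicts class 0
right = [0.0, 3.0, 2.5]  # predicts class 1

# Logits at the point halfway between the two patch centers.
middle = lerp_logits(left, right, 0.5)

print(argmax(left), argmax(right), argmax(middle))  # 0 1 2
```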

fangyixiao18 · 2023-04-23T14:31:57Z

Why? If you don't do it right away, what would you do instead: upsample the segmentation map? You would get a blockier result using nearest-neighbor interpolation compared to the bilinear interpolation of the logit map followed by the argmax.

I don't mean that we should use nearest-neighbor interpolation or skip the upsampling. Thanks for your reply. I found that SegFormer does the same.

fangyixiao18 · 2023-04-23T14:39:00Z

The downside is that upsampling could create logit maps that the classification head has not been trained to classify correctly with argmax. In that sense, upsampling followed by argmax gives a higher-resolution segmentation map, while argmax followed by upsampling generalizes better for the upsampled points, which may be out of distribution.

at least that is my understanding

I know the higher resolution is required to match the mask labels, but computing logits on lower-resolution maps does not seem very precise.
I am not familiar with segmentation, so I am not sure whether computing logits on lower-resolution maps is the most appropriate way to balance performance and computing cost.
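
A point that may help here: for a purely linear head combined with a linear interpolation (whose weights sum to 1), computing logits at low resolution and then upsampling gives the same result as upsampling the features and then applying the head, only at a lower cost. The plain-Python check below illustrates this on one interpolated point. The names and values are hypothetical, and the equivalence does not carry over to non-linear heads.

```python
def lerp(a, b, t):
    """Linearly interpolate two vectors; the weights (1 - t) and t sum to 1."""
    return [(1 - t) * x + t * y for x, y in zip(a, b)]


def linear_head(feature, weight, bias):
    """Linear layer: one logit per class."""
    return [sum(w * f for w, f in zip(w_row, feature)) + b
            for w_row, b in zip(weight, bias)]


# Hypothetical features of two adjacent patches and a 3-class linear head.
feat_a = [1.0, 2.0]
feat_b = [3.0, -1.0]
weight = [[0.5, -1.0], [2.0, 0.25], [-1.5, 1.0]]
bias = [0.1, -0.2, 0.3]
t = 0.25  # position of the upsampled point between the two patches

# Order 1: logits at patch resolution, then interpolate the logits.
logits_then_upsample = lerp(linear_head(feat_a, weight, bias),
                            linear_head(feat_b, weight, bias), t)

# Order 2: interpolate the features, then apply the head at the new point.
upsample_then_logits = linear_head(lerp(feat_a, feat_b, t), weight, bias)

max_diff = max(abs(p - q)
               for p, q in zip(logits_then_upsample, upsample_then_logits))
print(max_diff < 1e-9)  # True
```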

TimDarcet · 2023-04-24T09:08:13Z

Hi

As you noted, linear segmentation at patch resolution followed by upsampling is quite coarse, and will not produce the best scores (we get up to 53 mIoU on ADE20k with that, which is good, but definitely not sota). We only used it as a probe of the quality of the features.

If you intend to use the model for segmentation, I'd advise using a slightly bigger head that incorporates upsampling, such as UperNet or DPT. You can also go for the full ViT-Adapter + M2F pipeline, with which we get 60.2 mIoU on ADE20k

fangyixiao18 · 2023-04-24T11:08:22Z

Hi

As you noted, linear segmentation at patch resolution followed by upsampling is quite coarse, and will not produce the best scores (we get up to 53 mIoU on ADE20k with that, which is good, but definitely not sota). We only used it as a probe of the quality of the features.

If you intend to use the model for segmentation, I'd advise using a slightly bigger head that incorporates upsampling, such as UperNet or DPT. You can also go for the full ViT-Adapter + M2F pipeline, with which we get 60.2 mIoU on ADE20k

Thanks for your reply and explanation!

patricklabatut · 2023-04-24T22:54:17Z

Closing as answered and using #55 to keep track of feedback on documentation, training code or model weights for segmentation.

