Problem with VSRnet and VESPCN output #7

Open
tthg119 opened this issue Dec 15, 2020 · 6 comments

tthg119 commented Dec 15, 2020

Hi, may I know the proper ffmpeg command for using the VSRnet and VESPCN models, or the specific steps needed when compiling the models? The output I get always looks like the picture below; I have tried several settings, encoders, etc.
[Screenshot from 2020-12-15 15-15-49: distorted VSRnet/VESPCN output]
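
For context, a minimal sketch of the kind of sr-filter invocation being attempted, assuming a TensorFlow-enabled ffmpeg build (input file and model path are placeholders, not a confirmed working setup):

# sketch only: placeholder input/output names and model path
ffmpeg -i input.mp4 -vf 'format=pix_fmts=yuv420p, sr=dnn_backend=tensorflow:scale_factor=2:model=/models/vespcn.pb' \
  -c:v libx264 -crf 17 -y output_2x.mp4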


edhoyle commented Apr 10, 2021

Hi @tthg119, could you please share the ffmpeg command line that you used with VESPCN?
I'm using the ESPCN and SRCNN models without problems, but when I try VESPCN I get this error:
"tensorflow/core/framework/op_kernel.cc:1651] OP_REQUIRES failed at conv_ops_fused_impl.h:716 : Invalid argument: input depth must be evenly divisible by filter depth: 1 vs 3"


jlfilho commented Aug 14, 2021

I also have the same problem reported by @tthg119 with the VESPCN model. Does anyone know if the sr filter supports models with multiple input frames?
@Witiko do you know about this?
[Screenshot: distorted VESPCN output]


Witiko commented Aug 14, 2021

@jlfilho Could this be because ESPCN only upscales the luma (Y) and not the color components (UV)

lr_batch = tf.cast(data_batch['lr1'], tf.float32) / 255.0
whereas VSRnet and VESPCN upscale all three?
lr_batch = tf.concat([tf.cast(data_batch['lr0'], tf.float32) / 255.0,


Witiko commented Aug 14, 2021

After a bit of experimentation with MIR-MU/ffmpeg-tensorflow, it seems that VSRnet only correctly upscales the UV components, so I combined it with ESPCN, which upscales the Y component. How dreadfully inefficient!

ffmpeg-tensorflow -i flower_cif.y4m -filter_complex '
  [0:v] format=pix_fmts=yuv420p, split [clone1][clone2];
  [clone1] extractplanes=y+u+v [y][_1][_2]; [y] sr=dnn_backend=tensorflow:scale_factor=2:model=/models/espcn.pb [y_scaled];
  [clone2] sr=dnn_backend=tensorflow:scale_factor=2:model=/models/vsrnet.pb, extractplanes=y+u+v [_3][u_scaled][v_scaled];
  [y_scaled][u_scaled][v_scaled] mergeplanes=0x001020:yuv420p [merged];
  [_1] nullsink; [_2] nullsink; [_3] nullsink
' -map '[merged]' -sws_flags lanczos -c:v libx264 -crf 17 -c:a copy -y flower_cif_2x.mp4

[Screenshot: frame from the upscaled flower_cif_2x.mp4]

The Y component seems to be incorrectly interpreted by VSRnet and VESPCN, which is why they both produce an image with three copies of the original Y channel and an incorrect resolution, as you have shown in #7 (comment). This can likely be fixed by replacing yuv420p with a different pixel format (see ffmpeg -pix_fmts), but I am not sure which. Perhaps running ffprobe on some of the files used for training the models might shed some light on the matter.
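
For instance, a sketch of such an ffprobe check (the clip name is a placeholder):

# print the pixel format and dimensions of a training clip
ffprobe -v error -select_streams v:0 -show_entries stream=pix_fmt,width,height -of default=noprint_wrappers=1 training_clip.y4m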


jlfilho commented Aug 15, 2021

@Witiko thank you for your response.
In fact, the VESPCN and VSRnet models also work with the Y color channel only. The difference is that both are spatio-temporal models, that is, they exploit frames neighboring the target frame, so they take three input frames [lr0, lr1, lr2], while ESPCN and SRCNN are spatial models that take only a single input frame [lr1].

I believe the problem is in the FFmpeg sr filter: maybe it is not compatible with temporal models, or some FFmpeg filter might be needed to gather the three input frames that the temporal models expect. But I don't know exactly what the problem is!

Could @HighVoltageRocknRoll or @mengxuewei help us with this problem?


Witiko commented Aug 15, 2021

I have reposted this interesting question to the ffmpeg-user mailing list: http://ffmpeg.org/pipermail/ffmpeg-user/2021-August/053322.html
