Add support for triple frequency resolution mono spectrograms using RGB channels #50

Open: auryd wants to merge 2 commits into main

auryd commented Dec 28, 2022

This is another way to use the color channels, as an alternative to GB stereo, and it improves audio quality considerably. It seems to be compatible with further fine-tuning of the existing model checkpoint on RGB mono spectrograms.
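To make the idea concrete, here is a minimal sketch of one way three color channels could carry a taller mono spectrogram. It assumes the frequency rows are simply split into three equal bands, one per channel. The thread does not describe the layout this PR actually uses, so the band split and the function names `pack_triple_res` and `unpack_triple_res` are illustrative assumptions only.

```python
from typing import List, Tuple

Row = List[int]               # one frequency row: amplitude per time step (0..255)
Pixel = Tuple[int, int, int]  # (R, G, B)


def pack_triple_res(spec: List[Row]) -> List[List[Pixel]]:
    """Pack a spectrogram with 3*H frequency rows into an H-row RGB image.

    Hypothetical layout: rows [0, H) -> R, [H, 2H) -> G, [2H, 3H) -> B.
    """
    if len(spec) % 3 != 0:
        raise ValueError("number of frequency rows must be divisible by 3")
    h = len(spec) // 3
    red, green, blue = spec[:h], spec[h:2 * h], spec[2 * h:]
    # Combine the three bands pixel by pixel into RGB tuples.
    return [
        list(zip(r_row, g_row, b_row))
        for r_row, g_row, b_row in zip(red, green, blue)
    ]


def unpack_triple_res(image: List[List[Pixel]]) -> List[Row]:
    """Inverse of pack_triple_res: rebuild the 3*H-row spectrogram."""
    channels = [[[px[c] for px in row] for row in image] for c in range(3)]
    return channels[0] + channels[1] + channels[2]
```

Under this assumed layout, a 512-row image would hold 1536 frequency rows, and the round trip through `unpack_triple_res` recovers the original rows exactly.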

dene- commented Dec 28, 2022

Issues I found after cloning your fork and running the commands with no extra parameters besides -t (triple-frequency mono):

  • Running audio-to-image with only the -t parameter converts the whole audio file into a single image instead of 512px chunks (the conversion itself looks correct, with a different spectrogram on each channel).
  • Cropping a 512x512px chunk from that long image and converting it back to audio with image-to-audio, using default parameters, gives me this error:
    • [image: relevant part of the error]
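For reference, cutting a long spectrogram image into fixed-width chunks can be sketched as below. This is not code from the PR: `chunk_boxes` is a hypothetical helper, and the choice to drop a trailing partial chunk is an assumption. It only computes crop boxes, so it runs without any image library.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (left, upper, right, lower)


def chunk_boxes(width: int, height: int = 512, chunk_width: int = 512) -> List[Box]:
    """Return crop boxes for consecutive full-width chunks of a long image.

    A trailing remainder narrower than chunk_width is dropped (assumption).
    """
    if chunk_width <= 0:
        raise ValueError("chunk_width must be positive")
    return [
        (left, 0, left + chunk_width, height)
        for left in range(0, width - chunk_width + 1, chunk_width)
    ]
```

The boxes use the (left, upper, right, lower) order that Pillow's `Image.crop` accepts, so each one can be passed to `crop` directly if the image is loaded with Pillow.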

auryd (Author) commented Dec 28, 2022

Thanks for trying it out!

For the first part, that matches the existing behavior; it is not specific to -t.

I can't reproduce the error, even with long audio files.

Does the error also occur when you run that audio file through audio-to-image with the default settings? If not, and it's unique to triple_res_mono, can you share the file?

dene- commented Dec 29, 2022

I managed to get it working, but only if I convert the whole image file back to audio. If I cut it into chunks (for training, for example), I can't convert them back; it throws that error.

Would it be possible to add low frequencies in one of the channels? I think that would do much more for the perceived quality. In both normal and triple mono there are no lows at all 🤔

I tried converting with the --num-frequencies param set to 768 and 1024. The quality is better, but it still sounds like the low frequencies are cut. I suppose that's a limitation of encoding audio as an image :/
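One way to reason about how much of the image is spent on the low end is to count the frequency bins below a cutoff. The sketch below assumes a mel-scaled frequency axis with bins evenly spaced in mel between 0 Hz and an upper limit of 10 kHz. Neither assumption is stated in this thread, and `bins_below` is a hypothetical helper, not part of the project's CLI.

```python
import math


def hz_to_mel(f_hz: float) -> float:
    """Convert a frequency in Hz to mels (HTK-style formula)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def bins_below(cutoff_hz: float, num_frequencies: int, max_hz: float = 10000.0) -> int:
    """Approximate number of bins covering frequencies below cutoff_hz.

    Assumes bins are evenly spaced on the mel scale from 0 Hz to max_hz
    (max_hz = 10 kHz is an illustrative default, not a confirmed setting).
    """
    fraction = hz_to_mel(cutoff_hz) / hz_to_mel(max_hz)
    return int(num_frequencies * fraction)
```

Under these assumptions, raising the bin count increases the number of bins below any given cutoff proportionally, so it would not by itself explain lows going missing; checking the actual frequency range used during conversion may be more informative.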
