Hi,

I have the following questions, which someone more familiar with the music source separation scene might be able to answer:
Considering there is a dereverb model by anvuew for vocals, would it be possible to train a model in a similar way to remove effects from guitar? I know about RemFx, but I want to try roformers for this task in addition to demucs.
If I train such a model, should the dataset I need (type 1) consist of "noeffect", "effect", and "mixture", with "mixture" being the same as "effect"?
In the case of roformers, can I set target_instrument so that training prioritizes the loss for that target instrument?
In the case of demucs, though, as I understand it so far, it optimizes both sources (or their average), which is not ideal. How can I make demucs work for this task?
Models posted mostly report SDR as a metric, but for metric_for_scheduler should the first things to try generally be aura_mrstft or si_sdr? Optimizing for SDR seems deceptive, since a high SDR does not immediately mean the result sounds good in a listening test.
Is there value in training a model at a 16 kHz sampling rate? Most models are trained at 44.1 kHz, but that is of course computationally expensive. Would it make sense to train a model at a lower sampling rate first, for example to tune training parameters?
I am a junior researcher with very little experience in audio ML, so I hope these questions are appropriate.
Also, while this might be extensive, this project would benefit from a more general wiki/FAQ, as it seems to incorporate, if not all, then a lot of the most relevant modern research in the field (models, metrics, augmentation implementations).
Thank you in advance, and thank you for the project.
I think it's possible. You need to have "clean guitar" + "effects".
As training data you only need "noeffect" (clean guitar) and "effect". You don't need the mixture for training (only for validation).
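For reference, here is a minimal sketch of how such a type 1 dataset could be prepared, assuming per-track folders with one WAV per stem; the paths, file names, and pairing-by-filename logic are illustrative assumptions, not the repository's exact dataset code.

```python
# Sketch of preparing a type 1 dataset for "effect removal", assuming
# per-track folders that each contain one WAV per stem. Paths and file
# names below are hypothetical placeholders.
import shutil
from pathlib import Path

CLEAN_DIR = Path("raw/clean_guitar")      # hypothetical source of clean takes
EFFECT_DIR = Path("raw/effected_guitar")  # hypothetical source of effected takes
OUT_DIR = Path("dataset/train")

for clean_path in sorted(CLEAN_DIR.glob("*.wav")):
    effect_path = EFFECT_DIR / clean_path.name  # paired by file name (assumption)
    if not effect_path.exists():
        continue
    track_dir = OUT_DIR / clean_path.stem
    track_dir.mkdir(parents=True, exist_ok=True)
    shutil.copy(clean_path, track_dir / "noeffect.wav")
    shutil.copy(effect_path, track_dir / "effect.wav")
    # Only validation folders need a "mixture.wav"; for this task it is
    # simply a copy of the effected signal:
    # shutil.copy(effect_path, track_dir / "mixture.wav")
```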
Yes, you can set that priority, but in my opinion optimizing both instruments at the same time gives the same result.
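To illustrate the difference being discussed, here is a small PyTorch sketch of the two loss strategies (single target stem vs. averaging over all stems); it is a simplified stand-in, not the repository's training loop.

```python
# Simplified illustration of the two loss strategies discussed above:
# (a) optimize only the target stem, (b) average the loss over all stems.
import torch
import torch.nn.functional as F

def target_only_loss(pred, ref, target_idx):
    # pred, ref: (batch, num_stems, channels, samples)
    return F.l1_loss(pred[:, target_idx], ref[:, target_idx])

def averaged_loss(pred, ref):
    # Mean L1 over every stem ("noeffect" and "effect" alike).
    return F.l1_loss(pred, ref)

# Example shapes: 2 stems, stereo, 1-second chunks at 44.1 kHz.
pred = torch.randn(4, 2, 2, 44100)
ref = torch.randn(4, 2, 2, 44100)
print(target_only_loss(pred, ref, target_idx=0).item())
print(averaged_loss(pred, ref).item())
```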
I didn't test demucs with a non-null target_instrument, so I can't say.
Usually all metrics grow simultaneously. You can use whichever one you prefer.
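For a quick comparison of the metrics mentioned above, here is a small sketch computing SI-SDR directly and a multi-resolution STFT loss via the auraloss package (which the aura_mrstft name suggests); this is an illustration and may differ from how the repository computes its own metrics.

```python
# Sketch of two metrics: SI-SDR computed by hand, and a multi-resolution
# STFT loss from the auraloss package. This may differ from the project's
# internal metric implementations.
import torch
import auraloss  # pip install auraloss

def si_sdr(estimate, reference, eps=1e-8):
    # Scale-invariant SDR in dB; inputs are 1-D tensors of equal length.
    alpha = torch.dot(estimate, reference) / (torch.dot(reference, reference) + eps)
    target = alpha * reference
    noise = estimate - target
    return 10 * torch.log10((target.pow(2).sum() + eps) / (noise.pow(2).sum() + eps))

mrstft = auraloss.freq.MultiResolutionSTFTLoss()

reference = torch.randn(1, 1, 44100)   # (batch, channels, samples)
estimate = reference + 0.1 * torch.randn_like(reference)

print("SI-SDR (dB):", si_sdr(estimate.flatten(), reference.flatten()).item())
print("MR-STFT loss:", mrstft(estimate, reference).item())
```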
We don't actually use the sample rate; we operate with "chunk_size". If you use the same "chunk_size" for the model, it will have the same complexity. If you convert the data from 44.1 kHz to 16 kHz, you can lower the chunk size and reduce the model size. I didn't try this because everyone wants to use models on high-quality data. For debugging it's possible.
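To make that concrete, here is a small sketch of resampling 44.1 kHz audio to 16 kHz with torchaudio and scaling chunk_size so a chunk still covers the same duration; the concrete numbers are examples, not recommended settings.

```python
# Sketch of the point above: resample 44.1 kHz -> 16 kHz and scale
# chunk_size so a chunk still covers the same duration. The values are
# examples only.
import torch
import torchaudio.functional as AF

orig_sr, new_sr = 44100, 16000
chunk_size_44k = 485100  # example: 11 s at 44.1 kHz
chunk_size_16k = int(chunk_size_44k * new_sr / orig_sr)  # same duration, fewer samples
print(chunk_size_16k)    # 176000

waveform = torch.randn(2, orig_sr * 11)  # fake stereo audio, 11 s
resampled = AF.resample(waveform, orig_freq=orig_sr, new_freq=new_sr)
print(resampled.shape)   # torch.Size([2, 176000])
```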