You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Hello,
I am currently working with the German speech recognition data provided by this project and came across the following line in the README:
"In addition, we also filter out samples that are considered 'noisy', that is, samples having very high WER (word error rate) or CER (character error rate) w.r.t. a previously trained German model."
Unfortunately, I do not have access to a pre-trained German model to calculate the WER or CER for my dataset. This makes it challenging for me to filter out the noisy samples effectively.
Could you please provide a list of these 'noisy' samples or the criteria used to identify them?
The text was updated successfully, but these errors were encountered:
Hello,
I am currently working with the German speech recognition data provided by this project and came across the following line in the README:
"In addition, we also filter out samples that are considered 'noisy', that is, samples having very high WER (word error rate) or CER (character error rate) w.r.t. a previously trained German model."
Unfortunately, I do not have access to a pre-trained German model to calculate the WER or CER for my dataset. This makes it challenging for me to filter out the noisy samples effectively.
Could you please provide a list of these 'noisy' samples or the criteria used to identify them?
The text was updated successfully, but these errors were encountered: