I had a few questions/clarifications regarding the hdf5 dataset that was linked in the notebook:
I ran the notebook for training from scratch using the existing hdf5 and obtained a CER of ~0.09 using just a single model (not an ensemble).
When creating the hdf5 from scratch and running the same training procedure, my CER is similar to the best/second-best models (~0.16–0.18).
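For clarity, the CER figures above are character error rate: the character-level edit distance between hypothesis and reference, normalized by the reference length. A minimal self-contained sketch (my own implementation, not the notebook's evaluation code):

```python
def levenshtein(a: str, b: str) -> int:
    # Classic dynamic-programming edit distance over characters.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,          # deletion
                           cur[j - 1] + 1,       # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]

def cer(reference: str, hypothesis: str) -> float:
    # Character error rate: edits to turn the hypothesis into the
    # reference, divided by the reference length.
    return levenshtein(reference, hypothesis) / len(reference)

print(round(cer("hello world", "helo wurld"), 2))  # 0.18
```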
So, as far as I can see, the main difference would be in the dataset generation/preprocessing steps or the tokenizer:
a. In the notebook there's a comment that the pretrained models used a vocab size of 100 as opposed to 99 (95 characters + SOS/EOS/PAD/UNK tokens). Is there an additional token used here?
b. Was the generation procedure for the hdf5 that was linked / hosted on Google Drive slightly different?
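To make the count in (a) concrete, here is a sketch of the vocab I'm assuming: the 95 printable ASCII characters plus four special tokens, giving 99. (The special-token names below are illustrative; the notebook's actual identifiers may differ.)

```python
# 95 printable ASCII characters: space (32) through '~' (126).
chars = [chr(c) for c in range(32, 127)]
assert len(chars) == 95

# Four special tokens -- names here are placeholders, not the notebook's.
special = ["<sos>", "<eos>", "<pad>", "<unk>"]

vocab = special + chars
print(len(vocab))  # 99 -- one short of the 100 the pretrained models reportedly use
```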
Thank you!
I also don't remember the exact details of the first iteration, but I am working on a paper covering the different preprocessing experiments, which should help the community. I will update you once it is finalized.