How to train the Model on new dataset changing the vocabulary size #52

HidayatRahman · 2017-08-15T11:55:04Z

Hi,

I am training the model on my own dataset, which contains both uppercase and lowercase letters but no wildcards, so the new vocabulary size is 26+26+10+3=65. The problem is that the code only prints "Generating first batch" instead of logging the loss and perplexity.
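The arithmetic behind 65 can be checked with a short sketch. It assumes the three extra IDs are special tokens at 0-2 (for example PAD, GO and EOS; the names and order are an assumption), followed by digits, lowercase letters and uppercase letters, which is the layout the later posts in this thread arrive at.

```python
# Sketch: token-ID layout for a 65-symbol vocabulary.
# Assumption: IDs 0-2 are reserved for special tokens; the names and
# order used in data_gen.py may differ.
import string

SPECIAL_TOKENS = ["PAD", "GO", "EOS"]  # assumed names/order
CHARSET = string.digits + string.ascii_lowercase + string.ascii_uppercase

VOCAB_SIZE = len(SPECIAL_TOKENS) + len(CHARSET)


def id_ranges():
    """Return the first and last token ID for each character class."""
    start = len(SPECIAL_TOKENS)
    ranges = {}
    for name, chars in [("digits", string.digits),
                        ("lowercase", string.ascii_lowercase),
                        ("uppercase", string.ascii_uppercase)]:
        ranges[name] = (start, start + len(chars) - 1)
        start += len(chars)
    return ranges


print(VOCAB_SIZE)   # 65
print(id_ranges())  # {'digits': (3, 12), 'lowercase': (13, 38), 'uppercase': (39, 64)}
```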

Any help would be appreciated.
Thanks.

shoaibahmed · 2017-08-21T23:38:20Z

Hi,

If the system only prints "Generating first batch", batch generation failed and the epoch was skipped, so "Generating first batch" is printed once per epoch.
As for the cause, it is probably the size of the images you are feeding the system. Check the 'img_width_range' argument in data_gen.py and adjust it to match your dataset. Since you are training with a different vocabulary, also adjust the read_data function in data_gen.py. Once the image sizes are adjusted, the system should be able to load batches and you will see the progress bar as expected.
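A quick way to test this suggestion is to count how many images a width filter would reject. This is a hypothetical sketch: it assumes data_gen.py rescales every image to a fixed height while keeping its aspect ratio, then skips images whose new width falls outside `img_width_range`. `TARGET_HEIGHT` and `IMG_WIDTH_RANGE` are placeholder values, not taken from the repository.

```python
# Hypothetical diagnostic: how many images would survive a width filter?
# Assumptions (check data_gen.py): images are rescaled to TARGET_HEIGHT with
# the aspect ratio preserved, and those whose new width lies outside
# IMG_WIDTH_RANGE are skipped.
TARGET_HEIGHT = 32            # assumed placeholder
IMG_WIDTH_RANGE = (12, 320)   # assumed placeholder; compare with 'img_width_range'


def scaled_width(width, height, target_height=TARGET_HEIGHT):
    """Width after resizing to target_height with the aspect ratio preserved."""
    return round(width * target_height / height)


def split_by_width(sizes, width_range=IMG_WIDTH_RANGE):
    """Partition (width, height) pairs into kept and rejected lists."""
    lo, hi = width_range
    kept, rejected = [], []
    for w, h in sizes:
        if lo <= scaled_width(w, h) <= hi:
            kept.append((w, h))
        else:
            rejected.append((w, h))
    return kept, rejected


sizes = [(100, 32), (900, 40), (20, 64)]
kept, rejected = split_by_width(sizes)
print(kept)      # [(100, 32)]
print(rejected)  # [(900, 40), (20, 64)]
```

In practice the (width, height) pairs could be collected with Pillow's `Image.open(path).size`. If most images end up in `rejected`, either widen the range or resize the dataset.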

Hope this helps.

HidayatRahman · 2017-08-22T06:06:41Z

Thanks for the reply.

Actually, the previous error has been resolved. Now the error is in handling uppercase letters. Look at the following code:
word = [self.GO]
for c in lex:
    assert 96 < ord(c) < 123 or 47 < ord(c) < 58
    word.append(
        ord(c) - 97 + 13 if ord(c) > 96 else ord(c) - 48 + 3)
These are lines 120-124 of data_gen.py, and they only handle ASCII lowercase letters and digits. I changed the code to the following, but I still didn't get a satisfactory result after running it for three days. It's still running.

for c in lex:
    if 96 < ord(c) < 123:
        word.append(ord(c) - 97 + 13)
    if 47 > ord(c) < 58:
        word.append(ord(c) - 48 + 3)
    if 64 > ord(c) < 91:
        word.append(ord(c) - 25)
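Why this first attempt misbehaves: Python chains comparisons, so `47>ord(c)<58` means `47 > ord(c) and ord(c) < 58`. The sketch below reruns the three conditions outside data_gen.py (without the GO token) to show what they actually do.

```python
# Reproduces the three conditions from the post above in a standalone function.
# Python chains comparisons: `47 > x < 58` means `47 > x and x < 58`.
def buggy_encode(lex):
    word = []
    for c in lex:
        if 96 < ord(c) < 123:   # 'a'-'z': works as intended
            word.append(ord(c) - 97 + 13)
        if 47 > ord(c) < 58:    # true only when ord(c) < 47, never for a digit
            word.append(ord(c) - 48 + 3)
        if 64 > ord(c) < 91:    # true whenever ord(c) < 64 (digits included), never for 'A'-'Z'
            word.append(ord(c) - 25)
    return word


print(buggy_encode("aB7"))  # [13, 30]
print(buggy_encode("r"))    # [30]
```

'B' produces no ID at all, while '7' (ASCII 55) falls into the third branch and becomes 55 - 25 = 30, the same ID that 'r' gets, so every digit label is silently wrong. Separately, an offset of 25 would map 'A'-'Z' to 40-65, which needs one ID more than the 65 computed for the vocabulary.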

HidayatRahman · 2017-08-22T06:48:26Z

Actually, I am trying to make the code work for both uppercase and lowercase.

HidayatRahman · 2017-08-22T07:24:22Z

The issue is in the code above; it should actually be written like this:

for c in lex:
    if 96 < ord(c) < 123:
        word.append(ord(c) - 97 + 13)
    if 47 < ord(c) < 58:
        word.append(ord(c) - 48 + 3)
    if 64 < ord(c) < 91:
        word.append(ord(c) - 26)
Now the letters and digits map to IDs in the range 3 to 64 (digits 3-12, lowercase 13-38, uppercase 39-64).
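The corrected mapping can be written as one standalone function. This is a sketch, not the repository's code: it assumes the GO token has ID 1, uses `elif` so each character matches exactly one branch, and raises `ValueError` for characters outside the vocabulary instead of silently dropping them.

```python
# Sketch of the corrected encoder as a standalone function.
# Assumption: the GO token has ID 1 (self.GO in data_gen.py).
GO = 1


def encode(lex):
    """Map a word to token IDs: digits 3-12, 'a'-'z' 13-38, 'A'-'Z' 39-64."""
    word = [GO]
    for c in lex:
        code = ord(c)
        if 96 < code < 123:      # 'a'-'z'
            word.append(code - 97 + 13)
        elif 47 < code < 58:     # '0'-'9'
            word.append(code - 48 + 3)
        elif 64 < code < 91:     # 'A'-'Z'
            word.append(code - 26)
        else:
            raise ValueError(f"character {c!r} is not in the vocabulary")
    return word


print(encode("Hi5"))  # [1, 46, 21, 8]
```

Raising on unknown characters makes a bad label fail loudly when the data is loaded, which is easier to debug than a model that trains for days on corrupted targets.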

HidayatRahman · 2017-08-23T13:42:01Z

I'm still not getting satisfactory results. How can I train the model to predict both uppercase and lowercase characters?
Any help would be appreciated.

pasha76 · 2017-12-13T15:53:05Z

@HidayatRahman How did you fix the image size issue? Any progress on training with uppercase letters?

Thanks in advance

HidayatRahman · 2017-12-13T16:46:35Z

There was no issue with the image size; the issue was in my conversion for uppercase letters. Anyway, I managed to get the code working for both upper and lower case. You have to add the following code in data_gen.py, where the ASCII conversion actually happens:
for c in lex:
    if 96 < ord(c) < 123:
        word.append(ord(c) - 97 + 13)
    if 47 < ord(c) < 58:
        word.append(ord(c) - 48 + 3)
    if 64 < ord(c) < 91:
        word.append(ord(c) - 26)

You also have to do the decoding manually in the corresponding function in model.py.

Hope it helps,

Amschel · 2018-02-19T18:54:55Z

Hi @HidayatRahman,

Could you please explain how to edit model.py? I tried to do it using your hints, but it doesn't work. If you share exactly how you modified model.py, I might understand where my mistakes are.

Thank you!
Best regards.

HidayatRahman · 2018-02-19T19:38:25Z

Sorry, man, I just lost the virtual machine I was running the code in. But you have to do the reverse step in model.py in order to decode properly. Lines 427 and 428 do that:
fword.write(' '.join([chr(c-13+97) if c-13+97>96 else chr(c-3+48) for c in ground_valid])+'\n')
fword.write(' '.join([chr(c-13+97) if c-13+97>96 else chr(c-3+48) for c in output_valid]))

Here you just have to add another condition for 64 < ascii < 91. I guess it's pretty simple now. Let me know if it's still not working.
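Expanded, the one-liner `chr(c-13+97) if c-13+97>96 else chr(c-3+48)` says: IDs of 13 and above are lowercase letters, everything else is a digit. Once uppercase letters use IDs 39-64, that test sends them to the lowercase branch (ID 46 becomes `chr(130)` instead of 'H'), so a third range is needed. A sketch of the expanded decoder, assuming the encoding from the posts above and assuming IDs 0-2 are special tokens that should be skipped:

```python
# Expanded version of the model.py one-liner, extended for uppercase.
# Assumes IDs 3-12 are digits, 13-38 'a'-'z', 39-64 'A'-'Z',
# and that IDs 0-2 are special tokens (assumption) to be skipped.
def decode_id(c):
    """Turn one character token ID back into its character."""
    if 3 <= c <= 12:
        return chr(c - 3 + 48)    # '0'-'9'
    if 13 <= c <= 38:
        return chr(c - 13 + 97)   # 'a'-'z'
    if 39 <= c <= 64:
        return chr(c + 26)        # 'A'-'Z'
    raise ValueError(f"token ID {c} is not a character")


def decode(ids):
    """Space-join the characters, like the original one-liner, skipping IDs below 3."""
    return " ".join(decode_id(c) for c in ids if c >= 3)


print(decode([46, 21, 8]))  # H i 5
```

In model.py, the two `fword.write(...)` calls could then become `fword.write(decode(ground_valid) + '\n')` and `fword.write(decode(output_valid))`.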

Best wishes,

Amschel · 2018-02-22T12:00:06Z

Hi @HidayatRahman,

Yes, I tried to do just that, but I don't quite understand those one-liners. I tried to expand the code into a form I understand, but I get an error.

ghost · 2018-05-25T10:00:05Z

Hi @HidayatRahman, what vocabulary size did you use for training?

kulkarnivishal · 2018-06-15T02:20:37Z

Hi @shoaibahmed,
Could you help me understand your comment? I am facing the same problem: the system only prints "Generating first batch" and the progress bar is stuck at 0%. I am training on the MJsynth dataset (a synthetic word dataset covering 9 million images and 90K words). Which parameters do I have to change?
