The BERT-base model in https://arxiv.org/pdf/1810.04805.pdf uses a hidden size of 768, 12 layers, and 12 attention heads (which are also the defaults in bert.py), while the argparser defaults in __main__.py are 256 hidden features, 8 layers, and 8 heads. Would it make sense to align the example script with the paper? I spent quite a while puzzling over my low GPU utilization with the default configuration. Thanks!
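For a rough sense of the gap, here is a minimal sketch that compares the parameter counts of the two configurations using a generic `nn.TransformerEncoder` as a stand-in for the repo's BERT class (the vocabulary size of 30522 is the paper's WordPiece vocabulary; the 4x feed-forward width follows the paper; everything else here is illustrative, not the repo's actual code):

```python
import torch.nn as nn

def encoder_params(hidden: int, layers: int, heads: int,
                   vocab_size: int = 30522) -> int:
    """Parameter count of a BERT-like encoder with the given shape."""
    layer = nn.TransformerEncoderLayer(
        d_model=hidden, nhead=heads, dim_feedforward=4 * hidden)
    model = nn.Sequential(
        nn.Embedding(vocab_size, hidden),          # token embeddings
        nn.TransformerEncoder(layer, num_layers=layers))
    return sum(p.numel() for p in model.parameters())

print(f"argparser default (256/8/8):   {encoder_params(256, 8, 8):,}")
print(f"BERT-base paper   (768/12/12): {encoder_params(768, 12, 12):,}")
```

The 768/12/12 configuration comes out roughly 8x larger (on the order of BERT-base's ~110M parameters vs. ~14M for the default), which would explain why the small default barely keeps a GPU busy.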