
Maybe, the dense layers should be larger? #4

Open
tomthe opened this issue Apr 26, 2023 · 0 comments
tomthe commented Apr 26, 2023

Hi!

I have worked on some "institution string" -> country/state/city/institution_id models before, so I looked into this code and noticed that you use two dense layers with only 2048 and 1024 neurons. The final softmax layer has 102,392 outputs. In my experience this can lead to suboptimal results: the 1024 neurons act as a bottleneck. Of course they can in principle encode far more than 100,000 classes; I just suspect the results would benefit from larger layers.
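A rough parameter-count sketch of the classifier head described above makes the trade-off concrete (layer sizes are taken from this issue; the 768-dimensional input assumes DistilBERT's hidden size, and the 4096-wide variant is just a hypothetical comparison, not a recommendation for specific numbers):

```python
def head_params(sizes):
    """Total weights + biases for a stack of dense layers with the given widths."""
    return sum(a * b + b for a, b in zip(sizes, sizes[1:]))

# Current head: 768 -> 2048 -> 1024 -> 102392 (1024-wide bottleneck).
narrow = head_params([768, 2048, 1024, 102392])
# Hypothetical wider head: 768 -> 4096 -> 4096 -> 102392.
wider = head_params([768, 4096, 4096, 102392])

print(f"narrow head: {narrow:,} parameters")
print(f"wider head:  {wider:,} parameters")
```

Note that most of the parameters sit in the final softmax projection either way, so widening the hidden layers grows the model less than one might fear.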

Some more comments:

  • Separate models for country/city/institution_id perform better than a single model that only predicts institution_ids, mainly because
    • some institutions are spread across different cities and countries
    • some raw_affiliation_strings contain information about a city or country, but not about the specific institution
  • Since raw_affiliation_strings do not have the complicated structure that natural language does, DistilBERT might be overkill. I had very good results with word and character n-grams.
  • To disambiguate very information-poor strings like "department of Biology", other sources of information have to be connected to this data, e.g. the trajectories of the authors. We haven't done this yet.
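A minimal sketch of the character-n-gram featurization mentioned above (this is an illustration, not the model actually used in this repo; the padding marker and n-gram range are arbitrary choices):

```python
from collections import Counter

def char_ngrams(text, n_min=2, n_max=4):
    """Count character n-grams; the '#' padding marker flags string boundaries."""
    padded = f"#{text.lower()}#"
    counts = Counter()
    for n in range(n_min, n_max + 1):
        for i in range(len(padded) - n + 1):
            counts[padded[i:i + n]] += 1
    return counts

# Abbreviated and spelled-out affiliation strings share many n-grams,
# which is what lets a simple linear model match them robustly.
a = char_ngrams("Dept. of Physics, MIT")
b = char_ngrams("Department of Physics, MIT")
overlap = sum((a & b).values()) / sum(a.values())
print(f"n-gram overlap: {overlap:.2f}")
```

In practice one would feed such counts (or TF-IDF weights of them) into a linear classifier per target (country, city, institution_id); the point is that surface overlap carries most of the signal here.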

Thank you for providing all this data and code openly! This is great!
