
Maybe, the dense layers should be larger? #4

Open
tomthe opened this issue Apr 26, 2023 · 0 comments
tomthe commented Apr 26, 2023

Hi!

I have worked on some "institution string" -> country/state/city/institution_id models before, so I looked into this code and noticed that you use two dense layers with only 2048 and 1024 neurons. The final softmax layer has 102,392 outputs. In my experience this can lead to suboptimal results: the 1024 neurons act as a bottleneck. Of course they can in principle encode far more than 100,000 classes; I just suspect the results would benefit from larger layers.
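A rough parameter-count sketch of the classifier head described above makes the trade-off concrete (layer sizes are taken from this issue; the 768-dimensional input assumes DistilBERT's hidden size, and the 4096-wide variant is just a hypothetical comparison, not a recommendation for specific numbers):

```python
def head_params(sizes):
    """Total weights + biases for a stack of dense layers with the given widths."""
    return sum(a * b + b for a, b in zip(sizes, sizes[1:]))

# Current head: 768 -> 2048 -> 1024 -> 102392 (1024-wide bottleneck).
narrow = head_params([768, 2048, 1024, 102392])
# Hypothetical wider head: 768 -> 4096 -> 4096 -> 102392.
wider = head_params([768, 4096, 4096, 102392])

print(f"narrow head: {narrow:,} parameters")
print(f"wider head:  {wider:,} parameters")
```

Note that most of the parameters sit in the final softmax projection either way, so widening the hidden layers grows the model less than one might fear.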

Some more comments:

  • Separate models for country/city/institution_id perform better than a single model that only predicts institution_ids, mainly because
    • some institutions are spread across different cities and countries
    • some raw_affiliation_strings contain information about a city or country, but not about the specific institution
  • Since raw_affiliation_strings do not have the complicated structure that natural language does, DistilBERT might be overkill. I had very good results with word and character n-grams.
  • To disambiguate very information-poor strings like "department of Biology", other sources of information have to be connected to this data, e.g. the trajectories of the authors. We haven't done this yet.
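A minimal sketch of the character-n-gram featurization mentioned above (this is an illustration, not the model actually used in this repo; the padding marker and n-gram range are arbitrary choices):

```python
from collections import Counter

def char_ngrams(text, n_min=2, n_max=4):
    """Count character n-grams; the '#' padding marker flags string boundaries."""
    padded = f"#{text.lower()}#"
    counts = Counter()
    for n in range(n_min, n_max + 1):
        for i in range(len(padded) - n + 1):
            counts[padded[i:i + n]] += 1
    return counts

# Abbreviated and spelled-out affiliation strings share many n-grams,
# which is what lets a simple linear model match them robustly.
a = char_ngrams("Dept. of Physics, MIT")
b = char_ngrams("Department of Physics, MIT")
overlap = sum((a & b).values()) / sum(a.values())
print(f"n-gram overlap: {overlap:.2f}")
```

In practice one would feed such counts (or TF-IDF weights of them) into a linear classifier per target (country, city, institution_id); the point is that surface overlap carries most of the signal here.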

Thank you for providing all this data and code openly! This is great!
