Train a basic LDA model using the NIPS corpus #8

Open
liadmagen opened this issue Oct 13, 2018 · 0 comments
Labels
good first issue Good for newcomers hacktoberfest 🍁 https://hacktoberfest.digitalocean.com/

Train an LDA model on the NIPS dataset's corpus.

LDA implementations already exist in scikit-learn and gensim:
http://scikit-learn.org/stable/modules/generated/sklearn.decomposition.LatentDirichletAllocation.html
https://radimrehurek.com/gensim/models/ldamodel.html

Create scripts (under src/papers/models/) that expose a function which uses one of these packages to train a model on a corpus passed in as a parameter.
Also expose a function for extracting topics from new, unseen documents.
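
As a starting point, the two functions could be sketched with scikit-learn as below. This is only one possible shape, not a required design: the names `train_lda`, `extract_topics` and `top_words`, the bag-of-words vectorizer, and all default parameter values are assumptions made for illustration, and the corpus is assumed to be an iterable of plain strings.

```python
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer


def train_lda(corpus, n_topics=10, max_features=5000, random_state=0):
    """Fit a bag-of-words vectorizer and an LDA model on an iterable of strings.

    Hypothetical helper: the name and defaults are illustrative assumptions.
    """
    vectorizer = CountVectorizer(stop_words="english", max_features=max_features)
    counts = vectorizer.fit_transform(corpus)
    model = LatentDirichletAllocation(n_components=n_topics, random_state=random_state)
    model.fit(counts)
    return vectorizer, model


def extract_topics(vectorizer, model, documents):
    """Return one topic distribution per document (each row sums to 1)."""
    # Reuse the vocabulary learned at training time; unknown words are ignored.
    return model.transform(vectorizer.transform(documents))


def top_words(vectorizer, model, n_words=10):
    """Return the highest-weighted words of each topic, for printing."""
    # Order the vocabulary by column index so it lines up with model.components_.
    vocab = sorted(vectorizer.vocabulary_, key=vectorizer.vocabulary_.get)
    return [
        [vocab[i] for i in weights.argsort()[::-1][:n_words]]
        for weights in model.components_
    ]
```

Returning the vectorizer together with the model keeps the vocabulary used for training available when unseen documents are transformed later. A gensim-based version would follow the same outline with a `Dictionary` and `LdaModel` instead.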

Create a notebook that walks through the process: load the NIPS corpus, then call the train and predict functions. Remember to split the dataset before training, and test the prediction step on the held-out, unseen documents.
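
The split itself can be very small. The sketch below uses only the standard library; `split_corpus`, the 80/20 ratio and the fixed seed are illustrative assumptions, and scikit-learn's `train_test_split` would do the same job.

```python
import random


def split_corpus(documents, test_fraction=0.2, seed=0):
    """Shuffle the documents and return (train_docs, test_docs).

    Hypothetical helper: the name and the default ratio are assumptions.
    """
    if not 0 < test_fraction < 1:
        raise ValueError("test_fraction must be between 0 and 1")
    docs = list(documents)
    # A seeded generator makes the split reproducible across notebook runs.
    random.Random(seed).shuffle(docs)
    n_test = max(1, int(round(len(docs) * test_fraction)))
    return docs[n_test:], docs[:n_test]
```

In the notebook, the model would be trained on the first returned list only, and the topic-extraction function would be called on the second, so that it is evaluated on documents the model has never seen.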

The notebook should print the topics extracted from the preprocessed documents alongside those extracted from the unprocessed ones, so the two can be compared.
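
One way to lay out that comparison is sketched below. Both helpers are hypothetical: `simple_preprocess` is only a stand-in for the project's real preprocessing step, which is not described in this issue, and `format_topic_comparison` assumes each model's topics are given as lists of top words.

```python
import re


def simple_preprocess(text, min_length=3):
    """Stand-in preprocessing: lowercase, keep alphabetic tokens, drop short ones."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return " ".join(t for t in tokens if len(t) >= min_length)


def format_topic_comparison(raw_topics, processed_topics):
    """Build a printable report pairing the top words of two models' topics."""
    lines = []
    for i, (raw, processed) in enumerate(zip(raw_topics, processed_topics)):
        lines.append(f"Topic {i}")
        lines.append("  raw:          " + ", ".join(raw))
        lines.append("  preprocessed: " + ", ".join(processed))
    return "\n".join(lines)
```

Note that two independently trained LDA models do not share topic indices, so topic 0 of the raw model need not correspond to topic 0 of the preprocessed one; the side-by-side printout is a visual aid rather than a one-to-one matching.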

@liadmagen liadmagen added good first issue Good for newcomers hacktoberfest 🍁 https://hacktoberfest.digitalocean.com/ labels Oct 13, 2018