PyExtreme/fit_nlp

fit_nlp

An End to End Pipeline for Text Modelling

This repository implements the steps required to build a text classifier in Python. Each step is explained in the Jupyter notebook (fit_nlp.ipynb).

Building a text classifier involves the following steps:

  • Data Analysis: This step involves exploring the entire dataset, cleaning it, and transforming it to bring out useful insights.

  • Feature Engineering: Feature engineering creates a representation of the input data on which a model can be trained more effectively. This repository uses the Bag of Words approach to create feature vectors for the input data. Bag of Words is a representation used in Natural Language Processing in which each training example is treated as a multiset of its words: word counts are kept, but word order is discarded.

  • Model Selection, Training and Evaluation: First, a model is selected based on the properties of the data, and then it is trained on the training sample. To guard against underfitting and overfitting, the model is evaluated using evaluation metrics.

The entire pipeline is explained in detail in the Jupyter notebook.
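The feature engineering and modelling steps above can be sketched in plain Python. This is a minimal illustration, not the notebook's code: the toy sentences, the whitespace tokenizer, the `NaiveBayes` class, and the choice of multinomial Naive Bayes with add-one smoothing are all assumptions made for the example. It builds a bag of words with `collections.Counter`, trains on a small sample, and compares accuracy on the training data with accuracy on held-out data, which is one simple way to spot underfitting (both low) or overfitting (training high, held-out low).

```python
import math
from collections import Counter, defaultdict


def bag_of_words(text):
    """Return a multiset of lowercase, whitespace-separated tokens."""
    return Counter(text.lower().split())


class NaiveBayes:
    """Multinomial Naive Bayes with add-one smoothing (illustrative choice)."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(bag_of_words(text))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, text):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            total_words = sum(self.word_counts[label].values())
            score = math.log(n_docs / total_docs)  # log prior
            for word, count in bag_of_words(text).items():
                if word not in self.vocab:
                    continue  # ignore words never seen during training
                p = (self.word_counts[label][word] + 1) / (total_words + len(self.vocab))
                score += count * math.log(p)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def accuracy(model, texts, labels):
    """Fraction of examples whose predicted label matches the true label."""
    correct = sum(model.predict(t) == y for t, y in zip(texts, labels))
    return correct / len(labels)


# Hypothetical toy data, invented for this sketch.
train_texts = [
    "great movie loved it",
    "wonderful acting great plot",
    "terrible movie hated it",
    "boring plot awful acting",
]
train_labels = ["pos", "pos", "neg", "neg"]
test_texts = ["loved the acting great", "awful boring movie"]
test_labels = ["pos", "neg"]

model = NaiveBayes().fit(train_texts, train_labels)
train_acc = accuracy(model, train_texts, train_labels)
test_acc = accuracy(model, test_texts, test_labels)
print(f"train accuracy: {train_acc:.2f}, held-out accuracy: {test_acc:.2f}")
```

On real data, a library vectorizer and classifier would usually replace these hand-written pieces; the point here is only to show how word counts become features and how a held-out split exposes the gap between training and generalization performance.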
