Useful NLP functions / pipelines / transformers for text mining, in a package format.
TODO: add when ready
This package is compatible with Linux/OSX systems
See requirements.txt
Most prerequisite packages will be installed automatically.
However, the spellcheck submodule depends on the enchant system library. If this library is not present, pyenchant will not work.
- On an Ubuntu / Debian system, run in your bash shell: sudo apt-get install enchant
- On a RedHat / CentOS / Cloudera CDH system, run in your bash shell: sudo yum install enchant
- On OSX, run: brew install enchant
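Once enchant is installed, a quick way to confirm that pyenchant can see it is a check like the following (a minimal sketch; the en_US dictionary tag is just an example):

```python
# Sanity check: pyenchant should find the system enchant library.
import enchant

d = enchant.Dict("en_US")    # load an English dictionary
print(d.check("hello"))      # True for a correctly spelled word
print(d.suggest("helo"))     # spelling suggestions for a misspelling
```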
After that, the fastest way to make sure you have everything is to go to the main directory and run:
pip install -r requirements.txt
python -m nltk.downloader vader_lexicon stopwords wordnet brown_tei gutenberg punkt popular
After downloading these packages and their associated data (in the case of NLTK), everything should run smoothly. If not, please open an issue here on this repo.
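If you prefer, the same NLTK data can be fetched from inside Python using NLTK's own downloader (equivalent to the shell command above):

```python
# Download the NLTK corpora and models this package relies on.
import nltk

for resource in ["vader_lexicon", "stopwords", "wordnet",
                 "brown_tei", "gutenberg", "punkt", "popular"]:
    nltk.download(resource)
```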
TODO: add when ready. This is still in dev testing / alpha stage. The plan is, after code review, to
upload to PyPI. (So at some point it's going to be pip install nlpbumblebee
or something similar...)
For the time being, git clone the repo and enter:
python setup.py install
(You will need pytest installed. If you don't have it, just run: pip install pytest pytest-cov)
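To verify the install worked, a quick sanity check (assuming the package imports under the nlpfunctions name used by the coverage command below):

```python
# Confirm the package is importable after installation.
import nlpfunctions
print(nlpfunctions.__name__)
```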
To run all the automated tests after you have cloned the repo into your system, just do:
cd tests
pytest -v ## run all tests
cd ..
pytest -v --cov=nlpfunctions tests/ ## run tests and calculate testing coverage
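For contributors adding new functions, tests follow the usual pytest conventions: files named test_*.py containing test_* functions with plain assert statements. A hypothetical example of what a test in tests/ might look like (the tokeniser under test here is illustrative, not part of the package):

```python
# tests/test_example.py -- a hypothetical test in pytest style.

def simple_tokenise(text):
    """Illustrative whitespace tokeniser standing in for a package function."""
    return text.split()

def test_simple_tokenise():
    assert simple_tokenise("text mining is fun") == ["text", "mining", "is", "fun"]
```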
Continuous Integration is a software development practice where members of a team integrate their work on a main repo frequently. Usually each person integrates their work at least daily leading to multiple integrations per day. Each integration is verified by an automated build (that includes running an automated test harness) to detect integration errors as quickly as possible. Many teams find that this approach leads to significantly reduced integration problems and allows a team to develop cohesive software more rapidly.
We are using Travis CI for this process.
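A minimal .travis.yml along these lines would drive that process, installing the dependencies and running the test suite on every push (a sketch, not necessarily the exact configuration used in this repo):

```yaml
language: python
python:
  - "3.6"
install:
  - pip install -r requirements.txt
  - pip install pytest pytest-cov
  - python -m nltk.downloader vader_lexicon stopwords wordnet brown_tei gutenberg punkt popular
script:
  - pytest -v --cov=nlpfunctions tests/
```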
Currently there is automatically generated documentation supported by sphinx-doc. This documentation is also available. Additionally, there is an examples/ folder where simple tasks using functions from this package are demonstrated.
- NLTK - Natural Language ToolKit
- scikit-learn - machine learning framework
- pytest - unit testing framework
- black - code formatter
- sphinx-doc - documentation framework
- Travis CI - Continuous Integration framework
Please read CONTRIBUTING.md for details on our code of conduct, and the process for submitting pull requests to us.
- Theodore Manassis - mamonu
- Alessia Tosi - exfalsoquodlibet
See also the list of contributors who participated in this project.
This project is licensed under the MIT License - see the LICENSE.md file for details
In opensource everyone is standing on the shoulders of giants...
or possibly a really tall stack of ordinary-height people
The authors would like to thank, in no particular order:
- the ONS Big Data team (check their repos here)
- the NLTK maintainers
- the scikit-learn maintainers
- Benjamin Bengfort, Tony Ojeda, and Rebecca Bilbro, the authors of one of the most useful NLP books out there: Applied Text Analysis with Python
- Blei, David M.; Ng, Andrew Y.; Jordan, Michael I. (January 2003). Lafferty, John, ed. "Latent Dirichlet Allocation". Journal of Machine Learning Research. 3 (4–5): 993–1022.
- Blei, David (April 2012). "Probabilistic Topic Models". Communications of the ACM. 55 (4): 77–84.
- Lee, Daniel D.; Seung, H. Sebastian (1999). "Learning the parts of objects by non-negative matrix factorization". Nature. 401 (6755): 788.
- Bird, S.; Klein, E.; Loper, E. (2009). Natural Language Processing with Python: Analyzing Text with the Natural Language Toolkit. O'Reilly Media, Inc.
- Mihalcea, R.; Tarau, P. (2004). "TextRank: Bringing Order into Text". In Proceedings of the 2004 Conference on Empirical Methods in Natural Language Processing.
- Li, W. (1992). "Random texts exhibit Zipf's-law-like word frequency distribution". IEEE Transactions on Information Theory. 38 (6): 1842–1845.
- Knuth, D. E.; Morris, Jr., J. H.; Pratt, V. R. (1977). "Fast pattern matching in strings". SIAM Journal on Computing. 6 (2): 323–350.
- Hutto, C. J.; Gilbert, E. (2014). "VADER: A Parsimonious Rule-Based Model for Sentiment Analysis of Social Media Text". In Eighth International Conference on Weblogs and Social Media (ICWSM-14). Available (as of 08/10/18) at http://comp.social.gatech.edu/papers/icwsm14.vader.hutto.pdf