emanueldumitru/RedditLiveParser

RedditLiveParser

RedditLiveParser is a small system that fetches Reddit submissions and comments through the PRAW API, formats them, and stores them in a MongoDB database, reddit_items, which it updates every hour. It exposes the Reddit data, with filtering capabilities, through an HTTP API.

Ex:

http://localhost:5000/items?subreddit=news&from=148837431520&to=1488482143&keyword=of
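A request like the one above can also be built programmatically. The following is a minimal sketch, not part of the project: the helper name build_items_url and the timestamp values are hypothetical, and it assumes that from and to are Unix timestamps in seconds and that keyword may be omitted.

```python
from urllib.parse import urlencode


def build_items_url(subreddit, start, end, keyword=None,
                    base="http://localhost:5000/items"):
    """Build a query URL for the /items endpoint (hypothetical helper)."""
    # Parameter names are taken from the example URL in this README.
    params = {"subreddit": subreddit, "from": start, "to": end}
    if keyword:
        # keyword is described as optional, so only add it when given.
        params["keyword"] = keyword
    return base + "?" + urlencode(params)


# Illustrative timestamps (assumed to be Unix seconds).
url = build_items_url("news", 1488326400, 1488412800, keyword="of")
print(url)
```

The resulting URL can be opened in a browser or passed to any HTTP client while server.py is running.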

A possible optimization for fetching a long period: split the work of fetching data and storing it in the database across several processes.

Ex: if current_date - start_point_date = 6 months, fetch 6 chunks of 1 month each.
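The chunking idea can be sketched as follows. This is only an illustration of the proposed optimization, not code from the project: split_time_range is a hypothetical helper, and it assumes the period is expressed as Unix timestamps in seconds and is split into equal-length chunks rather than calendar months.

```python
def split_time_range(start, end, chunks):
    """Split [start, end] into `chunks` contiguous (start, end) pairs.

    Hypothetical helper: timestamps are assumed to be integers
    (Unix seconds), and chunks are equal in length, not calendar months.
    """
    if chunks < 1:
        raise ValueError("chunks must be at least 1")
    if end < start:
        raise ValueError("end must not be earlier than start")
    span = end - start
    # Integer arithmetic keeps the boundaries exact; the last one equals `end`.
    bounds = [start + span * i // chunks for i in range(chunks + 1)]
    return list(zip(bounds[:-1], bounds[1:]))


# Six equal chunks over an illustrative 600-second period.
ranges = split_time_range(0, 600, 6)
print(ranges)
```

Each (start, end) pair could then be handed to a separate worker process, for example through multiprocessing.Pool.map, so that every worker fetches and stores only its own chunk.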

Everything is logged to files in the log directory:

  • reddit-parser.log for reddit-parser.py, the script that parses Reddit data and saves it to MongoDB every hour
  • server.log for server.py, the Flask server that listens for requests and returns JSON responses containing Reddit submissions and comments, filtered by the subreddit, the time frame and, optionally, the keyword given in the request parameters
  • test.log for test.py, the script used to test the functionality of the system
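One way to get a log file per script is sketched below with the standard logging module. This is an assumption about how such a setup could look, not the project's actual code: the helper make_file_logger, the default directory name "log" and the message format are all hypothetical.

```python
import logging
import os


def make_file_logger(name, log_dir="log"):
    """Return a logger that writes to <log_dir>/<name>.log (hypothetical helper)."""
    # Create the log directory on first use.
    os.makedirs(log_dir, exist_ok=True)
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    # Attach the file handler only once, even if this is called repeatedly.
    if not logger.handlers:
        handler = logging.FileHandler(os.path.join(log_dir, name + ".log"))
        handler.setFormatter(
            logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
        logger.addHandler(handler)
    return logger
```

With this sketch, make_file_logger("server") would write to log/server.log, and a call such as logger.info("request received") would append one timestamped line to that file.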

There are two databases: reddit_items, and a temporary reddit_items_test used for unit testing.

Configuration is read from files in the config directory:

  • config.json, or config-test.json for testing

Commands:

  • python test.py for tests ... passed 7 out of 7
  • python reddit_parser.py
  • python server.py

Dockerfile, requirements.txt and docker-compose.yml are provided for running with Docker:

  • docker-compose build
  • docker-compose up
