GotchaTwitter

A Python Twitter crawler.

Supported: crawling the timeline of a target user, optionally restricted to a date range.
In development: crawling threads.
Warning: Using the Twitter API to get user information by uid/screen_name is much faster and safer than the web-scraping method.

Dependencies

bs4: Beautiful Soup
lxml: HTML parser for Beautiful Soup (needs a special installation method on Amazon EC2, see below)
tqdm: progress bar in the terminal
requestsplus: self-modified requests package with max retries and a sleep interval between requests
pushbullet.py (optional): notifier for when crawling is finished
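
The retry-and-sleep behaviour attributed to requestsplus above can be illustrated with a small standalone helper. This is only a sketch of the general idea, not the requestsplus implementation: the function name `fetch_with_retries` and its parameters are hypothetical, and the fetch callable is passed in so the example runs without network access.

```python
import time


def fetch_with_retries(fetch, url, max_retries=3, sleep_seconds=1.0, sleep=time.sleep):
    """Call fetch(url), retrying on any exception.

    Hypothetical helper for illustration only; requestsplus itself may
    work differently. Up to max_retries retries are made after the first
    attempt, with a pause of sleep_seconds before each retry.
    """
    last_error = None
    for attempt in range(max_retries + 1):
        try:
            return fetch(url)
        except Exception as error:  # broad on purpose: this is a sketch
            last_error = error
            if attempt < max_retries:
                sleep(sleep_seconds)
    # Every attempt failed: surface the last error to the caller.
    raise last_error
```

With the real requests package installed, a call might look like `fetch_with_retries(requests.get, url, max_retries=5, sleep_seconds=2.0)`.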

Example

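# Assumed import path (the package folder is named gotchatwitter); adjust if the class lives elsewhere.
from gotchatwitter import GotchaTwitter

inputs = ['<screen_name_1>', '<screen_name_2>']  # screen names to crawl
access_token = '<pushbullet_token>'              # only needed for the notifier
output_fp = '<filepath you want to save>'
with GotchaTwitter('timeline', inputs, output_fp) as gt:
    # Overwrite the output file, write a header row, and notify via Pushbullet when done.
    gt = gt.set_output(save_mode='w', has_header=True) \
        .set_notifier('pushbullet', access_token=access_token)
    gt.crawl()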

Notifier Setting

PushBullet

Register a Pushbullet account and create an access token in your account settings.
Download and install the Pushbullet app on your device (tested on iOS).
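
To check that the access token works before starting a long crawl, a test note can be pushed directly with pushbullet.py. The sketch below is an assumption about how such a check might look, not how GotchaTwitter sends its notification internally; `notify_finished` and its arguments are hypothetical names. It accepts any object with a `push_note(title, body)` method, which is what a pushbullet.py client provides.

```python
def notify_finished(pusher, output_fp, n_items):
    """Send a 'crawl finished' note through a Pushbullet-like client.

    Hypothetical helper for illustration. `pusher` is any object that
    exposes push_note(title, body), such as a pushbullet.py client.
    """
    title = "GotchaTwitter: crawl finished"
    body = "Saved {} items to {}".format(n_items, output_fp)
    pusher.push_note(title, body)
    return title, body


# Usage with pushbullet.py (requires the package and a valid token):
#   from pushbullet import Pushbullet
#   pb = Pushbullet('<pushbullet_token>')
#   notify_finished(pb, '<filepath you want to save>', 120)
```

If the note shows up on your device, the token and the app are set up correctly.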

Install lxml on Amazon Linux AMI (2016.03.3)

sudo yum install libxml2-devel libxslt-devel python-devel gcc
sudo pip install --upgrade setuptools
sudo /usr/local/bin/easy_install lxml
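
After running the commands above, it can be worth confirming that the modules are importable by the interpreter you plan to run the crawler with. The following is a minimal sketch using only the standard library on Python 3; the helper name `missing_modules` is hypothetical and not part of GotchaTwitter.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names from `names` that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # lxml and bs4 are the import names of the parser and Beautiful Soup.
    missing = missing_modules(["lxml", "bs4"])
    if missing:
        print("Not installed: " + ", ".join(missing))
    else:
        print("lxml and bs4 are importable.")
```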

PhantomKidding/GotchaTwitter