cityu-hall2/Workshop2-Spider-on-Python

Workshop2-Spider-on-Python

This repository shows how to build a simple spider (web crawler) with Python 2.7. It contains a set of PowerPoint slides and an IPython notebook.

The IPython notebook introduces the ideas that make up the basic framework of a crawler: requests and responses, headers, exception handling, JavaScript rendering, regular expressions, and the BeautifulSoup4 module.
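As a rough illustration of how some of these ideas fit together, here is a minimal sketch (not taken from the workshop notebook). It sends a request with a custom header, handles request exceptions, and pulls links out of the response with a regular expression. The names `fetch`, `extract_links`, `USER_AGENT`, and `LINK_PATTERN`, as well as the sample HTML, are hypothetical and chosen only for this example. It is written so that it should run under both Python 2.7 and Python 3.

```python
import re

# Hypothetical values for illustration; neither comes from the workshop notebook.
USER_AGENT = "Mozilla/5.0 (compatible; workshop-spider-sketch)"
LINK_PATTERN = re.compile(r'<a\s[^>]*href="([^"]+)"', re.IGNORECASE)


def fetch(url, timeout=10):
    """Return the body of `url` as text, or None if the request fails."""
    # Imported lazily so extract_links() can be used without requests installed.
    import requests

    try:
        response = requests.get(
            url, headers={"User-Agent": USER_AGENT}, timeout=timeout
        )
        response.raise_for_status()  # turn 4xx/5xx status codes into exceptions
    except requests.exceptions.RequestException as error:
        print("Request failed: %s" % error)
        return None
    return response.text


def extract_links(html):
    """Return every double-quoted href value found in <a> tags, in order."""
    return LINK_PATTERN.findall(html)


# Parse a fixed snippet so the example runs without network access.
sample_html = '<p><a href="https://example.com/a">A</a> <a class="x" href="/b">B</a></p>'
print(extract_links(sample_html))
```

To crawl a live page, the two helpers could be combined as `extract_links(fetch(url) or "")`. A regular expression like this one is fragile on real-world HTML (it ignores single-quoted attributes, for instance), which is one reason a parser such as BeautifulSoup4 is usually the more robust choice.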

Basic knowledge of Python is needed to follow this workshop. If you want to review our previous workshop, "Python Introduction", please go to "https://goo.gl/KxUzRf".

Requirements: Several modules and tools are needed to run the notebook.

  a. re (part of the Python standard library, no installation needed)
  b. selenium
  c. BeautifulSoup4
  d. Jupyter notebook
  e. phantomjs
  f. requests

For Mac/Linux users: to install these modules, make sure you have brew installed (see "https://brew.sh/"). Once brew is installed, run the following two commands (re is part of the Python standard library, so it does not need to be installed):

  1. pip install requests selenium BeautifulSoup4 Jupyter
  2. brew install phantomjs

(Optional) To keep your Python environment clean, consider installing the modules in a virtual environment. For more information, please refer to "https://virtualenv.pypa.io/en/stable/"

For Windows users: please use the Python installed with Anaconda and run the following commands in the Anaconda Prompt (re is part of the Python standard library, so it does not need to be installed):

  1. conda install -c conda-forge requests
  2. conda install -c conda-forge selenium
  3. conda install -c conda-forge beautifulsoup4
  4. conda install -c conda-forge phantomjs
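
After installing on either platform, a quick check like the sketch below can confirm that the Python modules can be imported. It is not part of the workshop material, and the helper name `check_modules` is hypothetical. It covers only the importable modules; Jupyter and phantomjs are command-line tools and are not checked here.

```python
import importlib

# Import names of the Python modules listed in the requirements.
# Note that BeautifulSoup4 is imported under the name "bs4".
REQUIRED_MODULES = ["re", "requests", "selenium", "bs4"]


def check_modules(names):
    """Map each module name to True if it can be imported, else False."""
    status = {}
    for name in names:
        try:
            importlib.import_module(name)
            status[name] = True
        except ImportError:
            status[name] = False
    return status


# Print one line per module so missing installs are easy to spot.
for name, ok in sorted(check_modules(REQUIRED_MODULES).items()):
    print("%-10s %s" % (name, "OK" if ok else "MISSING"))
```

Any module reported as MISSING can be installed again with the pip or conda command for your platform.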
