Undetected Chrome Scraper!

A scalable web scraper that bypasses bot blockers without using paid proxy IPs. (Chrome normally runs headless; the demo shows a visible Chrome instance for demonstration purposes only.)

Sample data from crawling 5000+ URLs across 18 domains in 8 countries over 30 days

Features

  • Crawls data through either a Desktop or a Mobile channel
  • Scales up crawl speed by increasing the number of Chrome instances running in parallel
  • Deployable on virtual machines - EC2, DigitalOcean Droplet, Google Compute Engine, etc.
  • Saves screenshots of webpages
  • Supports translating a webpage into a specified language
  • Supports a configuration file that describes how to parse data from the webpages
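
As a rough illustration of the parallel-instances and Desktop/Mobile features listed above, here is a minimal sketch of fanning URL/channel pairs out over a worker pool. It is not this project's code: `crawl_all`, `fake_fetch`, and the channel names are hypothetical, and `fetch` stands in for whatever drives a single Chrome instance.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product
from typing import Callable, Iterable

# Hypothetical channel names; the project's own identifiers may differ.
CHANNELS = ("desktop", "mobile")


def crawl_all(
    urls: Iterable[str],
    fetch: Callable[[str, str], str],
    workers: int = 4,
    channels: Iterable[str] = CHANNELS,
) -> dict[tuple[str, str], str]:
    """Run fetch(url, channel) for every URL/channel pair on a worker pool.

    `fetch` is a stand-in for the code that drives one Chrome instance;
    `workers` caps how many of those calls run at the same time.
    """
    jobs = list(product(urls, channels))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves input order, so results line up with jobs.
        results = pool.map(lambda job: fetch(*job), jobs)
        return dict(zip(jobs, results))


def fake_fetch(url: str, channel: str) -> str:
    # Placeholder so the sketch runs without a browser installed.
    return f"<html data-channel='{channel}'>{url}</html>"


if __name__ == "__main__":
    pages = crawl_all(
        ["https://example.com/a", "https://example.com/b"],
        fake_fetch,
        workers=2,
    )
    for (url, channel), html in pages.items():
        print(channel, url, len(html))
```

Raising `workers` corresponds to running more Chrome instances side by side. In a real deployment, the practical limit would likely be the memory and CPU of the virtual machine rather than the pool itself.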