A scalable web scraper that bypasses bot blockers without using paid proxy IPs. (Chrome runs headless; the demo shows a visible Chrome instance for demonstration purposes.)
Sample data: 5,000+ URLs crawled from 18 domains across 8 countries over 30 days
- Crawl data through both desktop and mobile channels
- Increase crawl speed by running more Chrome instances in parallel
- Deployable on virtual machines (EC2, DigitalOcean Droplets, Google Compute Engine, etc.)
- Saves screenshots of web pages
- Supports translating web pages into a specified language
- Accepts a configuration file that defines which data to parse from web pages
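One way to separate the desktop and mobile channels is to give each channel its own set of Chrome command-line switches. The sketch below is a hypothetical illustration, not this project's code: the `Channel` presets, window sizes, and placeholder user-agent strings are assumptions, while `--headless=new`, `--window-size`, and `--user-agent` are standard Chrome switches.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Channel:
    """Hypothetical per-channel browser preset."""
    width: int
    height: int
    user_agent: str


# Placeholder presets (assumed values, not taken from this project).
CHANNELS = {
    "desktop": Channel(1366, 768, "PlaceholderDesktopUA/1.0"),
    "mobile": Channel(390, 844, "PlaceholderMobileUA/1.0 Mobile"),
}


def chrome_flags(channel: str, headless: bool = True) -> List[str]:
    """Build the Chrome switches for the given channel name."""
    if channel not in CHANNELS:
        raise ValueError(f"unknown channel: {channel!r}")
    preset = CHANNELS[channel]
    flags = [
        f"--window-size={preset.width},{preset.height}",
        f"--user-agent={preset.user_agent}",
    ]
    if headless:
        # Headless mode goes first; pass headless=False for a visible demo window.
        flags.insert(0, "--headless=new")
    return flags
```

With Selenium, each returned string could be passed to `ChromeOptions.add_argument` before the driver starts, so the same crawl code serves both channels.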
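Scaling by adding Chrome instances amounts to splitting the URL list into one shard per instance and crawling the shards concurrently. The following is a minimal sketch under stated assumptions. `crawl_shard` is a hypothetical callable, assumed to start one Chrome instance, visit every URL in its shard, and return a `{url: result}` dict. Threads are used because each browser runs in its own OS process, so the Python side mostly waits on I/O.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Dict, List


def shard(urls: List[str], n: int) -> List[List[str]]:
    """Split URLs round-robin into n shards, one per Chrome instance."""
    return [urls[i::n] for i in range(n)]


def crawl_parallel(
    urls: List[str],
    instances: int,
    crawl_shard: Callable[[List[str]], Dict[str, Any]],
) -> Dict[str, Any]:
    """Run crawl_shard on each non-empty shard in its own worker thread."""
    if instances < 1:
        raise ValueError("instances must be >= 1")
    # Skip empty shards when there are more instances than URLs.
    shards = [s for s in shard(urls, instances) if s]
    results: Dict[str, Any] = {}
    if not shards:
        return results
    with ThreadPoolExecutor(max_workers=len(shards)) as pool:
        for partial in pool.map(crawl_shard, shards):
            results.update(partial)
    return results
```

For example, `crawl_parallel(urls, 4, crawl_shard)` would keep four browsers busy at once; raising `instances` trades more CPU and memory on the VM for higher throughput.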
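A parsing configuration file can map each output field to an extraction rule, so new sites need a config change rather than new code. The sketch below assumes a hypothetical JSON format in which each field name maps to a regular expression with one capture group. This is a simplification chosen to keep the example dependency-free; a production parser would more likely use CSS selectors or XPath, and this is not the project's actual config schema.

```python
import json
import re
from typing import Dict, Optional

# Hypothetical config: field name -> regex with exactly one capture group.
EXAMPLE_CONFIG = r"""
{
  "title": "<title>(.*?)</title>",
  "price": "class=\"price\">([^<]+)<"
}
"""


def load_rules(text: str) -> Dict[str, re.Pattern]:
    """Parse the JSON config and compile each field's pattern."""
    raw = json.loads(text)
    return {
        name: re.compile(pattern, re.IGNORECASE | re.DOTALL)
        for name, pattern in raw.items()
    }


def parse_page(html: str, rules: Dict[str, re.Pattern]) -> Dict[str, Optional[str]]:
    """Apply every rule to the page; fields that do not match become None."""
    out: Dict[str, Optional[str]] = {}
    for name, pattern in rules.items():
        match = pattern.search(html)
        out[name] = match.group(1).strip() if match else None
    return out
```

Returning `None` for unmatched fields keeps the output schema stable across pages, which makes day-over-day comparisons in a long crawl easier.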