GitHub - killsaw/scrapt: A PHP library for making web scraping a bit easier.

#Scrapt

A PHP library for making web scraping a bit easier.

##Example Scraper

    // Fetch the portal's landing page.
    $page = Scrapt::get('https://somepbxcompany.com/');

    // Log in first if the portal asks for it.
    if ($page->contains('Please log in to access this page')) {
        $form = $page->getForm();
        $form->uemail = PBX_PORTAL_USERNAME; // value is URL-encoded
        $form->pwd = PBX_PORTAL_PASSWORD;
        $page = Scrapt::submit($form);
    }

    // Request the report, passing the date as a request variable.
    $report_url = 'https://somepbxcompany.com/report.php';
    $report_vars = array(
        'date' => '2011-02-01'
    );
    $page = Scrapt::get($report_url, $report_vars);
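
The example above relies on `getForm()` and on assigning form fields as properties. How Scrapt implements this is not shown here, so the following is only a minimal sketch of the general Mechanize-style idea, built on PHP's bundled DOM extension. The function name `extractFirstForm`, the returned array shape, and the sample HTML are all hypothetical and are not part of Scrapt.

```php
<?php
// Hypothetical helper, not part of Scrapt: pull the first <form> out of an
// HTML string and return its action, method, and named input fields.
function extractFirstForm(string $html): ?array
{
    $doc = new DOMDocument();
    // Real-world HTML is rarely valid; collect parser warnings instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $form = $doc->getElementsByTagName('form')->item(0);
    if ($form === null) {
        return null; // the page has no form
    }

    $fields = [];
    foreach ($form->getElementsByTagName('input') as $input) {
        $name = $input->getAttribute('name');
        if ($name !== '') {
            $fields[$name] = $input->getAttribute('value');
        }
    }

    return [
        'action' => $form->getAttribute('action'),
        'method' => strtoupper($form->getAttribute('method') ?: 'GET'),
        'fields' => $fields,
    ];
}

// Made-up login page, for illustration only.
$html = '<html><body><form action="/login.php" method="post">'
      . '<input type="text" name="uemail">'
      . '<input type="password" name="pwd">'
      . '<input type="hidden" name="token" value="abc123">'
      . '</form></body></html>';

$form = extractFirstForm($html);
$form['fields']['uemail'] = 'user@example.com';
$form['fields']['pwd'] = 'p@ss word';

// http_build_query() URL-encodes the field values for submission.
$body = http_build_query($form['fields']);
echo $form['method'], ' ', $form['action'], "\n", $body, "\n";
```

Hidden inputs such as the `token` field are carried along untouched, which is the main convenience of reading the form from the page rather than hard-coding the POST variables.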

##Design Goals

- By default, look like a human browser.
- Support Mechanize-style page access.
- Manage page caches intelligently. Prevent server clobbering.
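
The caching and anti-clobbering goals can be illustrated with a small sketch. It is not Scrapt's implementation: the class name `PoliteFetcher`, its constructor parameters, and the on-disk cache layout are assumptions made for this example. The actual HTTP call is passed in as a callable so the sketch runs without network access.

```php
<?php
// Hypothetical sketch, not Scrapt's implementation: cache pages on disk and
// space out real requests so the target server is not hammered.
final class PoliteFetcher
{
    private $fetch;                   // callable(string $url): string
    private string $cacheDir;
    private int $ttl;                 // seconds a cached page stays fresh
    private float $minDelay;          // minimum seconds between real requests
    private float $lastRequest = 0.0;
    public int $networkHits = 0;      // counts calls that bypassed the cache

    public function __construct(callable $fetch, string $cacheDir, int $ttl = 3600, float $minDelay = 1.0)
    {
        $this->fetch = $fetch;
        $this->cacheDir = rtrim($cacheDir, '/');
        $this->ttl = $ttl;
        $this->minDelay = $minDelay;
        if (!is_dir($this->cacheDir)) {
            mkdir($this->cacheDir, 0777, true);
        }
    }

    public function get(string $url): string
    {
        $path = $this->cacheDir . '/' . sha1($url) . '.html';

        // Serve from the cache while the entry is still fresh.
        if (is_file($path) && (time() - filemtime($path)) < $this->ttl) {
            return file_get_contents($path);
        }

        // Throttle: wait until $minDelay has passed since the last real request.
        $wait = $this->minDelay - (microtime(true) - $this->lastRequest);
        if ($wait > 0) {
            usleep((int) round($wait * 1e6));
        }

        $body = ($this->fetch)($url);
        $this->lastRequest = microtime(true);
        $this->networkHits++;
        file_put_contents($path, $body);

        return $body;
    }
}

// Stand-in fetcher so the sketch runs offline.
$cacheDir = sys_get_temp_dir() . '/scrapt-cache-' . uniqid();
$fetcher = new PoliteFetcher(
    function (string $url): string { return "<html>content of $url</html>"; },
    $cacheDir,
    3600,
    0.2
);

$first  = $fetcher->get('https://example.com/report.php?date=2011-02-01');
$second = $fetcher->get('https://example.com/report.php?date=2011-02-01'); // served from cache
echo $fetcher->networkHits, "\n";
```

In real use the callable might wrap cURL and send browser-like headers (for example via `CURLOPT_USERAGENT`), which would also cover the first goal in the list.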

##Future Ideas

- On the Mac, integrate phpOSA to remote-control Fake.app.
- On other platforms, remote-control Selenium.
- Support proxy banks.
- Support multi-node scraping, with coordination and job distribution.
- Allow a scraping session to be paused and resumed without having to start over.
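
As an illustration of the pause-and-resume idea, here is one way a session's progress could be persisted. This is a speculative sketch rather than a planned Scrapt API: the class `ResumableQueue`, its methods, and the JSON state file are all hypothetical.

```php
<?php
// Hypothetical sketch, not part of Scrapt: a URL queue whose state can be
// saved to disk and restored, so an interrupted run resumes where it stopped.
final class ResumableQueue
{
    private array $pending = [];
    private array $done = [];

    public function add(string $url): void
    {
        // Skip URLs that are already queued or already finished.
        if (!in_array($url, $this->pending, true) && !in_array($url, $this->done, true)) {
            $this->pending[] = $url;
        }
    }

    public function next(): ?string
    {
        return $this->pending[0] ?? null;
    }

    public function markDone(string $url): void
    {
        $this->pending = array_values(array_diff($this->pending, [$url]));
        $this->done[] = $url;
    }

    public function pendingCount(): int
    {
        return count($this->pending);
    }

    public function save(string $file): void
    {
        file_put_contents($file, json_encode(['pending' => $this->pending, 'done' => $this->done]));
    }

    public static function load(string $file): self
    {
        $queue = new self();
        $state = json_decode(file_get_contents($file), true);
        $queue->pending = $state['pending'];
        $queue->done = $state['done'];
        return $queue;
    }
}

$stateFile = sys_get_temp_dir() . '/scrapt-session-' . uniqid() . '.json';

$queue = new ResumableQueue();
$queue->add('https://example.com/page/1');
$queue->add('https://example.com/page/2');
$queue->add('https://example.com/page/3');

$queue->markDone($queue->next()); // pretend page 1 has been scraped
$queue->save($stateFile);         // pause the session

$resumed = ResumableQueue::load($stateFile); // later, possibly in a new process
echo $resumed->next(), "\n";
```

Because finished URLs are stored alongside pending ones, re-adding an already scraped URL after a resume is a no-op.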
