At its core, Phantombuster allows you to script "web robots" in two languages: JavaScript and CoffeeScript.
To create a new bot script, log in, go to your scripts page and simply enter a name. Click Advanced to select what kind of script you want to create.
Each script can be launched on our platform by one of the following commands (in fact, binaries):
- Node — Execute your scripts in V8, Chrome's JavaScript runtime
- PhantomJS — Headless, scriptable WebKit browser (where Phantombuster got its name from)
- CasperJS — Framework built on top of PhantomJS to easily write complex navigation scenarios
You can also write your own modules to better control and optimize how your robots navigate the web, or use the one made by the Phantombuster team: :ref:`Nick`.
If you are in a hurry, please read at least the following four sections.
The best way to understand and get started quickly with Phantombuster is to try some sample scripts.
Once logged in (log in here if you haven't already), check out some of these scripts (choose your preferred language and framework, or the first one if you are unsure):
- Nick (Phantombuster's custom navigation module, easiest to understand and get started with):
  - CoffeeScript sample: sample-Nick.coffee
  - JavaScript sample: sample-Nick.js
  - :ref:`API documentation <Nick>`
- CasperJS (popular choice but not everyone agrees with its step-based navigation system):
  - CoffeeScript sample: sample-CasperJS.coffee
  - JavaScript sample: sample-CasperJS.js
  - API documentation: http://docs.casperjs.org/
- PhantomJS (provides the "low-level" functionality on which the two previous frameworks are based):
  - CoffeeScript sample: sample-PhantomJS.coffee
  - JavaScript sample: sample-PhantomJS.js
  - API documentation: http://phantomjs.org/api/
- Node (by far the fastest of all but uses a completely different scripting API, and it's not a headless web browser):
  - CoffeeScript sample: sample-Node.coffee
  - JavaScript sample: sample-Node.js
  - API documentation: https://nodejs.org/api/
When viewing a script, click Quick Launch in the top right corner to run it. You'll see your script execute in real time. Your persistent storage is displayed just below the console output (it's where your saved files will show up).
Do not hesitate to copy-paste these scripts to test more features.
Now that you have launched your first few scripts, you have probably noticed that they run within an agent (if you used the Quick Launch feature, the agent you used was named Quick Launch Agent).
Agents are configuration settings that describe how to run a certain script. They allow you to control how and when a script is launched. The combination of a script and an agent gives you a full-featured "web robot" that can scrape and automate tasks on the web.
To create an agent, go to your agents page and enter a name of your choice. The most important settings of an agent are which script to launch and when to launch it. But you'll see there are a lot of other options...
Your scripts are executed in Linux containers (they are similar to very light virtual machines).
Available to you are a few gigabytes of RAM, a few gigabytes of hard disk space and a fast internet connection. These are temporary resources that are freed right after your agent finishes its job.
What's important to know is that files written on your agent's disk will be lost when it exits. To keep files, save them to your persistent storage using our :ref:`agent module <agent-module-file-storage>`.
- More technical details (for the nerds):
  - The container engine is Docker
  - Containers are running Debian
  - Agents always start in ``/home/phantom/agent``, which is empty
  - Agents run under the user ``phantom``
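If you want to see these details from inside a running agent, a script launched with the Node command can report them itself. The following is a minimal sketch that uses only Node's core modules; the helper name `describeEnvironment` is an assumption made for illustration and is not part of any Phantombuster module.

```javascript
"use strict";
"phantombuster command: node";

const fs = require("fs");
const os = require("os");

// Hypothetical helper: collects basic facts about where the script is running.
function describeEnvironment() {
  const cwd = process.cwd();
  let user;
  try {
    user = os.userInfo().username;
  } catch (err) {
    // Some containers have no passwd entry for the current user
    user = "unknown";
  }
  return {
    cwd: cwd,                                // working directory the script started in
    user: user,                              // account the script runs under
    platform: os.platform(),                 // operating system family
    filesInCwd: fs.readdirSync(cwd).length   // number of entries in the working directory
  };
}

console.log(describeEnvironment());
```

Anything the script then writes to that working directory is an ordinary local file, so it disappears with the container unless it is saved to persistent storage.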
All your scripts can easily be written right on our website, in the provided CoffeeScript/JavaScript web editor.
However, you might prefer using your own editor, locally on your machine. We made Phantombuster's SDK specifically for this.
The SDK will monitor a directory on your disk for changes in your scripts. As soon as a change is detected, the script will be uploaded in your Phantombuster account.
First, you need to have ``npm`` installed. Then run:
    # npm install -g phantombuster-sdk
This globally installs the ``phantombuster`` command. :ref:`Discover how to use it → <SDK>`
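To give an idea of what "monitoring a directory for changes" means, here is a minimal sketch of change detection by polling file modification times. It is an illustration only and makes no claim about how the SDK works internally; the names `findChangedScripts` and `watchScripts` are hypothetical, and the upload step is left to a callback.

```javascript
"use strict";

const fs = require("fs");
const path = require("path");

// Returns the .js/.coffee files in `dir` whose modification time differs from
// the one recorded in `seen` (a Map of file name -> mtime), and records the new time.
function findChangedScripts(dir, seen) {
  const changed = [];
  for (const name of fs.readdirSync(dir)) {
    if (!/\.(js|coffee)$/.test(name)) {
      continue; // not a script file
    }
    const mtime = fs.statSync(path.join(dir, name)).mtimeMs;
    if (seen.get(name) !== mtime) {
      seen.set(name, mtime);
      changed.push(name);
    }
  }
  return changed;
}

// Polls `dir` every `intervalMs` milliseconds and calls `onChange(name)` for
// each new or modified script. Returns the timer so the caller can stop it.
function watchScripts(dir, onChange, intervalMs) {
  const seen = new Map();
  return setInterval(function () {
    findChangedScripts(dir, seen).forEach(onChange);
  }, intervalMs);
}
```

Calling `watchScripts(".", function (name) { console.log("changed:", name); }, 1000)` would report every script once on the first pass and then only the ones that are edited afterwards.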
All your scripts (and samples/libraries) can be required. The requiring script must have a ``phantombuster dependencies`` directive (similar to ``"use strict";``) listing its dependencies.
    "use strict";
    "phantombuster command: casperjs";
    "phantombuster package: 2";
    // Comma-separated list of dependencies
    // Specify the full name (with extension)
    "phantombuster dependencies: lib-Foo.js, lib-Nick-beta.coffee";
    // The rest of your script...
    // Declared with var because assigning to an undeclared name throws in strict mode
    var MyLib = require("lib-Foo");
    var Nick = require("lib-Nick-beta");
When the name of a script starts with ``lib``, its launch will be disabled. This allows you to safely write reusable modules that can later be required using ``phantombuster dependencies`` and then ``require()``.
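Note the naming difference between the two steps: the directive lists full script names with their extension, while ``require()`` takes the name without it. The sketch below makes that mapping explicit by reading a ``phantombuster dependencies`` directive out of a script's source text. It is a simplified illustration, not the platform's own parser, and the helper names `parseDependencies` and `requireName` are assumptions.

```javascript
"use strict";

// Extracts the entries of a "phantombuster dependencies" directive from a
// script's source text. Returns an empty array when there is no directive.
function parseDependencies(source) {
  const match = /["']phantombuster dependencies:([^"']*)["']/.exec(source);
  if (!match) {
    return [];
  }
  return match[1]
    .split(",")
    .map(function (entry) { return entry.trim(); })
    .filter(function (entry) { return entry.length > 0; });
}

// Turns a full script name into the name passed to require(),
// for example "lib-Foo.js" becomes "lib-Foo".
function requireName(fileName) {
  return fileName.replace(/\.(js|coffee)$/, "");
}

const source = [
  '"use strict";',
  '"phantombuster command: casperjs";',
  '"phantombuster dependencies: lib-Foo.js, lib-Nick-beta.coffee";'
].join("\n");

console.log(parseDependencies(source).map(requireName));
// prints [ 'lib-Foo', 'lib-Nick-beta' ]
```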
To create a new module, log in, go to your scripts page, select the reusable module tab and enter your module name.
    // In script "lib-Foo.js"
    "use strict";
    module.exports = {
        foo: function() {
            console.log("bar");
        }
    };

    // In script "my-script.js"
    "use strict";
    "phantombuster command: casperjs";
    "phantombuster package: 2";
    "phantombuster dependencies: lib-Foo.js";
    require("lib-Foo").foo(); // outputs "bar"
:ref:`There are a few more subtleties to consider when writing your own modules → <writing-modules>`
If you want to make sure a script is always launched with the same command, add a ``phantombuster command`` directive (similar to ``"use strict";``).
    // Possible values are: casperjs, phantomjs and node
    "phantombuster command: node";
    "phantombuster package: 2";
    "use strict";
    // The rest of your script...
    // Declared with var because assigning to an undeclared name throws in strict mode
    var needle = require("needle");
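Because the directive is just a string statement at the top of the script, it can be read from the source text without executing anything. The sketch below shows one way to do that and to reject values other than the three listed above. It is an illustration under that assumption, not a description of how the platform reads the directive, and `readCommand` is a hypothetical name.

```javascript
"use strict";

// The three commands named in this documentation
const KNOWN_COMMANDS = ["casperjs", "phantomjs", "node"];

// Returns the command named by a "phantombuster command" directive,
// or null when the script has no such directive.
// Throws when the directive names a command that is not in the list.
function readCommand(source) {
  const match = /["']phantombuster command:\s*([A-Za-z]+)\s*["']/.exec(source);
  if (!match) {
    return null;
  }
  const command = match[1].toLowerCase();
  if (KNOWN_COMMANDS.indexOf(command) === -1) {
    throw new Error("Unknown command: " + command);
  }
  return command;
}

console.log(readCommand('"phantombuster command: node"; "use strict";'));
// prints node
```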