The BigFlow package offers a command-line tool called bigflow.
It lets you run, build, and deploy your workflows, and access their logs, from the command line on any machine with Python.
The BigFlow CLI is the recommended way of working with BigFlow projects on a local machine, as well as for build and deployment automation on CI/CD servers.
Install the BigFlow PIP package in a fresh virtual environment in your project directory.
Test the BigFlow CLI:
bigflow -h
You should see the welcome message and the list of all bigflow commands:
...
Welcome to BigFlow CLI. Type: bigflow {command} -h to print detailed help for
a selected command.
positional arguments:
{run,deploy-dags,deploy-image,deploy,build-dags,build-image,build-package,build,project-version,pv,release,start-project,logs,build-requirements}
...
Each command has its own set of arguments. Check them with -h, for example:
bigflow run -h
Complete sources for the cheat sheet examples are available in the BigFlow repository as part of the docs project:
git clone https://github.com/allegro/bigflow.git
cd bigflow/docs
If you have BigFlow installed, you can run them directly.
The bigflow run command lets you run a job or a workflow for a given runtime.
Typically, bigflow run is used for local development because it's the simplest way to execute a workflow.
It's not recommended for production use, because:
- No deployment to Composer (Airflow) is done.
- The bigflow run process is executed on a local machine. If you kill or suspend it, what happens on GCP is undefined.
- It uses local authentication, so it relies on the permissions of your Google account.
- It executes a job or workflow only once (in production, you probably want Composer to run your workflows periodically).
Getting help for the run command
bigflow run -h
Run the hello_world_workflow.py workflow:
bigflow run --workflow hello_world_workflow
Run a single job:
bigflow run --job hello_world_workflow.say_goodbye
Run the workflow with a concrete runtime
When you run a workflow or a job with the CLI, the runtime parameter defaults to now. You can set it to a concrete value using the --runtime argument:
bigflow run --workflow hello_world_workflow --runtime '2020-08-01 10:00:00'
Run the workflow on a selected environment
If you don't set the config parameter, the workflow configuration (environment) is taken from the default config (dev in this case):
bigflow run --workflow hello_config_workflow
Select a concrete environment using the config parameter:
bigflow run --workflow hello_config_workflow --config dev
bigflow run --workflow hello_config_workflow --config prod
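If you drive bigflow run from a script (for example, a CI job or a small backfill loop), it can help to assemble the arguments in one place. The sketch below uses only the flags shown above (--workflow, --job, --runtime, --config). The helper name bigflow_run_argv is hypothetical, and the sketch assumes that --workflow and --job are alternatives and that --runtime accepts the 'YYYY-MM-DD HH:MM:SS' format used in the examples. It only builds the list; it does not call BigFlow.

```python
from datetime import datetime
from typing import List, Optional

# Format used by the --runtime examples above, e.g. '2020-08-01 10:00:00'.
RUNTIME_FORMAT = "%Y-%m-%d %H:%M:%S"


def bigflow_run_argv(workflow: Optional[str] = None,
                     job: Optional[str] = None,
                     runtime: Optional[datetime] = None,
                     config: Optional[str] = None) -> List[str]:
    """Build the argument list for `bigflow run` (hypothetical helper)."""
    # Assumption: a run targets either a whole workflow or a single job.
    if (workflow is None) == (job is None):
        raise ValueError("pass exactly one of workflow or job")
    argv = ["bigflow", "run"]
    if workflow is not None:
        argv += ["--workflow", workflow]
    else:
        argv += ["--job", job]
    if runtime is not None:
        # Without --runtime, the CLI defaults the runtime to now.
        argv += ["--runtime", runtime.strftime(RUNTIME_FORMAT)]
    if config is not None:
        # Without --config, the default environment is used.
        argv += ["--config", config]
    return argv
```

For example, subprocess.run(bigflow_run_argv(workflow="hello_config_workflow", config="prod"), check=True) runs the same command as the last example above. Passing a list instead of a shell string avoids quoting problems with the space inside the runtime value.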
There are five commands to build your deployment artifacts:
- build-dags generates Airflow DAG files from your workflows. DAG files are saved to a local .dags dir.
- build-package generates a PIP package from your project based on setup.py.
- build-image generates a Docker image with this package and all requirements.
- build-requirements compiles resources/requirements.in into resources/requirements.txt (it's an optional step).
- build simply runs build-dags, build-package, build-image, and build-requirements.
Before using the build commands, make sure that you have a valid deployment_config.py file. It should define the docker_repository parameter.
Getting help for build commands:
bigflow build-dags -h
bigflow build-package -h
bigflow build-image -h
bigflow build-requirements -h
bigflow build -h
The build-dags command takes two optional parameters:
- --start-time: the first runtime of your workflows. If empty, the current hour (datetime.datetime.now().replace(minute=0, second=0, microsecond=0)) is used.
- --workflow: leave empty to build DAGs from all workflows. Set a workflow ID to build only the selected workflow.
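The default start-time is the current time truncated to the full hour, exactly as the expression above describes. This minimal sketch reproduces that expression with the standard library and formats the result the same way as the --start-time examples. default_start_time is a hypothetical helper for illustration, not part of BigFlow.

```python
import datetime


def default_start_time(now: datetime.datetime) -> datetime.datetime:
    """Truncate `now` to the full hour, as in the documented default."""
    return now.replace(minute=0, second=0, microsecond=0)


if __name__ == "__main__":
    fixed = datetime.datetime(2020, 8, 1, 10, 37, 12)
    print(default_start_time(fixed).strftime("%Y-%m-%d %H:%M:%S"))  # 2020-08-01 10:00:00
    # What the CLI would use if you ran build-dags right now:
    print(default_start_time(datetime.datetime.now()))
```

Passing --start-time explicitly is useful when you want reproducible DAG files, because the default changes every hour.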
Build DAG files for all workflows with the default start-time:
bigflow build-dags
Build the DAG file for the hello_world_workflow.py workflow with a given start-time:
bigflow build-dags --workflow hello_world_workflow --start-time '2020-08-01 10:00:00'
Building a PIP package
Call the build-package command to build a PIP package from your project.
The command requires no parameters; all configuration is taken from setup.py and deployment_config.py (see Project structure and build).
Your PIP package is saved to a .tar.gz file in the dist dir.
bigflow build-package
Building a Docker image
The build-image command builds a Docker image with Python, your project's PIP package, and all requirements. Next, the image is exported to a tar file in the ./.image dir.
bigflow build-image
Build requirements.txt
The build-requirements command tries to resolve and freeze dependencies based on the resources/requirements.in file.
You can learn more about this concept in the Project structure and build chapter.
bigflow build-requirements
Build a whole project with a single command
The build command builds both artifacts (DAG files and a Docker image).
Internally, it executes the build-dags, build-package, and build-image commands.
bigflow build
At this stage, you should have two deployment artifacts created by the bigflow build command.
There are three commands to deploy your workflows to Google Cloud Composer:
- deploy-dags uploads all DAG files from a .dags folder to the Google Cloud Storage bucket that underlies your Composer's DAGs folder.
- deploy-image pushes a Docker image to a Docker registry, which should be readable from your Composer's Kubernetes cluster.
- deploy simply runs both deploy-dags and deploy-image.
Important: by default, BigFlow takes deployment configuration parameters from the ./deployment_config.py file.
If you need more flexibility, you can set these parameters explicitly via the command line.
Getting help for deploy commands:
bigflow deploy-dags -h
bigflow deploy-image -h
bigflow deploy -h
Deploy DAG files
Upload DAG files from a .dags dir to a dev Composer using local account authentication.
Configuration is taken from deployment_config.py:
bigflow deploy-dags --config dev
Upload DAG files from a given dir using authentication with Vault. Configuration is passed via command line arguments:
bigflow deploy-dags \
--dags-dir '/tmp/my_dags' \
--auth-method=vault \
--vault-secret ***** \
--vault-endpoint 'https://example.com/vault' \
--dags-bucket europe-west1-12323a-bucket \
--gcp-project-id my_gcp_dev_project \
--clear-dags-folder
Deploy Docker image
Upload a Docker image imported from a .tar file at the default path (the first file in the .image dir whose name matches the pattern .*-.*\.tar).
Configuration is taken from deployment_config.py.
Local account authentication is used:
bigflow deploy-image --config dev
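The default-path rule can be read as a small lookup over the .image dir. The sketch below only illustrates that rule; it is not BigFlow's own code, and find_default_image_tar is a hypothetical name. It assumes that "first" means first in alphabetical order and that the whole file name must match .*-.*\.tar; the text above does not specify either detail.

```python
import os
import re
import tempfile
from typing import Optional

# Pattern quoted in the docs; using fullmatch() is an assumption (whole name must match).
IMAGE_TAR_PATTERN = re.compile(r".*-.*\.tar")


def find_default_image_tar(image_dir: str = ".image") -> Optional[str]:
    """Return the first regular file in image_dir matching the pattern, else None."""
    if not os.path.isdir(image_dir):
        return None
    # Assumption: "first" = first in sorted order (os.listdir order is arbitrary).
    for name in sorted(os.listdir(image_dir)):
        path = os.path.join(image_dir, name)
        if os.path.isfile(path) and IMAGE_TAR_PATTERN.fullmatch(name):
            return path
    return None


if __name__ == "__main__":
    # Demo on a throwaway dir instead of a real ./.image build output.
    with tempfile.TemporaryDirectory() as image_dir:
        for name in ("README.txt", "image-0.1.0.tar"):
            open(os.path.join(image_dir, name), "w").close()
        print(find_default_image_tar(image_dir))  # .../image-0.1.0.tar
```

If the lookup returns nothing, or more than one image tar could match, passing --image-tar-path explicitly removes the ambiguity.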
Upload a Docker image imported from the .tar file at the given path.
Configuration is passed via command line arguments.
Authentication with Vault is used:
bigflow deploy-image \
--image-tar-path '/tmp/image-0.1.0-tar' \
--docker-repository 'eu.gcr.io/my_gcp_dev_project/my_project' \
--auth-method=vault \
--vault-secret ***** \
--vault-endpoint 'https://example.com/vault'
Complete deploy examples
Upload DAG files from the .dags dir and a Docker image from the default path.
Configuration is taken from deployment_config.py.
Local account authentication is used:
bigflow deploy --config dev
The same, but the config (environment) name defaults to dev:
bigflow deploy
Upload DAG files from the specified dir and the Docker image from the specified path. Configuration is passed via command line arguments. Authentication with Vault is used:
bigflow deploy \
--image-tar-path '/tmp/image-0.1.0-tar' \
--dags-dir '/tmp/my_dags' \
--docker-repository 'eu.gcr.io/my_gcp_dev_project/my_project' \
--auth-method=vault \
--vault-secret ***** \
--vault-endpoint 'https://example.com/vault' \
--dags-bucket europe-west1-12323a-bucket \
--gcp-project-id my_gcp_dev_project \
--clear-dags-folder
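On a CI/CD server you usually don't want the Vault secret written into a script or left in shell history. A minimal sketch, assuming the CI server injects the secret as an environment variable (the name VAULT_SECRET is hypothetical): it assembles the same arguments as the Vault example above and runs a sequence of commands, stopping at the first one that fails. All helper names are hypothetical; the runner parameter exists only so the sequencing can be tested without BigFlow installed.

```python
import os
import subprocess
from typing import Callable, List, Sequence


def secret_from_env(name: str = "VAULT_SECRET") -> str:
    """Read the Vault secret from the environment (variable name is hypothetical)."""
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(f"environment variable {name} is not set")
    return value


def vault_deploy_argv(image_tar: str, dags_dir: str, secret: str) -> List[str]:
    """Same flags as the Vault deploy example; the other values are its placeholders."""
    return [
        "bigflow", "deploy",
        "--image-tar-path", image_tar,
        "--dags-dir", dags_dir,
        "--docker-repository", "eu.gcr.io/my_gcp_dev_project/my_project",
        "--auth-method=vault",
        "--vault-secret", secret,
        "--vault-endpoint", "https://example.com/vault",
        "--dags-bucket", "europe-west1-12323a-bucket",
        "--gcp-project-id", "my_gcp_dev_project",
        "--clear-dags-folder",
    ]


def run_pipeline(steps: Sequence[List[str]],
                 runner: Callable[..., object] = subprocess.run) -> None:
    """Run commands in order; with subprocess.run, check=True raises on the first failure."""
    for argv in steps:
        runner(argv, check=True)
```

A CI step could then call run_pipeline([["bigflow", "build"], vault_deploy_argv(image_tar, dags_dir, secret_from_env())]), with image_tar and dags_dir pointing at wherever your pipeline keeps the build artifacts. Note that the secret is still passed as a process argument, so this keeps it out of the script, not out of the process list.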
Deploy using deployment_config.py from a non-default path
By default, the deployment_config.py file is located in the main directory of your project, so bigflow expects it to exist under this path: ./deployment_config.py.
You can change this location by setting the deployment-config-path parameter:
bigflow deploy --deployment-config-path '/tmp/my_deployment_config.py'
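Configuration files written in Python are typically loaded by executing them as a module. The minimal sketch below shows that general technique with importlib from the standard library; it is not BigFlow's loader, and load_python_config is a hypothetical name. It can be handy, for example, for checking in CI that a config file at a non-default path at least imports cleanly before running bigflow deploy.

```python
import importlib.util
import os
import tempfile
from types import ModuleType


def load_python_config(path: str = "./deployment_config.py") -> ModuleType:
    """Execute the Python file at `path` and return it as a module object."""
    if not os.path.isfile(path):
        raise FileNotFoundError(path)
    spec = importlib.util.spec_from_file_location("deployment_config", path)
    if spec is None or spec.loader is None:
        raise ImportError(f"cannot load {path!r} as a Python module")
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)  # runs the file's top-level code
    return module


if __name__ == "__main__":
    # Demo with a throwaway file; the variable defined inside it is hypothetical.
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "my_deployment_config.py")
        with open(path, "w") as f:
            f.write("docker_repository = 'eu.gcr.io/my_gcp_dev_project/my_project'\n")
        print(load_python_config(path).docker_repository)  # eu.gcr.io/my_gcp_dev_project/my_project
```

Because loading executes the file, any import error or typo in the config surfaces immediately, which is the point of running such a check before a deploy.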
The bigflow logs command generates links leading to your project/workflow logs in GCP Logging. It generates a link for every workflow that has a logging configuration.
The output of the bigflow logs command consists of two parts: an infrastructure link and a workflow link.
The workflow link contains logs from user code, Dataflow jobs, Dataproc jobs, and exceptions that may occur while executing the workflow. Workflow links are created for every workflow that BigFlow finds in the project directory.
The infrastructure link contains logs from Kubernetes pods/containers and Dataflow workers. Infrastructure links are created for every unique project ID found in the workflows.
Use the bigflow start-project command to create a sample project and try all of the above commands yourself.