Skip to content

gabledata/tutorial

Repository files navigation

Tutorial: Getting Started with Gable Platform

Welcome to the tutorial for Gable! In this tutorial, you will:

  • Set up a forked tutorial repository with Gable and Github Actions
  • Validate and publish a new data contract
  • Attempt to make a breaking change to a data asset under contract

This tutorial utilizes Github Actions to validate & publish data contracts, register data assets, and detect contract violations all using the Gable platform. If you would like to run a similar tutorial using just the CLI, you can check out the CLI tutorial.

Step 1: Set up a forked tutorial repository

The tutorial repository is a small, self contained sample repository that lets you run through the end-to-end process of publishing a data contract and seeing that contract be enforced with Gable's platform. In this tutorial, you'll create a fork of the repository, configure your Gable credentials, and run the CI/CD workflows in Github Actions. If you're using the credentials from your sandbox environment, you can even run through this tutorial by making a personal fork of this repository.

Fork this repository into your account

Navigate to the tutorial repository and click on the "Fork" button in the top-right corner of the repository's page.

Fork Tutorial Repo

A dialog may appear, asking you where to fork the repository (if you're part of any organizations). Choose your personal account (or the desired organization) to create the fork.

After clicking the create button, wait a moment as GitHub creates a copy of the repository under your account.

Allow Github to Run Workflow

Github is cautious when running workflows for forked repositories so you need to grant explicit permission to run the workflows for the tutorial. Navigate to your forked repository in Github and click on the Actions tab. You'll see a message like the following:

Allow Workflows

Click on the "I understand my workflows, go ahead and enable them" button. This will give Github permission to run the CI/CD checks that come bundled with the tutorial.

Get Your API Key

In order to connect the Github Actions to Gable, you need:

  • The API endpoint associated with your organization
  • An API key that corresponds to the endpoint

You can find your API endpoint and key by navigating to the Settings -> API Keys page of Gable. Under API Keys you can click View to reveal your API key.

Gable API Keys

Setting up GABLE_API_KEY and GABLE_API_ENDPOINT secrets

Next, you will need to create the GABLE_API_KEY and GABLE_API_ENDPOINT secrets in your repository settings to configure Github Actions to talk to Gable.

Follow the Github instructions for Creating encrypted secrets for a repository to create the GABLE_API_KEYand GABLE_API_ENDPOINT repository secrets.

Clone the forked tutorial repository

The last setup step is to clone the forked repository to your local machine. Navigate to the main page of the forked repository within your GitHub account and click the "Code" button and copy either the HTTPS or SSH url for the repository.

Now open a terminal and run:

git clone <COPIED_REPO_URL> --single-branch
cd <FORKED_REPO_NAME>

Trigger Data Asset Registration

For the tutorial, there are two types of data assets: event schemas and database tables. The event schemas are Protobuf and Avro files stored within the repository. The database tables are generated by migration files which apply changes to a test database in Github Actions.

In order to kick off the first workflow to register the data assets, you'll need to push an empty commit to the main branch of the repository. Run:

git commit --allow-empty -m "Run workflow"
git push

The workflow will start after you have created a Pull Request using the GitHub web interface. Once the workflow completes, you should be able to see the Orders contract in the Gable UI.

Congratulations! You've set up your tutorial repository and are ready to try out Gable's platform!

Step 2: Creating Your First Data Contract

The tutorial repository includes many different data assets for a generic e-commerce company. It includes database tables and event schemas. For this tutorial, we are going to create a contract on the order_details database table.

Create a new branch to add the contract

You are going to create a new data contract in the tutorial repository. Just like code changes, rather than creating the contract in the main branch of the repository, it's best practice to create a new branch, and open a Pull Request for the changes. The PR allows others to comment on your changes, and also allows the Github Actions to validate the contract is syntactically correct before pushing changes.

You can either create a new branch using the Github web interface or using the git command line tools. This tutorial will walk through creating the branch using the git command line. In the tutorial repository on your local machine:

  1. Open a Terminal Window: Navigate to the directory where your cloned tutorial repository is located

  2. Checkout a New Branch: Use the following command to create and switch to a new branch called first_contract:

    git checkout -b first_contract

Great! Now you can start writing the contract!

Write the data contract

You are going to create a data contract for the order_details table of the database, which represents the individual items included in a customer order. Writing a data contract involves creating a YAML file that declares the schema and semantics of the data following the data contract specification.

Contracts are associated with a specific data asset. When you first enabled Github Actions, it created several data assets for database tables and event schemas. You can navigate to the Data Assets page to view a list of your organization's assets.

Gable Data Assets

You can click on the data asset where you can view its details including the ID Gable uses for the asset. You can click on the clipboard next to the ID to copy it. For this tutorial, we are using the tutorial.public.order_details data asset.

Gable Data Asset Details

In the contracts directory of your local repository, create a file called order_details.yaml. NOTE: The CLI can upload both .yaml and .yml files but the tutorial is set up to only upload .yaml files. Please ensure that the file extension is .yaml for the purposes of this tutorial.

Copy and paste the following into the contents of order_details.yaml:

id: 7b7eabe6-a52a-4e19-8bf3-9015eadbbed0
dataAssetResourceName: postgres://prod.store.com:5432:tutorial.public.order_details
spec-version: 0.1.0
name: OrderDetails
namespace: Tutorial
doc: Details of items for an order. Each row represents the different items that were part of an order including their quantity and price.
owner: [email protected]
schema:
  - name: order_id
    doc: The identifier of the order the item is associated with
    type: int32
  - name: product_id
    doc: The identifier of the product that was ordered
    type: int32
  - name: quantity
    doc: The quantity of the product in the order
    type: int32
  - name: total_price
    doc: The total price of the item in the order
    type: decimal256
  - name: vendor_id
    doc: The identifier of the vendor that the product was ordered from
    type: int32

This contract contains information on what data the contract applies to, who owns the contract, as well as the minimum expected schema for the data from the order_details table.

Remark: You can generate a Data Contract template for a Data Asset using the Gable UI.

Genarate Contract from Data Asset

Push Your Changes to Github

Now that you have created the contract, it is time to commit the change to the repository. To stage and commit your changes, run:

git add .
git commit -m "Add order details data contract"
git push origin first_contract

Validate the data contract

Before merging your changes back to the main branch, it is a good idea to create a Pull Request. Creating the Pull Request will serve two purposes:

  • Allow others to review your newly-created data contract
  • Allow the Github Action to validate the data contract is syntactically correct

To create the Pull Request:

  1. Navigate to the main page of your forked tutorial repository in Github

  2. Click on the "Pull requests" tab near the top of the page

  3. Click the "New Pull Request" button

  4. In the "base repository" dropdown, select your forked tutorial repository

  5. In the "base" dropdown, select the main branch

  6. In the "compare" dropdown, select the first_contract branch that contains your new data contract

  7. In the "Title" field, add Add order details data contract

  8. In the "Leave a comment" field, add the following:

    Adds a new data contract for `order_details` table.
    
  9. Click the "Create Pull Request" button.

Once your PR is open, Gable's contract validation CI/CD check runs automatically to check for syntax errors in your data contracts. You can open the Validate Data Contracts check to see the status, if there were any errors in your contract they would appear here.

Validate Data Contracts Check Results

Publish the Data Contract

Merge the Pull Request you opened. This will kick off the Github Action which publishes the contract.

Once the Github Action on the main branch has been run and the data contract has been published, navigate to the Gable UI and you should see your new data contract!

Step 3: Preventing Breaking Data Changes

NOTE: This section relies on the order_details data contract created in the previous step. Please complete that step if you have not already.

Now that there is a data contract in place for the order_details data, every time a change affecting the underlying order_details table, Gable checks to ensure that changes do not violate the contract. Let's try making a breaking change to the order_details table.

Create a new branch

First off, pull changes down from the main branch to ensure that your repository is up-to-date. Create a new branch called breaking_data_change:

git checkout main
git pull
git checkout -b breaking_data_change

Make a Breaking Change

We are going to add a new migration file for our database to attempt to break the contract.

Create a new file called 7a341d712aa5_order_detail_changes.py in the ./db_migrations/versions/ folder. Paste the following contents into the new 7a341d712aa5_order_detail_changes.py file:

"""order_detail changes

Revision ID: 7a341d712aa5
Revises: 78da3a78bfd6
Create Date: 2023-09-29 13:22:35.204135

"""
from typing import Sequence, Union

from alembic import op
import sqlalchemy as sa


# revision identifiers, used by Alembic.
revision: str = '7a341d712aa5'
down_revision: Union[str, None] = '78da3a78bfd6'
branch_labels: Union[str, Sequence[str], None] = None
depends_on: Union[str, Sequence[str], None] = None


def upgrade() -> None:
    # Make the vendor optional
    op.alter_column(
        'order_details',  
        'vendor_id',      
        nullable=True,    
    )

    # Convert existing values to integers (for example, if representing cents)
    op.execute('UPDATE order_details SET unit_price = ROUND(unit_price * 100)')
    op.execute('UPDATE order_details SET total_price = ROUND(total_price * 100)') 
    
    # Alter the column type
    op.alter_column('order_details', 'unit_price', type_=sa.Integer)
    op.alter_column('order_details', 'total_price', type_=sa.Integer)


def downgrade() -> None:
    op.alter_column(
        'order_details',
        'vendor_id',
        nullable=False,
    )

    # Alter the column type back to Numeric
    op.alter_column('order_details', 'unit_price', type_=sa.Numeric(10, 2))
    op.alter_column('order_details', 'total_price', type_=sa.Numeric(10, 2))

    # Convert integer values back to their decimal representations
    op.execute('UPDATE order_details SET unit_price = total_price / 100.0')
    op.execute('UPDATE order_details SET total_price = total_price / 100.0')

This migration file does a couple of things. First, it converts the unit_price and total_price columns to an integer field and makes the vendor_id optional.

Both the total_price and vendor_id columns are protected in the contract. Changing the type of total_price and making vendor_id represent breaking changes.

Once the change has been saved, commit the change and push it to Github:

git add .
git commit -m "Update order details table"
git push origin breaking_data_change

Open a Pull Request

Now open a Pull Request for the proposed breaking change:

  1. Navigate to the main page of your forked tutorial repository in Github

  2. Click on the "Pull requests" tab near the top of the page

  3. Click the "New Pull Request" button

  4. In the "base repository" dropdown, select your forked tutorial repository

  5. In the "base" dropdown, select the main branch

  6. In the "compare" dropdown, select the breaking_data_change branch that contains the breaking data change

  7. In the "Title" field, add Update order details table

  8. In the "Leave a comment" field, add the following:

    Make updates to the order details table
    
  9. Click the "Create Pull Request" button.

View Warning Message

The Github Action will validate the changes against existing data contracts. It will detect the change to the order_details table and post a message in the PR that this change breaks the existing data contract.

Contract Violation PR Comment

Further Reading

Congratulations on creating your first data contract and validating data asset changes! Be sure to check out more of Gable's documentation for more information on our platform!

About

Quick tutorial for getting started with Gable

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published