
ELT Pipeline with Snowflake, DBT, and Airflow

This project demonstrates the process of building an ELT pipeline from scratch using DBT, Snowflake, and Airflow. The pipeline extracts data from Snowflake's TPCH dataset, performs transformations using DBT, and orchestrates the workflow using Airflow. The primary focus is on data modeling, fact table creation, and business logic transformations.

Overview

ELT vs. ETL

Traditional ETL (Extract, Transform, Load) transforms data before loading it into a data warehouse. ELT, in contrast, loads the raw data first and applies transformations inside the warehouse. Modern cloud data warehouses such as Snowflake make ELT practical by providing cheap, scalable storage and compute.

Tools Used:

  • DBT: Data transformation
  • Snowflake: Data warehouse
  • Airflow: Workflow orchestration

Project Steps

1. Environment Setup

  • Set up a Snowflake account.
  • Create a warehouse, database, and role in Snowflake using the following SQL commands:
    CREATE WAREHOUSE IF NOT EXISTS DBT_WAREHOUSE;
    CREATE DATABASE IF NOT EXISTS DBT_DATABASE;
    CREATE ROLE IF NOT EXISTS DBT_ROLE;
    
  • Grant the required privileges on the warehouse and database to the DBT role, and grant the role to your Snowflake user (see the example below).
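
A minimal set of grants for this setup could look like the following; the user name is a placeholder:

    GRANT USAGE ON WAREHOUSE DBT_WAREHOUSE TO ROLE DBT_ROLE;
    GRANT ALL ON DATABASE DBT_DATABASE TO ROLE DBT_ROLE;
    GRANT ROLE DBT_ROLE TO USER <your_user>;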

2. DBT Setup

  • Install DBT Core together with the Snowflake adapter:
    pip install dbt-core dbt-snowflake
    
  • Initialize the DBT project:
    dbt init data_pipeline
    
  • Configure the Snowflake connection profile within DBT (a profiles.yml sketch follows).
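
As a sketch, the profiles.yml entry for this setup might look like the following; the account identifier, credentials, and schema name are placeholders, and the profile name assumes the default generated by dbt init data_pipeline:

    data_pipeline:
      target: dev
      outputs:
        dev:
          type: snowflake
          account: <account_identifier>
          user: <username>
          password: <password>
          role: DBT_ROLE
          warehouse: DBT_WAREHOUSE
          database: DBT_DATABASE
          schema: dbt_schema
          threads: 4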

3. Data Modeling

  • Staging Tables: Pull data from Snowflake's sample TPCH dataset and create staging views for the orders and lineitem tables.
  • Fact Tables: Build fact tables on top of the staging views using business logic and transformations (a sketch follows this list):
    • Example: creating a surrogate key for dimensional modeling.
    • Aggregating data into a fact table.
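
To illustrate the pattern (the model, source, and column names here are assumptions, not necessarily the ones used in this repo), a staging view plus a fact model with a surrogate key generated via the dbt_utils package could look like this:

    -- models/staging/stg_tpch_orders.sql: thin view over the raw TPCH orders table
    select
        o_orderkey   as order_key,
        o_custkey    as customer_key,
        o_orderdate  as order_date,
        o_totalprice as total_price
    from {{ source('tpch', 'orders') }}

    -- models/marts/fct_orders.sql: adds a surrogate key for dimensional modeling
    select
        {{ dbt_utils.generate_surrogate_key(['order_key', 'order_date']) }} as order_sk,
        order_key,
        customer_key,
        order_date,
        total_price
    from {{ ref('stg_tpch_orders') }}

This assumes a tpch source is declared in a sources YAML file and that dbt_utils is listed in packages.yml.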

4. Macros

  • Defined reusable macros in DBT for business logic, such as calculating the discounted amount from extended prices (sketched below).
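
A macro along these lines (the name and signature are illustrative) could live in macros/ and be reused from any model; it returns the extended price after applying the line-item discount:

    {% macro discounted_amount(extended_price, discount, scale=2) %}
        ( {{ extended_price }} * (1 - {{ discount }}) )::decimal(16, {{ scale }})
    {% endmacro %}

In a model it would be called as, for example, {{ discounted_amount('l_extendedprice', 'l_discount') }}.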

5. Testing

  • Added DBT tests (both generic and singular) to ensure data integrity (an example singular test follows this list):
    • Unique, non-null checks on primary keys.
    • Valid range checks on dates and values.
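
Generic tests (unique, not_null, accepted ranges) are declared in the models' YAML files; a singular test is simply a SQL file under tests/ that must return zero rows to pass. A minimal sketch, assuming a fct_orders model with an order_date column:

    -- tests/fct_orders_date_in_range.sql
    -- Fails if any order date falls outside the expected range
    select *
    from {{ ref('fct_orders') }}
    where order_date < '1990-01-01'
       or order_date > current_date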

6. Orchestration with Airflow

  • Installed Airflow and Astronomer Cosmos to orchestrate the DBT transformations.
  • Set up Airflow DAGs to trigger DBT runs (a Cosmos-based DAG sketch follows this list):
    • Scheduled DAGs to run daily.
    • Configured the Snowflake connection within Airflow.
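
With Astronomer Cosmos, the DAG in dags/dbt_pipeline_dag.py can be sketched roughly as follows; the Airflow connection ID, project path, and dbt executable path are assumptions that depend on how the Airflow project is laid out:

    from datetime import datetime

    from cosmos import DbtDag, ExecutionConfig, ProfileConfig, ProjectConfig
    from cosmos.profiles import SnowflakeUserPasswordProfileMapping

    # Map an Airflow Snowflake connection onto the dbt profile at runtime.
    profile_config = ProfileConfig(
        profile_name="data_pipeline",
        target_name="dev",
        profile_mapping=SnowflakeUserPasswordProfileMapping(
            conn_id="snowflake_conn",  # assumed Airflow connection ID
            profile_args={"database": "DBT_DATABASE", "schema": "dbt_schema"},
        ),
    )

    # Render the dbt project as an Airflow DAG scheduled to run daily.
    dbt_pipeline_dag = DbtDag(
        dag_id="dbt_pipeline_dag",
        project_config=ProjectConfig("/usr/local/airflow/dags/dbt/data_pipeline"),  # assumed path
        profile_config=profile_config,
        execution_config=ExecutionConfig(
            dbt_executable_path="/usr/local/airflow/dbt_venv/bin/dbt",  # assumed dbt install location
        ),
        schedule_interval="@daily",
        start_date=datetime(2024, 1, 1),
        catchup=False,
    )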

Project Structure

    data_pipeline/
    ├── models/
    │   ├── staging/
    │   └── marts/
    ├── macros/
    ├── tests/
    ├── dags/
    │   └── dbt_pipeline_dag.py
    └── Dockerfile

  • models/: Contains all the DBT models, including staging and fact tables.
  • macros/: Contains reusable business logic for transformations.
  • tests/: Includes DBT tests for data validation.
  • dags/: Contains the Airflow DAG for orchestrating the DBT runs.

How to Run

  1. DBT Models:
    dbt run
    
  2. Airflow DAG:
    • Start Airflow:
      airflow webserver
      airflow scheduler
      
    • Trigger the dbt_pipeline_dag from the Airflow UI.

Future Improvements

  • Add more comprehensive data tests.
  • Integrate additional sources or incremental data loading.
  • Deploy to cloud services (AWS, GCP) for scalability.

Conclusion

This project demonstrates how to create an ELT pipeline using modern tools like DBT, Snowflake, and Airflow. It covers the complete workflow from environment setup to orchestration with Airflow. The result is a scalable and reusable ELT pipeline, ready for deployment in a production environment.

References

This project was built by following a hands-on tutorial, adapted for Windows with a few other changes where needed (see the original video).

License

This project is open-source and available under the MIT License.
