Airflow_Ingestion

Introduction

This project uses Apache Airflow to orchestrate a data pipeline that extracts data, converts its file format, and uploads it to a data lake (Google Cloud Storage, GCS).

Process

  • Airflow is installed using the official Airflow Docker image.
  • DAGs are defined in a Python file.
  • The DAGs are triggered from the Airflow webserver UI.
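The steps above can be sketched as a minimal DAG file. The DAG id, schedule, and dataset URL here are illustrative assumptions, not taken from the repo, and the Airflow imports are deferred into a function so the sketch can be read without Airflow installed (in a real dags/ folder they would sit at module level):

```python
from datetime import datetime

def build_dag():
    # Deferred imports -- see the note above; requires apache-airflow 2.x.
    from airflow import DAG
    from airflow.operators.bash import BashOperator

    # Hypothetical DAG id and schedule, for illustration only.
    with DAG(
        dag_id="nyc_taxi_ingestion",
        start_date=datetime(2021, 1, 1),
        schedule_interval="@monthly",
        catchup=False,
    ) as dag:
        # First task: fetch the raw CSV with curl.
        download = BashOperator(
            task_id="download_dataset",
            bash_command="curl -sSLf <DATASET_URL> -o /tmp/taxi.csv",
        )
        # Later tasks (CSV-to-Parquet conversion, GCS upload) would be
        # chained here: download >> convert >> upload
    return dag
```

Once such a file is placed in the dags/ folder, the scheduler picks it up and the DAG appears in the webserver UI, where it can be triggered manually.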

Download and Transformation of Data

The NYC taxi trip data was used for this project.

  • The data is downloaded using curl.
  • The data is converted from CSV to Parquet using the pandas library (a compressed format that is easier to query).

Uploading Data

  • The data is uploaded to GCS using the google.cloud Python library.
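A minimal upload helper might look like the following (the bucket name and object path are hypothetical; the helper takes a bucket object as a parameter so the GCS client is only created where real credentials are available):

```python
def upload_to_gcs(bucket, object_name: str, local_path: str) -> None:
    """Upload a local file to the given GCS bucket under object_name."""
    blob = bucket.blob(object_name)
    blob.upload_from_filename(local_path)

def main():
    # Requires the google-cloud-storage package and GCP credentials.
    from google.cloud import storage

    client = storage.Client()
    bucket = client.bucket("nyc-taxi-data-lake")  # hypothetical bucket name
    upload_to_gcs(
        bucket,
        "raw/yellow_tripdata_2021-01.parquet",  # hypothetical object path
        "yellow_tripdata_2021-01.parquet",
    )
```

Keeping the client creation out of `upload_to_gcs` also makes the helper easy to exercise against a stub bucket in tests.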
