Chicago Taxi Trips Streaming Analysis with Real Time Dashboard

In this project, we analyzed the well-known Chicago Taxi Trips dataset using Spark Structured Streaming and built a dashboard with Flask and HTML to report the resulting metrics. The dataset is open for public use and contains trips from 2013 onward: about 200 million rows, each roughly 1024 bytes in size.
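
To put that scale in perspective, the two figures quoted above multiply out to roughly 205 GB (about 191 GiB) of raw data. The short calculation below only reproduces that arithmetic. It is an estimate derived from the approximate row count and record size, not a measured file size.

```python
# Rough estimate of the raw dataset size from the approximate figures above.
ROWS = 200_000_000      # approximate number of records
BYTES_PER_ROW = 1024    # approximate size of one record

total_bytes = ROWS * BYTES_PER_ROW
total_gb = total_bytes / 10**9    # decimal gigabytes
total_gib = total_bytes / 2**30   # binary gibibytes

print(f"{total_bytes:,} bytes ~ {total_gb:.1f} GB ~ {total_gib:.1f} GiB")
```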

Features

Here are some features of this implementation:

Retrieves data from a folder and processes it in a streaming fashion.
Provides a full-fledged dashboard, built with Flask and Chart.js, to visualize the metrics.
Uses Spark for big data analysis; tested on more than 100 million records.
Uses Structured Streaming, so it can easily be ported to similar use cases.
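
Structured Streaming's file source treats a directory as an unbounded input: each trigger picks up only the files that were not processed before. In PySpark this is typically set up with `spark.readStream.schema(schema).csv("source")`, since streaming file sources require an explicit schema by default. To illustrate the idea without a Spark installation, here is a minimal plain-Python analogue. `FolderMicroBatcher`, `write_csv`, the `Company` column and the company names are hypothetical, and this is not how Spark works internally (for example, Spark also checkpoints its progress, which this sketch does not).

```python
import csv
import os
import tempfile
from collections import Counter


class FolderMicroBatcher:
    """Plain-Python analogue of a streaming file source (illustration only).

    Each poll() call treats the CSV files that appeared since the previous
    call as one micro-batch and folds them into running aggregate state.
    """

    def __init__(self, folder):
        self.folder = folder
        self.seen = set()                   # files already processed
        self.trips_per_company = Counter()  # running aggregate state

    def poll(self):
        # Only files not processed in an earlier batch form the new batch.
        new_files = sorted(
            name for name in os.listdir(self.folder)
            if name.endswith(".csv") and name not in self.seen
        )
        for name in new_files:
            with open(os.path.join(self.folder, name), newline="") as fh:
                for row in csv.DictReader(fh):
                    # "Company" is an assumed column name.
                    self.trips_per_company[row["Company"]] += 1
            self.seen.add(name)
        return new_files


def write_csv(folder, name, text):
    """Hypothetical helper that drops a new file into the watched folder."""
    with open(os.path.join(folder, name), "w", newline="") as fh:
        fh.write(text)


with tempfile.TemporaryDirectory() as folder:
    batcher = FolderMicroBatcher(folder)
    write_csv(folder, "1.csv", "Company\nAcme Cabs\nAcme Cabs\nCity Taxi\n")
    first = batcher.poll()   # micro-batch 1 sees only 1.csv
    write_csv(folder, "2.csv", "Company\nCity Taxi\n")
    second = batcher.poll()  # micro-batch 2 sees only the new 2.csv
    print(first, second, dict(batcher.trips_per_company))
```

The key design point is that state (here the running counts) survives across batches while each file is read exactly once, which is what lets new files dropped into the folder update the metrics incrementally.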

Problem Statements

How has the tipping rate changed over the years?
Which days are the most popular for taxi trips?
How has the mode of payment changed over the years?
What is the total distance (in miles) travelled each year?
What is the total trip time each year?
Which company makes the most trips each year?
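
Each of these questions reduces to a grouped aggregation over the trip records, keyed by year (or by day of the week for the popular-days question). In the project these aggregations run in Spark; the sketch below expresses the same logic in plain Python over an in-memory list so it can be checked without a cluster. Assumptions: the column names follow the public dataset's export (`Trip Start Timestamp`, `Trip Seconds`, `Trip Miles`, `Fare`, `Tips`, `Payment Type`, `Company`), timestamps look like `01/05/2013 08:15:00 AM`, "tipping rate" means total tips divided by total fare, and "popular days" means day of the week. The sample rows are made up, and real data may contain missing values that would need handling.

```python
from collections import Counter, defaultdict
from datetime import datetime

# Assumed timestamp format of the "Trip Start Timestamp" column.
TS_FORMAT = "%m/%d/%Y %I:%M:%S %p"
# Fixed names so the result does not depend on the system locale.
DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]


def yearly_metrics(rows):
    """Aggregate per-year metrics for the problem statements (sketch)."""
    fare = defaultdict(float)
    tips = defaultdict(float)
    miles = defaultdict(float)
    seconds = defaultdict(int)
    payments = defaultdict(Counter)
    companies = defaultdict(Counter)
    weekdays = Counter()

    for row in rows:
        start = datetime.strptime(row["Trip Start Timestamp"], TS_FORMAT)
        year = start.year
        fare[year] += float(row["Fare"])
        tips[year] += float(row["Tips"])
        miles[year] += float(row["Trip Miles"])
        seconds[year] += int(row["Trip Seconds"])
        payments[year][row["Payment Type"]] += 1
        companies[year][row["Company"]] += 1
        weekdays[DAY_NAMES[start.weekday()]] += 1

    return {
        # Tipping rate: total tips / total fare (assumed definition).
        "tip_rate": {y: tips[y] / fare[y] for y in fare if fare[y] > 0},
        "total_miles": dict(miles),
        "total_seconds": dict(seconds),
        "payment_modes": {y: dict(c) for y, c in payments.items()},
        "top_company": {y: c.most_common(1)[0][0] for y, c in companies.items()},
        "trips_by_weekday": dict(weekdays),
    }


# Made-up sample rows in the assumed column layout.
rows = [
    {"Trip Start Timestamp": "01/05/2013 08:15:00 AM", "Trip Seconds": "600",
     "Trip Miles": "2.5", "Fare": "10.00", "Tips": "2.00",
     "Payment Type": "Credit Card", "Company": "Acme Cabs"},
    {"Trip Start Timestamp": "01/06/2013 11:30:00 PM", "Trip Seconds": "900",
     "Trip Miles": "4.0", "Fare": "10.00", "Tips": "0.00",
     "Payment Type": "Cash", "Company": "Acme Cabs"},
    {"Trip Start Timestamp": "03/02/2014 02:00:00 PM", "Trip Seconds": "300",
     "Trip Miles": "1.0", "Fare": "5.00", "Tips": "1.00",
     "Payment Type": "Cash", "Company": "City Taxi"},
]

metrics = yearly_metrics(rows)
print(metrics["tip_rate"], metrics["top_company"], metrics["trips_by_weekday"])
```

In a Structured Streaming job, the equivalent would be a `groupBy` on a derived year column with `sum` and `count` aggregations, written out with an output mode such as `complete` so the dashboard always sees the full, updated totals.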

Implementation

Folder Structure

├── Documents         # Holds info about the project
├── dashboard         # Code for the Flask application
│   ├── static        # Static files used by the dashboard
│   │   ├── css       # Styling files for the dashboard
│   │   └── js        # JavaScript files used by the dashboard
│   ├── templates     # HTML template for the dashboard UI
│   └── app.py        # Flask application that defines and serves the endpoints; entry point for the dashboard
├── source            # Input folder for the streaming application
│   └── 1.csv         # A sample source file
├── README.md         # Read this first
└── streaming.py      # Streaming application that reads the files and processes them in PySpark using Structured Streaming
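
The dashboard side pairs Flask (`app.py`) with Chart.js scripts under `static/js`. Chart.js charts are fed a `data` object with a `labels` array and a `datasets` list, each dataset carrying a `label` and a `data` array. The sketch below converts a per-year metric into that shape; the function name `to_chartjs` and the sample values are hypothetical, and we are assuming (not quoting `app.py`) that a Flask view would return such a payload, for example with `flask.jsonify(payload)`.

```python
import json


def to_chartjs(series, label):
    """Convert a {x: y} mapping into the `data` object Chart.js expects."""
    xs = sorted(series)  # keep the x-axis (e.g. years) in order
    return {
        "labels": [str(x) for x in xs],
        "datasets": [{"label": label, "data": [series[x] for x in xs]}],
    }


payload = to_chartjs({2014: 1.0, 2013: 6.5}, "Total miles")
print(json.dumps(payload))
```

Sorting the keys on the server keeps the chart's x-axis stable regardless of the order in which the streaming job emitted the aggregates.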

Output

A full demo, including a code walkthrough and a live run of the dashboard, can be found here (redirects to YouTube).

jacobceles/ChicagoTaxiTrips-SparkStreaming-RealTimeDashboard
