SABD-project2

This repository contains a code base for the real-time analysis of environmental data collected by Sensor.Community sensors. The main objective is to answer the following queries:

Query1: For the sensors having sensor_id < 10000, find the number of measurements and the average temperature value. Using a tumbling window (see the sketch after this list), calculate this query:

  • every 1 hour (event time)
  • every 1 week (event time)
  • from the beginning of the dataset
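
A minimal sketch of the count-and-average step behind Query 1, assuming measurements arrive as Double temperature values keyed by sensor_id; the class name and accumulator layout are illustrative, not the repository's own:

```java
import org.apache.flink.api.common.functions.AggregateFunction;
import org.apache.flink.api.java.tuple.Tuple2;

// Accumulates (count, sum) per window and emits (count, average).
public class CountAvgAggregate
        implements AggregateFunction<Double, Tuple2<Long, Double>, Tuple2<Long, Double>> {

    @Override
    public Tuple2<Long, Double> createAccumulator() {
        return Tuple2.of(0L, 0.0);
    }

    @Override
    public Tuple2<Long, Double> add(Double temperature, Tuple2<Long, Double> acc) {
        return Tuple2.of(acc.f0 + 1, acc.f1 + temperature);
    }

    @Override
    public Tuple2<Long, Double> getResult(Tuple2<Long, Double> acc) {
        // guard against empty windows before dividing
        return Tuple2.of(acc.f0, acc.f0 == 0 ? 0.0 : acc.f1 / acc.f0);
    }

    @Override
    public Tuple2<Long, Double> merge(Tuple2<Long, Double> a, Tuple2<Long, Double> b) {
        return Tuple2.of(a.f0 + b.f0, a.f1 + b.f1);
    }
}
```

Such a function would be wired as, e.g., stream.keyBy(...).window(TumblingEventTimeWindows.of(Time.hours(1))).aggregate(new CountAvgAggregate()), swapping the window size for the weekly variant and using a single global window for the from-the-beginning variant.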

Query2: Find the real-time top-5 ranking of locations (location) having the highest average temperature and the top-5 ranking of locations having the lowest average temperature. Using a tumbling window (see the sketch after this list), calculate this query:

  • every 1 hour (event time)
  • every 1 day (event time)
  • every 1 week (event time)
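
A minimal sketch of the ranking step behind Query 2, assuming an upstream keyed window aggregation has already produced (location, avgTemperature) pairs; the function name and output format are illustrative:

```java
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.functions.windowing.ProcessAllWindowFunction;
import org.apache.flink.streaming.api.windowing.windows.TimeWindow;
import org.apache.flink.util.Collector;

import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Collects the per-location averages of one window and emits the five
// hottest and five coldest locations.
public class Top5RankingFunction
        extends ProcessAllWindowFunction<Tuple2<String, Double>, String, TimeWindow> {

    @Override
    public void process(Context ctx,
                        Iterable<Tuple2<String, Double>> averages,
                        Collector<String> out) {
        List<Tuple2<String, Double>> sorted = new ArrayList<>();
        averages.forEach(sorted::add);
        sorted.sort(Comparator.comparingDouble(t -> t.f1)); // ascending by avg temp

        int n = sorted.size();
        // coldest five: head of the ascending list
        out.collect("bottom5=" + sorted.subList(0, Math.min(5, n)));
        // hottest five: tail of the ascending list
        out.collect("top5=" + sorted.subList(Math.max(0, n - 5), n));
    }
}
```

It would be applied via .windowAll(...).process(new Top5RankingFunction()) on the stream of per-location averages, so that a single task sees every average of a window before ranking.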

Query3: Consider the geographic area delimited by the latitude and longitude coordinates (38°, 2°) and (58°, 30°). Divide this area using a 4x4 grid and identify each grid cell, from the top-left to the bottom-right corner, with the name "cell_X", where X is the cell id from 0 to 15. For each cell, find the average and the median temperature, taking into account the values emitted by the sensors located inside that cell. Using a tumbling window (see the sketch after this list), calculate this query:

  • every 1 hour (event time)
  • every 1 day (event time)
  • every 1 week (event time)
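
A minimal sketch of the cell mapping, assuming latitude 38..58 and longitude 2..30 split evenly into a 4x4 grid, with cell_0 at the top-left (lat 58°, lon 2°) and cell_15 at the bottom-right (lat 38°, lon 30°); the class and method names are illustrative:

```java
// Maps a (latitude, longitude) pair to its grid cell name.
public class GridMapper {
    private static final double LAT_MIN = 38.0, LAT_MAX = 58.0;
    private static final double LON_MIN = 2.0, LON_MAX = 30.0;
    private static final int GRID = 4;

    public static String cellOf(double lat, double lon) {
        // row 0 is the northernmost band, so measure from LAT_MAX downwards
        int row = (int) ((LAT_MAX - lat) / ((LAT_MAX - LAT_MIN) / GRID));
        int col = (int) ((lon - LON_MIN) / ((LON_MAX - LON_MIN) / GRID));
        // clamp points lying exactly on the outer border into the last band
        row = Math.min(Math.max(row, 0), GRID - 1);
        col = Math.min(Math.max(col, 0), GRID - 1);
        return "cell_" + (row * GRID + col); // ids run left-to-right, top-to-bottom
    }
}
```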

The code for the queries can be found in the codebase at the following paths:

  • sabd2-flink/src/main/java/com/diagiac/flink/query1/Query1.java
  • sabd2-flink/src/main/java/com/diagiac/flink/query2/Query2.java
  • sabd2-flink/src/main/java/com/diagiac/flink/query3/Query3.java

We also implemented an alternative version of Query 1 using Kafka Streams; the implementation can be found at the path below, followed by a sketch of the topology:

  • sabd2-kafka/src/main/java/com/diagiac/kafka/streams/Query1KafkaStreams.java
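
A minimal sketch of such a Kafka Streams topology, assuming a Kafka Streams 3.x API and a topic keyed by sensor_id with the temperature as value; the topic names and the string-encoded "count,sum" accumulator are illustrative (the real code would use proper POJOs and serdes):

```java
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.Topology;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.kstream.Materialized;
import org.apache.kafka.streams.kstream.Printed;
import org.apache.kafka.streams.kstream.TimeWindows;

import java.time.Duration;

public class Query1Topology {
    public static Topology build() {
        StreamsBuilder builder = new StreamsBuilder();
        builder.stream("measurements", Consumed.with(Serdes.String(), Serdes.Double()))
               .filter((sensorId, temp) -> Integer.parseInt(sensorId) < 10000)
               .groupByKey()
               .windowedBy(TimeWindows.ofSizeWithNoGrace(Duration.ofHours(1)))
               .aggregate(
                   () -> "0,0.0",                       // accumulator encoded as "count,sum"
                   (sensorId, temp, acc) -> {
                       String[] p = acc.split(",");
                       long count = Long.parseLong(p[0]) + 1;
                       double sum = Double.parseDouble(p[1]) + temp;
                       return count + "," + sum;
                   },
                   Materialized.with(Serdes.String(), Serdes.String()))
               .toStream()
               .mapValues(acc -> {
                   String[] p = acc.split(",");
                   long count = Long.parseLong(p[0]);
                   return count + " measurements, avg " + Double.parseDouble(p[1]) / count;
               })
               .print(Printed.toSysOut()); // a real topology would sink to an output topic
        return builder.build();
    }
}
```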

Requirements

This project uses Docker and Docker Compose to instantiate the Kafka, Zookeeper, Flink, Grafana, Prometheus, Redis and Publisher/Consumer containers.

Deployment

NOTE: First you need to compile the project using Maven. Open a terminal in the project base directory and execute the following command:

mvn clean package

NOTE: Some initialization operations are also needed before the deployment of the application, so execute this command in the project base directory:

./scripts/init.sh

To deploy this project use Docker Compose:

docker compose up -d

Execute Query:

NOTE: Do this after the deployment phase. You also need to wait for the Producer to begin its work. Open a terminal in the project base directory and follow these steps:

  • Bash (Linux/macOS):
./scripts/submit_job.sh <num-query> <parallelization-level>
  • Windows (cmd):
.\scripts\submit_job.cmd <num-query> <parallelization-level>

where num-query is the number of the query to execute:

  • 1 -> Query1
  • 2 -> Query2
  • 3 -> Query3
  • 4 -> Query1KafkaStreams

UIs:

You can find the Grafana Dashboards of the project at:

Frameworks:

  • Kafka: used for data ingestion
  • Flink: used for data stream processing
  • Prometheus: used as data storage for query metrics
  • Redis: used to cache output data for the data visualization layer
  • Grafana: used to visualize the processed output data and the performance metrics from Prometheus
