YelpDataSet-Analysis-Visualisation

Analysis and Visualisation of the Yelp Dataset using Apache Spark | Elasticsearch | Kibana

Introduction

Most businesses seek reviews of their goods and services in one way or another. Reviews are one of the most basic ways for a business to improve its efficiency and, in turn, its bottom line. Collecting reviews is only part of the problem: the ability to extract and visualize analytics from review data is critical to business success.

In this Apache Spark project, we use the Yelp review dataset to analyze businesses and reviews over a period of time. The analysis may reveal gaps in service delivery or show how businesses thrive in different scenarios.

Beyond processing this data, we ingest the final output of the processing into Elasticsearch and use Kibana, the visualization tool in the ELK stack, to build various kinds of ad-hoc reports from the data.

Goal

The goal of this Spark project is to analyze business reviews from the Yelp dataset and ingest the final output of the data processing into Elasticsearch. The visualisation tool in the ELK stack is then used to build various kinds of ad-hoc reports from the data.

Architecture

• Importing data from the Yelp dataset into a relational database (MySQL)

See the file get_yelp_in_mysql_databaset.txt for more information.

• Ingesting data from the relational database (MySQL) into Hadoop HDFS using Sqoop

See the file yelp_mysql_sqoop_commands.txt for more information.

• Ingesting data from the relational database directly into Spark

• Processing the relational data in Spark

• Ingesting the processed data into Elasticsearch

• Visualizing review analytics using Kibana
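The step that reads MySQL data directly into Spark can be sketched as follows. This is a minimal illustration rather than the project's actual code (see spark_es.scala for that). It is meant to be pasted into the spark-shell of a Spark 1.x installation, where `sqlContext` is predefined, and it assumes the MySQL JDBC driver is on the classpath. The database name `yelp`, the table name `business`, the `state` column, and the credentials are all placeholders chosen for the example.

```scala
// Sketch for the spark-shell (Spark 1.x), where `sqlContext` already exists.
// Every connection value below is a placeholder, not taken from the project.
import org.apache.spark.sql.DataFrame

val jdbcOptions = Map(
  "url"      -> "jdbc:mysql://localhost:3306/yelp", // assumed host, port, database
  "driver"   -> "com.mysql.jdbc.Driver",            // MySQL Connector/J driver class
  "dbtable"  -> "business",                         // assumed table name
  "user"     -> "yelp_user",                        // placeholder credentials
  "password" -> "yelp_password"
)

// Load the table as a DataFrame through Spark's built-in JDBC data source.
val businessDf: DataFrame =
  sqlContext.read.format("jdbc").options(jdbcOptions).load()

// Inspect what was loaded, then run a simple aggregation
// (assumes the table has a `state` column).
businessDf.printSchema()
businessDf.groupBy("state").count().show(10)
```

Reading over JDBC avoids the intermediate HDFS copy that the Sqoop route produces, which may be convenient for small tables; for large tables the Sqoop import is likely the better fit.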

Technology stack

Area	Technology
DataSet	Yelp
Relational Database	MySQL
Big Data Ingestion Tool	Hadoop (Sqoop)
Distributed File System	Hadoop (HDFS)
Cluster Computing Framework	Apache Spark (Scala)
Search and Analytics Engine	Elasticsearch

Yelp schema

Of all the attributes in the dataset, we focus on the subset shown below.

Business

Use Cases considered for Visualisation

Top 10 Business Categories

Yelp Business Map

Business distribution by state

Average rating of business over time

Top rated businesses

User sign up trend
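As an illustration of the kind of aggregation behind a use case such as "Top 10 Business Categories", here is a small sketch in plain Scala collections, with no Spark dependency, so the logic can be checked in isolation. The `Business` case class and its fields are hypothetical, and the sketch assumes each business stores its categories as a single comma-separated string; the actual schema may differ.

```scala
// Hypothetical record shape: only the fields this sketch needs.
final case class Business(name: String, state: String, categories: String)

object TopCategories {
  // Split a comma-separated categories string, trim each entry, drop empties.
  def splitCategories(raw: String): Seq[String] =
    raw.split(",").map(_.trim).filter(_.nonEmpty).toSeq

  // Count how many businesses list each category and return the n most
  // frequent ones. Ties are broken alphabetically so the result is stable.
  def topCategories(businesses: Seq[Business], n: Int): Seq[(String, Int)] =
    businesses
      .flatMap(b => splitCategories(b.categories).distinct) // one vote per business
      .groupBy(identity)
      .map { case (category, hits) => (category, hits.size) }
      .toSeq
      .sortBy { case (category, count) => (-count, category) }
      .take(n)
}
```

Calling `TopCategories.topCategories(businesses, 10)` returns up to ten `(category, count)` pairs in descending order of count. The same shape (split, flatten, group, count, sort, take) carries over to a Spark RDD or DataFrame when the data is too large for a single machine.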

Configuring Environment

Install the Cloudera QuickStart VM

Install the ELK stack

Then configure the Scala runtime on the Cloudera QuickStart VM

Watch the video below for more information

https://www.youtube.com/watch?v=SFJsuo2XISs

Execution Instructions

Launch the Spark Shell

spark-shell --packages org.elasticsearch:elasticsearch-spark-13_2.10:6.1.1 --conf spark.es.index.auto.create=true --conf spark.es.nodes=<Elasticsearch-IP-address>:<port>

Replace <Elasticsearch-IP-address>:<port> with the address and port of your Elasticsearch node.
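Once the shell is running with the elasticsearch-spark package loaded, a DataFrame can be written to Elasticsearch with the connector's `saveToEs` method. The snippet below is a minimal sketch, not the project's actual job (see spark_es.scala for that). The tiny DataFrame is made-up sample data standing in for the processed output, and the index/type name `yelp_business_by_state/doc` is an assumption chosen for the example.

```scala
// Paste into a spark-shell started with the elasticsearch-spark package.
import org.elasticsearch.spark.sql._ // adds saveToEs to DataFrame
import sqlContext.implicits._        // enables toDF on local collections

// Made-up sample rows standing in for the processed output.
val stateCounts = Seq(("NV", 2), ("AZ", 1)).toDF("state", "business_count")

// Write the rows to Elasticsearch. The "index/type" resource name is an
// assumption; with spark.es.index.auto.create=true the index is created
// automatically if it does not exist yet.
stateCounts.saveToEs("yelp_business_by_state/doc")
```

After the write completes, the index should be available in Kibana once a matching index pattern has been defined there.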

Visualisation Screenshots

After ingesting the processed data into Elasticsearch

Yelp User sign up trend

Business distribution by state

Yelp review

Top 10 Business Categories

Dashboard

