Analysis and Visualisation of Yelp Dataset using Apache Spark | Elastic Search | Kibana
Most businesses seek reviews of their goods and services in one way or another. Reviews are one of the most basic ways for a business to improve its efficiency and, in turn, its bottom line. Collecting reviews is only part of the problem; the ability to extract and visualize analytics from review data is critical to business success.
In this Apache Spark project, we use the Yelp review dataset to analyze businesses and reviews over a period of time. We may spot potential gaps in service delivery or see how businesses thrive in different scenarios.
Beyond processing this data, we ingest the final output of our data processing into Elasticsearch and use the visualization tool in the ELK stack to build various kinds of ad-hoc reports from the data.
The goal of this Spark project is to analyze business reviews from the Yelp dataset, ingest the final output of the data processing into Elasticsearch, and use the visualization tool in the ELK stack to build various kinds of ad-hoc reports from the data.
• Import data from the Yelp dataset into a relational database (MySQL)
See the file get_yelp_in_mysql_databaset.txt for more information.
• Ingest data from the relational database (MySQL) into Hadoop HDFS using Sqoop
See the file yelp_mysql_sqoop_commands.txt for more information.
• Ingest data from the relational database directly into Spark
• Process the relational data in Spark
• Ingest the processed data into Elasticsearch
• Visualize review analytics using Kibana
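As a minimal sketch of the two Spark steps above (reading a MySQL table directly and processing it), the following can be pasted into spark-shell. It assumes Spark 1.x, where `sqlContext` is predefined, and that the MySQL JDBC driver jar is on the classpath. The host, database name, credentials, and the `category` table and column names are placeholders, not values taken from this project.

```scala
import org.apache.spark.sql.functions.desc

// Placeholder connection settings (assumptions): adjust to your MySQL setup.
val jdbcOptions = Map(
  "url"      -> "jdbc:mysql://localhost:3306/yelp_db",
  "driver"   -> "com.mysql.jdbc.Driver",
  "dbtable"  -> "category",
  "user"     -> "your_user",
  "password" -> "your_password"
)

// Load the table as a DataFrame through Spark's JDBC data source.
val categoryDF = sqlContext.read.format("jdbc").options(jdbcOptions).load()

// Count rows per category and keep the ten most frequent.
// Assumes the table has a string column named "category".
val top10 = categoryDF
  .groupBy("category")
  .count()
  .orderBy(desc("count"))
  .limit(10)

top10.show()
```

Reading over JDBC skips the intermediate HDFS copy that the Sqoop route produces, which can be convenient for small tables and quick exploration.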
Area | Technology |
---|---|
Dataset | Yelp |
Relational Database | MySQL |
Big Data Ingestion Tool | Hadoop (Sqoop) |
Distributed File System | Hadoop (HDFS) |
Cluster Computing Framework | Apache Spark (Scala) |
Search and Analytics Engine | Elasticsearch |
Of all the attributes in the dataset, we focus on the ones shown below.
Business
category
hours
Review
Top 10 Business Categories
Yelp Business Map
Business distribution by state
Average rating of business over time
Top rated businesses
User sign up trend
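Most of the reports listed above are group-and-aggregate queries. As a Spark-free sketch of the logic behind "Average rating of business over time", the plain Scala below groups reviews by year and averages their stars. The `Review` case class, its field names, the `yyyy-MM-dd` date format, and the sample rows are all assumptions made for illustration; in Spark the same shape of query would typically be a `groupBy` followed by an average.

```scala
// Hypothetical, simplified review record; the real review table has more columns.
// Assumes dates are stored as "yyyy-MM-dd" strings.
final case class Review(businessId: String, stars: Int, date: String)

/** Average star rating per calendar year, sorted by year. */
def averageStarsByYear(reviews: Seq[Review]): Seq[(Int, Double)] =
  reviews
    .groupBy(r => r.date.take(4).toInt) // first four characters are the year
    .map { case (year, rs) => (year, rs.map(_.stars).sum.toDouble / rs.size) }
    .toSeq
    .sortBy(_._1)

// Made-up sample data, for illustration only.
val sampleReviews = Seq(
  Review("b1", 5, "2015-03-14"),
  Review("b1", 3, "2015-11-02"),
  Review("b2", 4, "2016-01-20")
)

averageStarsByYear(sampleReviews).foreach { case (year, avg) =>
  println(f"$year%d: $avg%.2f")
}
```

Sorting by year keeps the output ready for a time-series chart, which is how this report would usually be displayed.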
Installation of the Cloudera QuickStart VM
Installation of the ELK stack
Then configure the Scala runtime on the Cloudera QuickStart VM
Watch the video below for more information
https://www.youtube.com/watch?v=SFJsuo2XISs
Launch the Spark Shell
spark-shell --packages org.elasticsearch:elasticsearch-spark-13_2.10:6.1.1 --conf spark.es.index.auto.create=true --conf spark.es.nodes=IP_ADDRESS:PORT
Replace IP_ADDRESS:PORT with the address and port of your Elasticsearch node, with no space after the equals sign.
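Once spark-shell is running with the connector loaded, writing results to Elasticsearch can look roughly like the sketch below. It uses the `saveToEs` method that the elasticsearch-hadoop connector adds to RDDs through `import org.elasticsearch.spark._`. The index/type name `yelp/top_categories` and the sample documents are made up for illustration and are not this project's actual output.

```scala
import org.elasticsearch.spark._

// Made-up sample documents standing in for the real processed output.
val docs = Seq(
  Map("category" -> "Restaurants", "business_count" -> 3),
  Map("category" -> "Bars", "business_count" -> 2)
)

// `sc` is the SparkContext provided by spark-shell.
// "yelp/top_categories" is a hypothetical index/type target.
sc.makeRDD(docs).saveToEs("yelp/top_categories")
```

The connector picks up the `spark.es.*` settings passed on the command line, so the snippet itself does not need to repeat the Elasticsearch address.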
After ingesting processed data into Elasticsearch
Yelp User sign up trend
Business distribution by state
Yelp review
Top 10 Business Categories
Dashboard