Skip to content

This repo contains examples to help people using big data technologies such as Spark, Cassandra and Elasticsearch.

License

Notifications You must be signed in to change notification settings

echauchot/bigDataRocks

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

47 Commits
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Big data rocks !

This repo contains examples to help people using big data technologies such as Spark, Cassandra and Elasticsearch.

Content

For now it contains a project that loads in parallel (using Spark) a collection of persons into Cassandra and Elasticsearch and provides also tools to manipulate them:

  • Pipeline and services to load data using local (in JVM) spark cluster
  • DAOs for CRUD operations against Cassandra and Elasticsearch
  • Unit tests that use embedded Cassandra and embedded Elasticsearch
  • Integration tests that run against official Cassandra and Elasticsearch docker containers (see https://github.com/echauchot/bigDataRocksCI for containers management)

The project is designed to work on a single machine. It contains no main and should be run using the integration tests provided.

Requirements

Next steps

  • Add more comprehensive Spark example with map/reduce operations and good practices (for instance, avoid groupby and prefer mapTopair + reduceByKey to avoid shuffle ...)
  • Add examples on how to use Spark SQL
  • Add examples on how to use Spark Streaming

About

This repo contains examples to help people using big data technologies such as Spark, Cassandra and Elasticsearch.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages