This repository has been archived by the owner on Feb 1, 2024. It is now read-only.


PFE Rapids

End-of-studies projects at CentraleSupelec. The goal is to discover and test the RAPIDS ecosystem, with a focus on the cuML library.

Installation

Follow these steps:

  • Install Conda
  • Create a working environment for RAPIDS using the following explainer: link. It installs all the dependencies required to run cuML.
  • Activate the environment: conda activate env_name.
  • Install the other dependencies listed for each sub-project.
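After activating the environment, a quick sanity check is to confirm that the RAPIDS packages can actually be imported from it. This is a minimal sketch that assumes the environment should provide `cuml` and `cudf`; that package list is an assumption, so extend it with each sub-project's extra dependencies.

```python
import importlib.util
import sys

# Packages assumed to ship with the RAPIDS environment; adjust per sub-project.
REQUIRED = ["cuml", "cudf"]


def missing_packages(names):
    """Return the top-level package names that cannot be found by this interpreter."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages(REQUIRED)
    if missing:
        print(f"Missing in {sys.prefix}: {', '.join(missing)}")
    else:
        print("All required packages are importable.")
```

Running it with the environment's own `python` confirms that `conda activate env_name` picked the right interpreter.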

Data

The datasets used can be downloaded from the following links:

Hello Scripts

Script to benchmark different clustering algorithms. Code adapted from here.
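A fair comparison between clustering implementations mostly comes down to timing each fit the same way. The sketch below is a generic timing harness, not the adapted benchmark script itself; in practice each callable would wrap a fit such as `KMeans(n_clusters=8).fit(X)` from cuML or scikit-learn, and that wiring is hypothetical here. Warm-up runs are discarded because the first GPU call typically pays one-off costs such as CUDA context initialization.

```python
import statistics
import time
from typing import Callable, Dict, List


def benchmark(fn: Callable[[], object], repeats: int = 5, warmup: int = 1) -> Dict[str, float]:
    """Time `fn` `repeats` times after `warmup` untimed calls; results in seconds."""
    for _ in range(warmup):
        fn()  # untimed: absorbs one-off initialization costs
    timings: List[float] = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        timings.append(time.perf_counter() - start)
    return {
        "min": min(timings),
        "median": statistics.median(timings),
        "max": max(timings),
    }


def compare(candidates: Dict[str, Callable[[], object]], repeats: int = 5) -> Dict[str, Dict[str, float]]:
    """Benchmark each named callable, e.g. {"cuml_kmeans": ..., "sklearn_kmeans": ...}."""
    return {name: benchmark(fn, repeats=repeats) for name, fn in candidates.items()}
```

Reporting the median rather than the mean keeps a single slow outlier run from dominating the comparison.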

Tests cuML kNN on real data. Code adapted from here.
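cuML exposes kNN through `cuml.neighbors.NearestNeighbors`, which follows the scikit-learn `fit` / `kneighbors` interface and returns distances and indices. When checking its output on a few queries, an exhaustive CPU search is a handy reference. The sketch below is that brute-force Euclidean search plus a majority-vote classifier in plain Python; it is a swapped-in reference, not what cuML does internally, and the function names are hypothetical.

```python
import heapq
import math
from collections import Counter
from typing import List, Sequence, Tuple

Point = Sequence[float]


def brute_force_knn(train: List[Point], query: Point, k: int) -> List[Tuple[float, int]]:
    """Return the k nearest training points to `query` as (distance, index) pairs.

    Exhaustive Euclidean search: fine for spot-checking, too slow for full datasets.
    """
    distances = ((math.dist(query, p), i) for i, p in enumerate(train))
    return heapq.nsmallest(k, distances)


def knn_classify(train: List[Point], labels: List[str], query: Point, k: int = 3) -> str:
    """Majority vote over the k nearest neighbours.

    Ties go to the label encountered first, i.e. the one with the nearer neighbour.
    """
    neighbours = brute_force_knn(train, query, k)
    votes = Counter(labels[i] for _, i in neighbours)
    return votes.most_common(1)[0][0]
```

Comparing these indices with cuML's `kneighbors` output on a small sample is a quick way to validate the GPU results before scaling up.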

A script showing the most basic use of the cuML KMeans implementation.
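cuML's `KMeans` follows the scikit-learn estimator interface: it is built with `KMeans(n_clusters=...)`, fitted with `.fit(X)`, and the result is read from `labels_` and `cluster_centers_`. To make explicit what that call computes, here is a plain-Python sketch of Lloyd's algorithm, the classic k-means iteration. It is a CPU illustration for tiny inputs under simple assumptions (random initialization from the data points), not cuML's implementation.

```python
import math
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def kmeans(points: Sequence[Point], n_clusters: int, max_iter: int = 100,
           seed: int = 0) -> Tuple[List[int], List[Point]]:
    """Plain-Python Lloyd's algorithm; returns (labels, centers).

    These play the role of cuML's `labels_` and `cluster_centers_`.
    """
    rng = random.Random(seed)
    # Initialize with n_clusters input points drawn without replacement.
    centers: List[Point] = [tuple(p) for p in rng.sample(list(points), n_clusters)]
    labels: List[int] = []
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest center.
        labels = [min(range(n_clusters), key=lambda c: math.dist(p, centers[c]))
                  for p in points]
        # Update step: each center moves to the mean of its members.
        new_centers: List[Point] = []
        for c in range(n_clusters):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centers.append(tuple(sum(axis) / len(members) for axis in zip(*members)))
            else:
                new_centers.append(centers[c])  # empty cluster: keep its old center
        if new_centers == centers:
            break  # centers stopped moving, so assignments are stable
        centers = new_centers
    return labels, centers
```

With cuML installed, the GPU counterpart is along the lines of `KMeans(n_clusters=2).fit(X)` on a NumPy array; cluster IDs are arbitrary, so compare labels up to a permutation.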

Tests an integration of cuML into a Flask API.
Run the app with: python app.py
You can pretrain the model with the model_maker.py script; make sure to set the dataset path correctly.
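The split between model_maker.py (training) and app.py (serving) implies a save-then-load handoff. How this repository actually persists the model is not described here; the sketch below assumes `pickle`, which cuML documents as a way to persist its estimators. The file name `model.pkl` and the `ThresholdModel` class are hypothetical stand-ins so the sketch runs without a GPU.

```python
import pickle
from pathlib import Path

MODEL_PATH = Path("model.pkl")  # hypothetical path shared by model_maker.py and app.py


class ThresholdModel:
    """Hypothetical stand-in for a fitted cuML estimator."""

    def fit(self, values):
        self.threshold_ = sum(values) / len(values)
        return self

    def predict(self, values):
        return [int(v > self.threshold_) for v in values]


def save_model(model, path=MODEL_PATH):
    """model_maker.py side: persist the trained model once."""
    with open(path, "wb") as f:
        pickle.dump(model, f)


def load_model(path=MODEL_PATH):
    """app.py side: load once at startup rather than on every request.

    Only unpickle files you created yourself; pickle can execute arbitrary code.
    """
    with open(path, "rb") as f:
        return pickle.load(f)
```

Loading the model once when the app starts keeps per-request latency limited to `predict`.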

URL Classification

Set the right dataset path and launch the script with: python trainer_standalone.py
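Instead of editing the path inside the script, it could be supplied at launch and validated before any training starts. The sketch below shows one way with `argparse`; the `--dataset-path` flag and the `DATASET_PATH` environment variable are hypothetical, not options trainer_standalone.py currently accepts.

```python
import argparse
import os
from pathlib import Path


def resolve_dataset_path(argv=None) -> Path:
    """Resolve the dataset path from a CLI flag, falling back to an env variable.

    `--dataset-path` and `DATASET_PATH` are hypothetical names for this sketch.
    """
    parser = argparse.ArgumentParser(description="Train the URL classifier.")
    parser.add_argument("--dataset-path", default=os.environ.get("DATASET_PATH"),
                        help="path to the labelled URL dataset")
    args = parser.parse_args(argv)
    if args.dataset_path is None:
        parser.error("set --dataset-path or the DATASET_PATH environment variable")
    path = Path(args.dataset_path).expanduser()
    if not path.is_file():
        parser.error(f"dataset not found: {path}")  # exits with status 2
    return path
```

Failing fast on a wrong path avoids waiting for GPU initialization only to crash at the first read.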

Full Stream

  • Set the right dataset path in this script.
  • Make sure Kafka is running; you can follow this
  • Launch the mock producer: python src/mock_producer/main.py
  • Launch the trainer: python src/trainer/main.py
  • Launch the metric collector: python src/metrics_garbage/main.py
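The three processes above form a pipeline: the producer feeds records to the trainer, which emits metrics to the collector, with Kafka in between. The sketch below mimics that data flow in a single process, using `queue.Queue` as a stand-in for Kafka topics and threads as stand-ins for the three scripts. Message shapes and the batch size are hypothetical, and the training step is a placeholder; it shows the flow, not the repository's code.

```python
import queue
import threading
import time

BATCH_SIZE = 4   # hypothetical mini-batch size used by the trainer
STOP = object()  # sentinel marking the end of the stream


def mock_producer(out_q, n_messages):
    """Stand-in for src/mock_producer/main.py: emit one record per message."""
    for i in range(n_messages):
        out_q.put({"id": i, "features": [float(i), float(i % 3)]})
    out_q.put(STOP)


def trainer(in_q, metrics_q):
    """Stand-in for src/trainer/main.py: consume records in mini-batches."""
    batch = []
    while True:
        msg = in_q.get()
        if msg is not STOP:
            batch.append(msg)
        if batch and (len(batch) == BATCH_SIZE or msg is STOP):
            start = time.perf_counter()
            # Placeholder: a real trainer would update a cuML model on `batch` here.
            metrics_q.put({"batch_size": len(batch), "seconds": time.perf_counter() - start})
            batch = []
        if msg is STOP:
            metrics_q.put(STOP)  # propagate end of stream to the collector
            return


def metrics_collector(in_q, sink):
    """Stand-in for src/metrics_garbage/main.py: store every metric record."""
    while True:
        record = in_q.get()
        if record is STOP:
            return
        sink.append(record)


def run_pipeline(n_messages=10):
    data_q, metrics_q, collected = queue.Queue(), queue.Queue(), []
    threads = [
        threading.Thread(target=metrics_collector, args=(metrics_q, collected)),
        threading.Thread(target=trainer, args=(data_q, metrics_q)),
        threading.Thread(target=mock_producer, args=(data_q, n_messages)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return collected
```

As in the real setup, the consumers are started before the producer so no message is emitted before someone is listening; with Kafka the broker would also buffer messages, which a plain queue does too.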

Metrics Analysis

Notebooks to plot and analyze the collected metrics.
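Before plotting, it can help to reduce the raw metric records to a few summary numbers. The sketch below assumes, hypothetically, that the collector's output is a CSV with `batch_size` and `seconds` columns; the real output format of the metric collector is not described in this README.

```python
import csv
import io
import statistics


def summarize(csv_text):
    """Summarize hypothetical metric rows with `batch_size` and `seconds` columns."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    if not rows:
        raise ValueError("no metric rows to summarize")
    seconds = [float(r["seconds"]) for r in rows]
    sizes = [int(r["batch_size"]) for r in rows]
    total_time = sum(seconds)
    return {
        "batches": len(rows),
        "rows_processed": sum(sizes),
        "median_batch_seconds": statistics.median(seconds),
        # Throughput over the time spent in batches, not wall-clock time.
        "rows_per_second": sum(sizes) / total_time if total_time > 0 else float("inf"),
    }
```

A file on disk can be passed in with `summarize(Path("metrics.csv").read_text())`, where the file name is again hypothetical.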