Model Deployment

We will walk through the steps involved in an MLOps pipeline.

Machine Learning Model Operationalization Management (MLOps), an extension of DevOps, establishes effective practices and processes for designing, building, and deploying ML models into production.

In the paper Towards CRISP-ML(Q): A Machine Learning Process Model with Quality Assurance Methodology, the authors introduce a process model for developing ML applications called the CRoss-Industry Standard Process model for the development of Machine Learning applications with Quality assurance methodology (CRISP-ML(Q)). CRISP-ML(Q) offers the ML community a standard process for streamlining ML and data science projects and making results reproducible. It is designed for ML applications where the model is deployed and maintained as part of a product or service.

CRISP-ML(Q) model
Source

The CRISP-ML(Q) process model consists of 6 phases:

  1. Business & Data Understanding
  2. Data Preparation
  3. Modelling
  4. Evaluation
  5. Deployment
  6. Monitoring and Maintenance

For each phase, the flow chart below explains the quality assurance approach in CRISP-ML(Q). First, clear objectives for the current phase are defined. Next, steps are taken to initiate the task. Then the risks that might negatively impact the efficiency and success of the ML application are identified (e.g., bias, overfitting, lack of reproducibility). Finally, quality assurance methods are applied to mitigate those risks where needed (e.g., cross-validation, documenting the process and results).

CRISP-ML(Q) approach for quality assurance for each of the six phases
Source

Model Deployment

ML model deployment includes the following tasks:

  • Define the inference hardware and optimize the ML model for the target hardware
  • Evaluate the model under production conditions
  • Assure user acceptance and usability
  • Minimize the risks of unforeseen errors
  • Define a deployment strategy

A wise person on the Internet once said: deploying is easy if you ignore all the hard parts. If you want to deploy a model for your friends to play with, all you have to do is to create an endpoint to your prediction function, push your model to AWS, create an app with Streamlit or Dash. The hard parts include making your model available to millions of users with a latency of milliseconds and 99% uptime, setting up the infrastructure so that the right person can be immediately notified when something went wrong, figuring out what went wrong, and seamlessly deploying the updates to fix what’s wrong. (Source: Chip Huyen)

Model Serving and Deployment Patterns

Source: https://ml-ops.org/content/three-levels-of-ml-software

Model serving is a way to integrate an ML model into a software system. There are two aspects to deploying an ML system in a production environment: first, deploying the pipeline for automated retraining, and second, providing an endpoint that ingests input data and returns predictions from the ML model.
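To make the second aspect concrete, here is a minimal sketch of a prediction endpoint built only on the Python standard library. The `predict` function, its weights, the `/predict` path, and the `{"features": [...]}` request shape are all assumptions made for illustration; the exercises in this project use FastAPI and real models instead.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def predict(features):
    """Hypothetical stand-in for a trained model: a fixed linear score."""
    weights = [0.5, -0.25, 1.0]  # assumed weights, for illustration only
    score = sum(w * x for w, x in zip(weights, features))
    return {"score": score, "label": "positive" if score >= 0 else "negative"}


def handle_predict(body: bytes):
    """Turn a raw JSON request body into an (HTTP status, JSON bytes) pair."""
    try:
        payload = json.loads(body)
        features = [float(x) for x in payload["features"]]
    except (ValueError, KeyError, TypeError):
        error = {"error": 'expected JSON like {"features": [1, 2, 3]}'}
        return 400, json.dumps(error).encode()
    return 200, json.dumps(predict(features)).encode()


class PredictHandler(BaseHTTPRequestHandler):
    """Routes POST /predict to handle_predict; everything else is a 404."""

    def do_POST(self):
        if self.path != "/predict":
            self.send_error(404)
            return
        length = int(self.headers.get("Content-Length", 0))
        status, response = handle_predict(self.rfile.read(length))
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(response)


def serve(port: int = 8000):
    """Start the endpoint; blocks until interrupted."""
    HTTPServer(("0.0.0.0", port), PredictHandler).serve_forever()
```

Keeping the request handling in `handle_predict`, separate from the HTTP wiring, lets the prediction path be unit-tested without starting a server.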

There are 5 popular model serving patterns for putting an ML model into production:

  1. Model-as-Service

Model-as-Service
Source

  2. Model-as-Dependency

Model-as-Dependency
Source

  3. Precompute

Precompute-Serving
Source

  4. Model-on-Demand

Model-on-Demand
Source

  5. Hybrid-Serving

Federated Learning
Source
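As one illustration of how these patterns differ, the sketch below shows the Precompute pattern as it is commonly described: a batch job scores known inputs ahead of time, and the request path only performs a lookup. The `predict` logic, the offer labels, and the in-memory dict standing in for a database are assumptions for illustration only.

```python
def predict(user_id: int) -> str:
    """Hypothetical model: stand-in logic, not a real recommender."""
    return "premium_offer" if user_id % 2 == 0 else "standard_offer"


def precompute(user_ids):
    """Batch job: score every known input ahead of time and store the results.

    A dict stands in for the database a real system would write to.
    """
    return {user_id: predict(user_id) for user_id in user_ids}


def serve(store, user_id, default="default_offer"):
    """Request path: a key lookup with no model call. Unknown users get a fallback."""
    return store.get(user_id, default)


if __name__ == "__main__":
    store = precompute(range(1, 5))
    print(serve(store, 2))
    print(serve(store, 99))
```

The trade-off is that requests stay fast and cheap, but predictions are only as fresh as the last batch run and only exist for inputs that were known in advance.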

There are 2 popular deployment strategies:

  1. Deploying ML models as Docker Containers

Docker Containers to Cloud Instances
Source

  2. Deploying ML Models as Serverless Functions

Serverless Functions
Source
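For the serverless strategy, the following sketch shows the general shape of a function handler, written in the style of an AWS Lambda handler behind an HTTP proxy integration. The `load_model` helper, the keyword-based "model", and the `{"text": ...}` request body are hypothetical; a real function would load trained weights from disk or object storage.

```python
import json

_MODEL = None  # cached so warm invocations of the same instance skip loading


def load_model():
    """Hypothetical loader returning a toy keyword-based sentiment function."""
    positive_words = {"good", "great", "love"}

    def model(text: str) -> str:
        words = set(text.lower().split())
        return "positive" if positive_words & words else "negative"

    return model


def handler(event, context=None):
    """Entry point: parse the request body, run the model, return an HTTP-style response."""
    global _MODEL
    if _MODEL is None:  # cold start: load once, then reuse
        _MODEL = load_model()
    body = json.loads(event.get("body") or "{}")
    text = body.get("text", "")
    return {
        "statusCode": 200,
        "body": json.dumps({"text": text, "sentiment": _MODEL(text)}),
    }
```

Loading the model outside the per-request path matters here because serverless platforms may start a fresh instance at any time, and model loading usually dominates cold-start latency.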

Recommended Readings

Model Serving

In this project, we focus on different approaches to serving ML models. MLOps.toys provides a comprehensive survey of the frameworks that exist for model serving. The goal of this project is to explore those 10+ frameworks and more, along with cloud services for serving a deployed ML model and testing its endpoint.

We will start with a simple exercise on using GitHub Actions for CI/CD. As we progress, we will integrate technologies such as GitHub Actions, Docker, Pytest, and linting while testing different ML model serving frameworks and following best practices.

  1. Makefile: In this exercise, we will automate installing packages, linting, formatting, and testing using a Makefile.

    Technologies: Pytest, Make

  2. GitHub Actions Makefile: In this exercise, we will automate installing packages, linting, formatting, and testing using GitHub Actions.

    Technologies: Pytest, Make, GitHub Actions

  3. GitHub Actions Docker: In this exercise, we will implement the following:

    • Containerize a GitHub project by integrating a Dockerfile and automatically registering new containers to a Container Registry.

    • Create a simple load test for the application using a load test framework such as Locust or loader.io, and automatically run this test when changes are pushed to a staging branch.

    Technologies: Docker, GitHub Actions, Locust

  4. FastAPI Azure: In this exercise, we will build a FastAPI ML application and deploy it with continuous delivery on Azure using Azure App Services and Azure DevOps Pipelines.

    Technologies: Docker, FastAPI, Continuous Delivery using Azure App Services, Azure DevOps Pipelines

  5. FastAPI GCP: In this exercise, we will build a FastAPI ML application and deploy it with continuous delivery on GCP using Cloud Run and Cloud Build.

    Technologies: Docker, FastAPI, Continuous Delivery using GCP Cloud Run and Cloud Build

  6. FastAPI AWS: In this exercise, we will build a FastAPI ML application and deploy it with continuous delivery on AWS using Elastic Beanstalk and Code Pipeline.

    Technologies: Docker, FastAPI, sklearn, Continuous Delivery using Elastic Beanstalk and Code Pipeline

  7. AWS Terraform Deploy: To be implemented

  8. FastAPI GKE: In this project, we will deploy a sentiment analyser model using FastAPI on GCP using GKE.

    • Containerizing the different components of the project

    • Writing tests and testing individual modules using Pytest

    • Using Trunk for automatic code checking, formatting, and linting

    • Deploying application on GKE

    Technologies: Docker, FastAPI, HuggingFace Transformer model, Pytest, Trunk, GKE

  9. FastAPI Kubernetes Monitoring: In this exercise, we will introduce Kubernetes. We will deploy a FastAPI application on Kubernetes and monitor it using Prometheus and Grafana, following best practices for writing tests and triggering a CI workflow using GitHub Actions.

    Technologies: Docker, Docker Compose, Pytest, FastAPI, HuggingFace Transformer model, Continuous Integration using GitHub Actions, Kubernetes, Prometheus, Grafana

  10. BentoML Deploy: In this exercise, we will use the BentoML library to deploy the sentiment classification model from Hugging Face 🤗 on the following services: AWS Lambda, Azure Functions, and Kubernetes.

    Technologies: Docker, Pytest, FastAPI, HuggingFace Transformer model, AWS Lambda, Azure Functions, Kubernetes, BentoML

  11. Cortex Deploy: In this exercise, a transformers sentiment classifier FastAPI application is deployed using two different Cortex APIs.

    Technologies: Docker, Cortex, FastAPI, HuggingFace Transformer model, Continuous Integration using GitHub Actions, Trunk.io linter

  12. Serverless Deploy: In this exercise, a Hugging Face transformers sentiment classifier FastAPI application is deployed using the Serverless Framework.

    Technologies: Docker, Serverless Framework, FastAPI, HuggingFace Transformer model, Continuous Integration using GitHub Actions, Trunk.io linter

  13. Bodywork Train and Deploy: This exercise contains a Bodywork project that demonstrates how to run an ML pipeline on Kubernetes with Bodywork. The example ML pipeline has two stages:

    • Run a batch job to train a model.

    • Deploy the trained model as a service with a REST API.

    Technologies: Bodywork, Sklearn, Flask, Kubernetes, Cronjob

  14. KServe Deploy: In this exercise, we will deploy the sentiment analysis Hugging Face transformer model. Since MLServer does not provide out-of-the-box support for PyTorch or Transformer models, we will write a custom inference runtime to deploy this model and test the endpoints.

    Technologies: Docker, KServe, HuggingFace Transformer model, Pytest, Kubernetes, Istio, Knative, Kind, TorchServe

    TorchServe: Deploying a Hugging Face transformer model using TorchServe.

  15. MLServer Deploy: In this exercise, we will deploy the sentiment analysis Hugging Face transformer model. Since MLServer does not provide out-of-the-box support for PyTorch or Transformer models, we will write a custom inference runtime to deploy this model and test the endpoints.

    Technologies: Docker, MLServer, HuggingFace Transformer model

  16. Ray Serve Deploy: In this exercise, we will deploy the sentiment analysis Hugging Face transformer model using Ray Serve so that it can be scaled up and queried over HTTP, using two approaches:

    • Ray Serve default approach

    • Ray Serve with FastAPI

    Technologies: Docker, Ray Serve, FastAPI, HuggingFace Transformer model

  17. Seldon Core Deploy: In this exercise, we will deploy a simple sklearn iris model using Seldon Core. We will deploy using two approaches and test the endpoints:

    • Seldon Core default approach

    • V2 Inference protocol

    Technologies: Docker, Seldon Core, Sklearn model, Kubernetes, Istio, Helm, Kind

  18. Nvidia Triton Deploy: Coming Soon
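Several of the frameworks above (the Seldon Core exercise names it explicitly) accept requests in the V2 Inference Protocol. The sketch below builds such a request body for a batch of iris-style feature rows. The input name `predict`, the host, and the model name are assumptions for illustration; check the deployed model's metadata for the actual input names and datatypes.

```python
import json


def build_v2_request(rows, input_name="predict", datatype="FP32"):
    """Build a V2 Inference Protocol request body for a batch of numeric rows."""
    n_cols = len(rows[0])
    if any(len(row) != n_cols for row in rows):
        raise ValueError("all rows must have the same number of features")
    return {
        "inputs": [
            {
                "name": input_name,  # assumed input name
                "shape": [len(rows), n_cols],
                "datatype": datatype,
                # tensor contents flattened in row-major order
                "data": [value for row in rows for value in row],
            }
        ]
    }


def infer_url(host, model_name):
    """REST inference path defined by the protocol: /v2/models/<name>/infer."""
    return f"http://{host}/v2/models/{model_name}/infer"


if __name__ == "__main__":
    request_body = build_v2_request([[5.1, 3.5, 1.4, 0.2], [6.2, 2.9, 4.3, 1.3]])
    print(infer_url("localhost:8080", "iris"))  # hypothetical host and model name
    print(json.dumps(request_body, indent=2))
```

The printed body can be sent as JSON in a POST request to the printed URL with any HTTP client once a model server is running.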