diff --git a/docs/assets/images/cml_report1_dark.png b/docs/assets/images/github_cml_report_1_dark.png
similarity index 100%
rename from docs/assets/images/cml_report1_dark.png
rename to docs/assets/images/github_cml_report_1_dark.png
diff --git a/docs/assets/images/cml_report1_light.png b/docs/assets/images/github_cml_report_1_light.png
similarity index 100%
rename from docs/assets/images/cml_report1_light.png
rename to docs/assets/images/github_cml_report_1_light.png
diff --git a/docs/assets/images/cml_report2_dark.png b/docs/assets/images/github_cml_report_2_dark.png
similarity index 100%
rename from docs/assets/images/cml_report2_dark.png
rename to docs/assets/images/github_cml_report_2_dark.png
diff --git a/docs/assets/images/cml_report2_light.png b/docs/assets/images/github_cml_report_2_light.png
similarity index 100%
rename from docs/assets/images/cml_report2_light.png
rename to docs/assets/images/github_cml_report_2_light.png
diff --git a/docs/assets/images/gitlab_cml_report_1_dark.png b/docs/assets/images/gitlab_cml_report_1_dark.png
new file mode 100644
index 00000000..3edf0e13
Binary files /dev/null and b/docs/assets/images/gitlab_cml_report_1_dark.png differ
diff --git a/docs/assets/images/gitlab_cml_report_1_light.png b/docs/assets/images/gitlab_cml_report_1_light.png
new file mode 100644
index 00000000..5c334995
Binary files /dev/null and b/docs/assets/images/gitlab_cml_report_1_light.png differ
diff --git a/docs/assets/images/gitlab_cml_report_2_dark.png b/docs/assets/images/gitlab_cml_report_2_dark.png
new file mode 100644
index 00000000..185ff350
Binary files /dev/null and b/docs/assets/images/gitlab_cml_report_2_dark.png differ
diff --git a/docs/assets/images/gitlab_cml_report_2_light.png b/docs/assets/images/gitlab_cml_report_2_light.png
new file mode 100644
index 00000000..ab7cb646
Binary files /dev/null and b/docs/assets/images/gitlab_cml_report_2_light.png differ
diff --git a/docs/clean-up.md b/docs/clean-up.md
index a8ceae8d..05e9193d 100644
--- a/docs/clean-up.md
+++ b/docs/clean-up.md
@@ -33,6 +33,20 @@ provider.
     gcloud container clusters delete --zone $GCP_CLUSTER_ZONE $GCP_CLUSTER_NAME
     ```
 
+    Press ++Y++ to confirm the deletion.
+
+    **Delete the Google Artifact Registry**
+
+    To delete the Google Artifact Registry used to store the Docker images you
+    created, you can execute the following command:
+
+    ```sh title="Execute the following command(s) in a terminal"
+    # Delete the Google Artifact Registry repository
+    gcloud artifacts repositories delete --location $GCP_REPOSITORY_LOCATION $GCP_REPOSITORY_NAME
+    ```
+
+    Press ++Y++ to confirm the deletion.
+
     **Delete the Google Storage bucket**
 
     !!! warning
@@ -49,14 +63,7 @@ provider.
     gcloud storage rm --recursive gs://$GCP_BUCKET_NAME
     ```
 
-    Alternatively, you can delete the bucket from the Google Cloud Console:
-
-    1. Go to the
-       [Google Cloud Storage Console](https://console.cloud.google.com/storage){:target="\_blank"}.
-    2. Make sure you selected the correct project.
-    3. Select the bucket you want to delete from the bucket list.
-    4. Click on **Delete** at the top of the page.
-    5. Follow the instructions to delete the bucket.
+    Press ++Y++ to confirm the deletion.
 
     **Delete the Service Account**
 
@@ -67,23 +74,17 @@ provider.
     gcloud iam service-accounts delete dvc-service-account@${GCP_PROJECT_ID}.iam.gserviceaccount.com
     ```
 
-    Alternatively, you can delete the service account from the Google Cloud Console:
-
-    1. Go to the
-       [Google Cloud IAM Console](https://console.cloud.google.com/iam-admin/serviceaccounts){:target="\_blank"}.
- 2. Make sure you selected the correct project. - 3. Select the service account you want to delete from the service account list. - 4. Click on **Delete** at the top of the page. - 5. Follow the instructions to delete the service account. + Press ++Y++ to confirm the deletion. - **Delete the local Service Account key** + **Delete the local Service Account keys** - You can run the following command to delete the service account key you created + You can run the following command to delete the service account keys you created locally: ```sh title="Execute the following command(s) in a terminal" # Delete the local Service Account key rm ~/.config/gcloud/dvc-google-service-account-key.json + rm ~/.config/gcloud/mlem-google-service-account-key.json ``` **Delete the Google Cloud project** @@ -98,7 +99,7 @@ provider. **Close the Billing Account** - To remove the project from the Billing Acocunt: + To remove the project from the Billing Account: 1. Go to the [Google Cloud Billing Console](https://console.cloud.google.com/billing){:target="\_blank"}. @@ -220,10 +221,11 @@ Here is a checklist of all the resources and environments you created. You can click on the list items to mark them as completed if needed. - [ ] The cloud provider Kubernetes cluster +- [ ] The cloud provider container registry - [ ] The cloud provider S3 bucket - [ ] The cloud provider credentials - [ ] The cloud provider project -- [ ] The GitHub or GitLab Personal Access Token +- [ ] The GitHub or GitLab Personal Access Tokens - [ ] The GitHub or GitLab repository - [ ] The projects directories - [ ] The `a-guide-to-mlops-jupyter-notebook` directory diff --git a/docs/part-1-local-training-and-model-evaluation/chapter-1-run-a-simple-ml-experiment-with-jupyter-notebook.md b/docs/part-1-local-training-and-model-evaluation/chapter-1-run-a-simple-ml-experiment-with-jupyter-notebook.md index 697d26b0..696ce65b 100644 --- a/docs/part-1-local-training-and-model-evaluation/chapter-1-run-a-simple-ml-experiment-with-jupyter-notebook.md +++ b/docs/part-1-local-training-and-model-evaluation/chapter-1-run-a-simple-ml-experiment-with-jupyter-notebook.md @@ -158,11 +158,12 @@ Launch the notebook. jupyter-lab notebook.ipynb ``` -A browser window should open with the Jupyter Notebook. +A browser window should open with the Jupyter Notebook at +. -You may notice all the previous outputs from the notebook are still present. -This is because the notebook was not cleared before being shared with you. This -can be useful to see the results of previous runs. +You may notice all the previous outputs from the notebook might still be +present. This is because the notebook was not cleared before being shared with +you. This can be useful to see the results of previous runs. In most cases, however, it can also be a source of confusion and clutter. This is one of the limitations of the Jupyter Notebook, which make them not always diff --git a/docs/part-1-local-training-and-model-evaluation/chapter-2-adapt-and-move-the-jupyter-notebook-to-python-scripts.md b/docs/part-1-local-training-and-model-evaluation/chapter-2-adapt-and-move-the-jupyter-notebook-to-python-scripts.md index bf066c90..8f05de0c 100644 --- a/docs/part-1-local-training-and-model-evaluation/chapter-2-adapt-and-move-the-jupyter-notebook-to-python-scripts.md +++ b/docs/part-1-local-training-and-model-evaluation/chapter-2-adapt-and-move-the-jupyter-notebook-to-python-scripts.md @@ -1,14 +1,5 @@ # Chapter 2: Adapt and move the Jupyter Notebook to Python scripts -??? 
info "You want to take over from this chapter? Collapse this section and follow the instructions below." - - !!! warning - - It might be easier to start from the previous chapter(s). Only follow this - section if you are comfortable with the content of the previous chapter(s). - - Work in progress. - ## Introduction Jupyter Notebooks provide an interactive environment where code can be executed diff --git a/docs/part-1-local-training-and-model-evaluation/chapter-3-initialize-git-and-dvc-for-local-training.md b/docs/part-1-local-training-and-model-evaluation/chapter-3-initialize-git-and-dvc-for-local-training.md index d25a62d3..8970add2 100644 --- a/docs/part-1-local-training-and-model-evaluation/chapter-3-initialize-git-and-dvc-for-local-training.md +++ b/docs/part-1-local-training-and-model-evaluation/chapter-3-initialize-git-and-dvc-for-local-training.md @@ -1,14 +1,5 @@ # Chapter 3: Initialize Git and DVC for local training -??? info "You want to take over from this chapter? Collapse this section and follow the instructions below." - - !!! warning - - It might be easier to start from the previous chapter(s). Only follow this - section if you are comfortable with the content of the previous chapter(s). - - Work in progress. - ## Introduction Now that you have a good understanding of the experiment, it's time to @@ -404,7 +395,8 @@ This chapter is done, you can check the summary. ## Summary -Congrats! You now have a dataset that can be used and shared among the team. +Congratulations! You now have a dataset that can be used and shared among the +team. In this chapter, you have successfully: diff --git a/docs/part-1-local-training-and-model-evaluation/chapter-4-reproduce-the-ml-experiment-with-dvc.md b/docs/part-1-local-training-and-model-evaluation/chapter-4-reproduce-the-ml-experiment-with-dvc.md index 27d06325..7a298cd8 100644 --- a/docs/part-1-local-training-and-model-evaluation/chapter-4-reproduce-the-ml-experiment-with-dvc.md +++ b/docs/part-1-local-training-and-model-evaluation/chapter-4-reproduce-the-ml-experiment-with-dvc.md @@ -1,14 +1,5 @@ # Chapter 4: Reproduce the ML experiment with DVC -??? info "You want to take over from this chapter? Collapse this section and follow the instructions below." - - !!! warning - - It might be easier to start from the previous chapter(s). Only follow this - section if you are comfortable with the content of the previous chapter(s). - - Work in progress. - ## Introduction A key component of [:simple-dvc: DVC](../tools.md) is the concept of "stages". @@ -26,9 +17,9 @@ In this chapter, you will learn how to: 1. Remove custom rules from the `.gitignore` file 2. Set up DVC pipeline stages: - - `prepare` - - `train` - - `evaluate` + - Prepare + - Train + - Evaluate 3. Visualize the pipeline 4. Execute the pipeline 5. Push the changes to DVC and Git @@ -379,15 +370,16 @@ This chapter is done, you can check the summary. ## Summary -Congrats! You have defined a pipeline and know how to reproduce your experiment. +Congratulations! You have defined a pipeline and know how to reproduce your +experiment. In this chapter, you have successfully: 1. Removed custom rules from the `.gitignore` file 2. Set up three DVC pipeline stages - - `prepare` - - `train` - - `evaluate` + - Prepare + - Train + - Evaluate 3. Visualized the pipeline 4. Executed the pipeline 5. 
Committed the changes diff --git a/docs/part-1-local-training-and-model-evaluation/chapter-5-track-model-evolution-with-dvc.md b/docs/part-1-local-training-and-model-evaluation/chapter-5-track-model-evolution-with-dvc.md index f96c316b..c94c0138 100644 --- a/docs/part-1-local-training-and-model-evaluation/chapter-5-track-model-evolution-with-dvc.md +++ b/docs/part-1-local-training-and-model-evaluation/chapter-5-track-model-evolution-with-dvc.md @@ -1,14 +1,5 @@ # Chapter 5: Track model evolution with DVC -??? info "You want to take over from this chapter? Collapse this section and follow the instructions below." - - !!! warning - - It might be easier to start from the previous chapter(s). Only follow this - section if you are comfortable with the content of the previous chapter(s). - - Work in progress. - ## Introduction In the previous chapter, you did set up a [:simple-dvc: DVC](../tools.md) @@ -74,8 +65,8 @@ index 5bb698e..6a6ff45 100644 output_classes: 11 ``` -Here, you simply changed the `epochs` parameter of the `train` stage, which -should slightly affect the model's performance. +Here, you simply changed the `epochs` parameter of the Train stage, which should +slightly affect the model's performance. ### Reproduce the experiment @@ -256,7 +247,7 @@ This chapter is done, you can check the summary. ## Summary -Congrats! You now have a simple way to compare the two iterations of your +Congratulations! You now have a simple way to compare the two iterations of your experiment. In this chapter, you have successfully: diff --git a/docs/part-2-move-the-model-to-the-cloud/chapter-10-work-efficiently-and-collaboratively-with-git.md b/docs/part-2-move-the-model-to-the-cloud/chapter-10-work-efficiently-and-collaboratively-with-git.md index 3fc9e2cb..9a57e542 100644 --- a/docs/part-2-move-the-model-to-the-cloud/chapter-10-work-efficiently-and-collaboratively-with-git.md +++ b/docs/part-2-move-the-model-to-the-cloud/chapter-10-work-efficiently-and-collaboratively-with-git.md @@ -1,14 +1,5 @@ # Chapter 10: Work efficiently and collaboratively with Git -??? info "You want to take over from this chapter? Collapse this section and follow the instructions below." - - !!! warning - - It might be easier to start from the previous chapter(s). Only follow this - section if you are comfortable with the content of the previous chapter(s). - - Work in progress. - ## Introduction The objective of this chapter is to work effectively and collaboratively on the @@ -233,9 +224,6 @@ repository. Check the changes with Git to ensure all wanted files are here. ```sh title="Execute the following command(s) in a terminal" -# Upload the experiment data and cache to the remote bucket -dvc push - # Add all the files git add . @@ -255,7 +243,12 @@ Changes to be committed: modified: params.yaml ``` +Push the changes to the remote repository. + ```sh title="Execute the following command(s) in a terminal" +# Upload the experiment data and cache to the remote bucket +dvc push + # Commit the changes git commit -m "I made some changes to the model" @@ -313,23 +306,35 @@ git push evaluation data that was pulled from DVC, it can uses it to display all the plots. +
+
+ ![Plots Diff 1 Light](../assets/images/github_cml_report_1_light.png#only-light){ loading=lazy } +
+ ![Plots Diff 1 Dark](../assets/images/github_cml_report_1_dark.png#only-dark){ loading=lazy } +
+ ![Plots Diff 2 Light](../assets/images/github_cml_report_2_light.png#only-light){ loading=lazy } +
+ ![Plots Diff 2 Dark](../assets/images/github_cml_report_2_dark.png#only-dark){ loading=lazy } +
+
+
 === ":simple-gitlab: GitLab"
 
     When the CI/CD pipeline completes, a new comment is added to your merge
     request. Check the merge request and examine the report made by CML. As it
     uses the evaluation data that was pulled from DVC, it can use it to display
     all the plots.
-
-![Plots Diff 1 Light](../assets/images/cml_report1_light.png#only-light){ loading=lazy } -
-![Plots Diff 1 Dark](../assets/images/cml_report1_dark.png#only-dark){ loading=lazy } -
-![Plots Diff 2 Light](../assets/images/cml_report2_light.png#only-light){ loading=lazy } -
-![Plots Diff 2 Dark](../assets/images/cml_report2_dark.png#only-dark){ loading=lazy } -
+
+
+ ![Plots Diff 1 Light](../assets/images/gitlab_cml_report_1_light.png#only-light){ loading=lazy } +
+ ![Plots Diff 1 Dark](../assets/images/gitlab_cml_report_1_dark.png#only-dark){ loading=lazy } +
+ ![Plots Diff 2 Light](../assets/images/gitlab_cml_report_2_light.png#only-light){ loading=lazy } +
+ ![Plots Diff 2 Dark](../assets/images/gitlab_cml_report_2_dark.png#only-dark){ loading=lazy } +
+
### Merge the pull request/merge request @@ -346,7 +351,7 @@ git push repository. If you ever need to go back to this branch, you can always restore the branch from this menu. - Congrats! You can now iterate on your model while keeping a trace of the + Congratulations! You can now iterate on your model while keeping a trace of the improvements made to it. You can visualize and discuss the changes made to a model before merging them into the codebase. @@ -360,7 +365,7 @@ git push The associated issue will be automatically closed as well. - Congrats! You can now iterate on your model while keeping a trace of the + Congratulations! You can now iterate on your model while keeping a trace of the improvements made to it. You can visualize and discuss the changes made to a model before merging them into the codebase. diff --git a/docs/part-2-move-the-model-to-the-cloud/chapter-6-move-the-ml-experiment-code-to-the-cloud.md b/docs/part-2-move-the-model-to-the-cloud/chapter-6-move-the-ml-experiment-code-to-the-cloud.md index ef7349ff..3039a58d 100644 --- a/docs/part-2-move-the-model-to-the-cloud/chapter-6-move-the-ml-experiment-code-to-the-cloud.md +++ b/docs/part-2-move-the-model-to-the-cloud/chapter-6-move-the-ml-experiment-code-to-the-cloud.md @@ -1,14 +1,5 @@ # Chapter 6: Move the ML experiment code to the cloud -??? info "You want to take over from this chapter? Collapse this section and follow the instructions below." - - !!! warning - - It might be easier to start from the previous chapter(s). Only follow this - section if you are comfortable with the content of the previous chapter(s). - - Work in progress. - ## Introduction Now that you have configured [:simple-dvc: DVC](../tools.md) and can reproduce @@ -80,24 +71,24 @@ Create a Git repository on your preferred service to collaborate with peers. === ":simple-github: GitHub" - Create a new GitHub repository for this chapter by accessing - . - - !!! warning + !!! danger "Important" Configure the repository as you wish but **do not** check the boxex _"Add a README file"_, _"Add .gitignore"_ nor _"Choose a license"_. -=== ":simple-gitlab: GitLab" + Create a new GitHub repository for this chapter by accessing + . - Create a new GitLab blank project for this chapter by accessing - . +=== ":simple-gitlab: GitLab" - !!! warning + !!! danger "Important" Configure the repository as you wish but **do not** check the box _"Initialize repository with a README"_. + Create a new GitLab blank project for this chapter by accessing + . + ## Configure Git for the remote branch Add the remote origin to your repository. Replace `` @@ -131,7 +122,8 @@ points. ## Summary -Congrats! You now have a codebase that can be used and shared among the team. +Congratulations! You now have a codebase that can be used and shared among the +team. In this chapter, you have successfully: diff --git a/docs/part-2-move-the-model-to-the-cloud/chapter-7-move-the-ml-experiment-data-to-the-cloud.md b/docs/part-2-move-the-model-to-the-cloud/chapter-7-move-the-ml-experiment-data-to-the-cloud.md index 34e74e86..1ec045c0 100644 --- a/docs/part-2-move-the-model-to-the-cloud/chapter-7-move-the-ml-experiment-data-to-the-cloud.md +++ b/docs/part-2-move-the-model-to-the-cloud/chapter-7-move-the-ml-experiment-data-to-the-cloud.md @@ -1,14 +1,5 @@ # Chapter 7: Move the ML experiment data to the cloud -??? info "You want to take over from this chapter? Collapse this section and follow the instructions below." - - !!! warning - - It might be easier to start from the previous chapter(s). 
Only follow this - section if you are comfortable with the content of the previous chapter(s). - - Work in progress. - ## Introduction At this point, the codebase is made available to team members using @@ -107,7 +98,8 @@ Create a project on a cloud provider to host the data. Name your project and select **Create** to create the project. - A new page opens. Note the ID of your project, it will be used later. + A new page opens. Note the ID of your project (not the project number nor + name!), it will be used later. !!! warning @@ -422,7 +414,8 @@ hashed and have been uploaded. ## Summary -Congrats! You now have a dataset that can be used and shared among the team. +Congratulations! You now have a dataset that can be used and shared among the +team. In this chapter, you have successfully: diff --git a/docs/part-2-move-the-model-to-the-cloud/chapter-8-reproduce-the-ml-experiment-in-a-cicd-pipeline.md b/docs/part-2-move-the-model-to-the-cloud/chapter-8-reproduce-the-ml-experiment-in-a-cicd-pipeline.md index fdc9c5eb..cf46235b 100644 --- a/docs/part-2-move-the-model-to-the-cloud/chapter-8-reproduce-the-ml-experiment-in-a-cicd-pipeline.md +++ b/docs/part-2-move-the-model-to-the-cloud/chapter-8-reproduce-the-ml-experiment-in-a-cicd-pipeline.md @@ -1,14 +1,5 @@ # Chapter 8: Reproduce the ML experiment in a CI/CD pipeline -??? info "You want to take over from this chapter? Collapse this section and follow the instructions below." - - !!! warning - - It might be easier to start from the previous chapter(s). Only follow this - section if you are comfortable with the content of the previous chapter(s). - - Work in progress. - ## Introduction At this point, your code, your data and your execution process should be shared @@ -182,15 +173,19 @@ Depending on the CI/CD platform you are using, the process will be different. Google Cloud as `base64`. It allows to hide the secret in GitLab CI logs as a security measure. - !!! tip + === ":simple-linux: Linux & :simple-windows: Windows" - If on Linux, you can use the command - `base64 -w 0 -i ~/.config/gcloud/ dvc-google-service-account-key.json`. + ```sh title="Execute the following command(s) in a terminal" + # Encode the Google Service Account key to base64 + base64 -w 0 -i ~/.config/gcloud/dvc-google-service-account-key.json + ``` - ```sh title="Execute the following command(s) in a terminal" - # Encode the Google Service Account key to base64 - base64 -i ~/.config/gcloud/dvc-google-service-account-key.json - ``` + === ":simple-apple: macOS" + + ```sh title="Execute the following command(s) in a terminal" + # Encode the Google Service Account key to base64 + base64 -i ~/.config/gcloud/dvc-google-service-account-key.json + ``` **Store the Google Service Account key as a CI/CD variable** @@ -315,10 +310,11 @@ Depending on the CI/CD platform you are using, the process will be different. before_script: # Set the Google Service Account key - echo "${DVC_GCP_SERVICE_ACCOUNT_KEY}" | base64 -d > $GOOGLE_APPLICATION_CREDENTIALS - # Install dependencies + # Create the virtual environment for caching - python3 -m venv .venv - source .venv/bin/activate - - pip install --requirement requirements.txt + # Install dependencies + - pip install --requirement requirements-freeze.txt script: # Run the experiment - dvc repro --pull --allow-missing @@ -388,8 +384,8 @@ This chapter is done, you can check the summary. ## Summary -Congrats! You now have a CI/CD pipeline that will run the experiment on each -commit. +Congratulations! 
You now have a CI/CD pipeline that will run the experiment on +each commit. In this chapter, you have successfully: diff --git a/docs/part-2-move-the-model-to-the-cloud/chapter-9-track-model-evolution-in-the-cicd-pipeline-with-cml.md b/docs/part-2-move-the-model-to-the-cloud/chapter-9-track-model-evolution-in-the-cicd-pipeline-with-cml.md index 3756432d..8feabef3 100644 --- a/docs/part-2-move-the-model-to-the-cloud/chapter-9-track-model-evolution-in-the-cicd-pipeline-with-cml.md +++ b/docs/part-2-move-the-model-to-the-cloud/chapter-9-track-model-evolution-in-the-cicd-pipeline-with-cml.md @@ -1,14 +1,5 @@ # Chapter 9: Track model evolution in the CI/CD pipeline with CML -??? info "You want to take over from this chapter? Collapse this section and follow the instructions below." - - !!! warning - - It might be easier to start from the previous chapter(s). Only follow this - section if you are comfortable with the content of the previous chapter(s). - - Work in progress. - ## Introduction At this point, you have a CI/CD pipeline that will run the experiment on each @@ -326,7 +317,6 @@ collaboration and decision-making within the team. - **Token name**: _gitlab-ci[bot]_ - **Expiration date**: _None_ - - **Select a role**: _Developer_ - **Select scopes**: `api`, `read_repository` and `write_repository` Select **Create personal access token** to create the token. Copy it. It will be @@ -349,7 +339,7 @@ collaboration and decision-making within the team. Explore this file to understand the `report` stage and its steps. - ```yaml title=".gitlab-ci.yml" hl_lines="3 13-14 39-104" + ```yaml title=".gitlab-ci.yml" hl_lines="3 13-14 40-97" stages: - train - report @@ -380,10 +370,11 @@ collaboration and decision-making within the team. before_script: # Set the Google Service Account key - echo "${DVC_GCP_SERVICE_ACCOUNT_KEY}" | base64 -d > $GOOGLE_APPLICATION_CREDENTIALS - # Install dependencies + # Create the virtual environment for caching - python3 -m venv .venv - source .venv/bin/activate - - pip install --requirement requirements.txt + # Install dependencies + - pip install --requirement requirements-freeze.txt script: # Run the experiment - dvc repro --pull --allow-missing @@ -471,7 +462,7 @@ collaboration and decision-making within the team. ```diff diff --git a/.gitlab-ci.yml b/.gitlab-ci.yml - index 726b176..3280ad9 100644 + index 4bf0954..722c708 100644 --- a/.gitlab-ci.yml +++ b/.gitlab-ci.yml @@ -1,5 +1,6 @@ @@ -490,7 +481,10 @@ collaboration and decision-making within the team. train: stage: train - @@ -31,4 +34,71 @@ train: + @@ -33,3 +36,62 @@ train: + script: + # Run the experiment + - dvc repro --pull --allow-missing + +report: + stage: report @@ -580,7 +574,7 @@ Take some time to understand the changes made to the file. git add .gitlab-ci.yml # Commit the changes - git commit -m "Add cml reporting to CI/CD pipeline" + git commit -m "Add CML reporting to CI/CD pipeline" # Push the changes git push @@ -605,9 +599,9 @@ This chapter is done, you can check the summary. ## Summary -Congrats! You now have a CI/CD pipeline that will run and update the experiment -results as well as create a report comparing the results with the main branch on -a pull request. +Congratulations! You now have a CI/CD pipeline that will run and update the +experiment results as well as create a report comparing the results with the +main branch on a pull request. 
In this chapter, you have successfully: diff --git a/docs/part-3-serve-and-deploy-the-model/chapter-11-save-and-load-the-model-with-mlem.md b/docs/part-3-serve-and-deploy-the-model/chapter-11-save-and-load-the-model-with-mlem.md index f874d998..9366dffa 100644 --- a/docs/part-3-serve-and-deploy-the-model/chapter-11-save-and-load-the-model-with-mlem.md +++ b/docs/part-3-serve-and-deploy-the-model/chapter-11-save-and-load-the-model-with-mlem.md @@ -1,14 +1,5 @@ # Chapter 11: Save and load the model with MLEM -??? info "You want to take over from this chapter? Collapse this section and follow the instructions below." - - !!! warning - - It might be easier to start from the previous chapter(s). Only follow this - section if you are comfortable with the content of the previous chapter(s). - - Work in progress. - ## Introduction The purpose of this chapter is to serve and use the model for usage outside of diff --git a/docs/part-3-serve-and-deploy-the-model/chapter-12-serve-the-model-locally-with-mlem.md b/docs/part-3-serve-and-deploy-the-model/chapter-12-serve-the-model-locally-with-mlem.md index 6f4f31a3..cc2e44a5 100644 --- a/docs/part-3-serve-and-deploy-the-model/chapter-12-serve-the-model-locally-with-mlem.md +++ b/docs/part-3-serve-and-deploy-the-model/chapter-12-serve-the-model-locally-with-mlem.md @@ -1,14 +1,5 @@ # Chapter 12: Serve the model locally with MLEM -??? info "You want to take over from this chapter? Collapse this section and follow the instructions below." - - !!! warning - - It might be easier to start from the previous chapter(s). Only follow this - section if you are comfortable with the content of the previous chapter(s). - - Work in progress. - ## Introduction Now that the model is using [MLEM](../tools.md), enabling the extraction of @@ -350,7 +341,7 @@ git push ### Check the results -Congrats! You now have a model served over a REST API! +Congratulations! You now have a model served over a REST API! This chapter is done, you can check the summary. diff --git a/docs/part-3-serve-and-deploy-the-model/chapter-13-deploy-and-access-the-model-on-kubernetes-with-mlem.md b/docs/part-3-serve-and-deploy-the-model/chapter-13-deploy-and-access-the-model-on-kubernetes-with-mlem.md index d1bc1c56..0086036e 100644 --- a/docs/part-3-serve-and-deploy-the-model/chapter-13-deploy-and-access-the-model-on-kubernetes-with-mlem.md +++ b/docs/part-3-serve-and-deploy-the-model/chapter-13-deploy-and-access-the-model-on-kubernetes-with-mlem.md @@ -1,14 +1,5 @@ # Chapter 13: Deploy and access the model on Kubernetes with MLEM -??? info "You want to take over from this chapter? Collapse this section and follow the instructions below." - - !!! warning - - It might be easier to start from the previous chapter(s). Only follow this - section if you are comfortable with the content of the previous chapter(s). - - Work in progress. - ## Introduction Serving the model locally is great for testing purposes, but it is not @@ -169,7 +160,7 @@ Follow the steps below to create one. [Regions and zones](https://cloud.google.com/compute/docs/regions-zones#available). You should ideally select a location close to where most of the expected traffic will come from. Replace `` with your own zone (ex: - `europe-west6-a` for Switzerland (Zurich)). + `europe-west6-a` for Zurich, Switzerland). You can also view the available types of machine with the `gcloud compute machine-types list` command. @@ -364,7 +355,7 @@ for an efficient models management. Export the repository location as an environment variable. 
Replace `` with your own location (ex: `europe-west6` for - Switzerland Zurich). + Switzerland). ```sh title="Execute the following command(s) in a terminal" export GCP_REPOSITORY_LOCATION= @@ -409,14 +400,15 @@ for an efficient models management. **Authenticate with the Google Container Registry** + Configure gcloud to use the Google Container Registry as a Docker credential + helper. + ```sh title="Execute the following command(s) in a terminal" # Authenticate with the Google Container Registry gcloud auth configure-docker ${GCP_REPOSITORY_LOCATION}-docker.pkg.dev ``` - ```sh title="Execute the following command(s) in a terminal" - export GCP_PROJECT_ID=$(gcloud config get-value project) - ``` + Press ++y++ to validate the changes. Export the container registry host: @@ -424,6 +416,29 @@ for an efficient models management. export CONTAINER_REGISTRY_HOST=${GCP_REPOSITORY_LOCATION}-docker.pkg.dev/$GCP_PROJECT_ID/$GCP_REPOSITORY_NAME ``` + !!! tip + + To get the ID of your project, you can use the Google Cloud CLI. + + ```sh title="Execute the following command(s) in a terminal" + # List the projects + gcloud projects list + ``` + + The output should be similar to this. + + ``` + PROJECT_ID NAME PROJECT_NUMBER + mlops-workshop-396007 mlops-workshop 475307267926 + ``` + + Copy the PROJECT_ID and export it as an environment variable. Replace + `` with your own project ID. + + ```sh title="Execute the following command(s) in a terminal" + export GCP_PROJECT_ID= + ``` + === ":material-cloud: Using another cloud provider? Read this!" This guide has been written with Google Cloud in mind. We are open to @@ -452,11 +467,11 @@ operation can takes a few minutes. ```sh title="Execute the following command(s) in a terminal" # Deploy the model on Kubernetes with MLEM mlem deployment run kubernetes service_classifier \ ---model model \ ---registry remote \ ---registry.host=$CONTAINER_REGISTRY_HOST \ ---server fastapi \ ---service_type loadbalancer + --model model \ + --registry remote \ + --registry.host=$CONTAINER_REGISTRY_HOST \ + --server fastapi \ + --service_type loadbalancer ``` The name `service_classifier` is the name of the deployment. It can be changed @@ -470,24 +485,29 @@ The arguments are: - `--server fastapi`: Use FastAPI as the server. - `--service_type loadbalancer`: Use a load balancer to expose the service. -The output should be similar to this. +The output should be similar to this. This might take a few minutes. ``` -💾 Saving deployment to service_classifier.mlem + Loading deployment from service_classifier.mlem ⏳️ Loading model from model.mlem 🛠 Creating docker image ml 💼 Adding model files... 🛠 Generating dockerfile... 💼 Adding sources... 💼 Generating requirements file... - 🛠 Building docker image europe-west6-docker.pkg.dev/mlops-test-391911/mlops-registry/ml:dbdf5b923413970ed7cd31cc5da22455... -2023-08-02 10:58:29,915 [WARNING] mlem.contrib.docker.base: Skipped logging in to remote registry at host europe-west6-docker.pkg.dev/mlops-test-391911/mlops-registry because no credentials given. You could specify credentials as EUROPE-WEST6-DOCKER_PKG_DEV/MLOPS-TEST-391911/MLOPS-REGISTRY_USERNAME and EUROPE-WEST6-DOCKER_PKG_DEV/MLOPS-TEST-391911/MLOPS-REGISTRY_PASSWORD environment variables. 
-✅ Built docker image europe-west6-docker.pkg.dev/mlops-registry/mlops-registry/ml:dbdf5b923413970ed7cd31cc5da22455 - 🔼 Pushing image europe-west6-docker.pkg.dev/mlops-test-391911/mlops-registry/ml:dbdf5b923413970ed7cd31cc5da22455 to -europe-west6-docker.pkg.dev/mlops-test-391911/mlops-registry + 🛠 Building docker image +europe-west6-docker.pkg.dev/mlops-workshop-396007/mlops-registry/ml:8909b3c8feeeef6ff +4e4cdbf3a2fa251... +2023-08-15 11:25:43,391 [WARNING] mlem.contrib.docker.base: Skipped logging in to remote registry at host europe-west6-docker.pkg.dev/mlops-workshop-396007/mlops-registry because no credentials given. You could specify credentials as EUROPE-WEST6-DOCKER_PKG_DEV/MLOPS-WORKSHOP-396007/MLOPS-REGISTRY_USERNAME and EUROPE-WEST6-DOCKER_PKG_DEV/MLOPS-WORKSHOP-396007/MLOPS-REGISTRY_PASSWORD environment variables. + ✅ Built docker image +europe-west6-docker.pkg.dev/mlops-workshop-396007/mlops-registry/ml:8909b3c8feeeef6ff +4e4cdbf3a2fa251 + 🔼 Pushing image +europe-west6-docker.pkg.dev/mlops-workshop-396007/mlops-registry/ml:8909b3c8feeeef6ff +4e4cdbf3a2fa251 to europe-west6-docker.pkg.dev/mlops-workshop-396007/mlops-registry ✅ Pushed image -europe-west6-docker.pkg.dev/mlops-test-391911/mlops-registry/ml:dbdf5b923413970ed7cd31cc5da22455 to -europe-west6-docker.pkg.dev/mlops-test-391911/mlops-registry +europe-west6-docker.pkg.dev/mlops-workshop-396007/mlops-registry/ml:8909b3c8feeeef6ff +4e4cdbf3a2fa251 to europe-west6-docker.pkg.dev/mlops-workshop-396007/mlops-registry namespace created. status='{'conditions': None, 'phase': 'Active'}' deployment created. status='{'available_replicas': None, 'collision_count': None, @@ -709,22 +729,28 @@ The arguments are: The output should be similar to this. ``` -💾 Saving deployment to service_classifier.mlem -⏳️ Loading model from model.mlem + Loading model from model.mlem +⏳️ Loading deployment from service_classifier.mlem 🛠 Creating docker image mlops-classifier 💼 Adding model files... 🛠 Generating dockerfile... 💼 Adding sources... 💼 Generating requirements file... - 🛠 Building docker image europe-west6-docker.pkg.dev/mlops-test-391911/mlops-registry/mlops-classifier:dbdf5b923413970ed7cd31cc5da22455... -2023-08-02 10:58:29,915 [WARNING] mlem.contrib.docker.base: Skipped logging in to remote registry at host europe-west6-docker.pkg.dev/mlops-test-391911/mlops-registry because no credentials given. You could specify credentials as EUROPE-WEST6-DOCKER_PKG_DEV/MLOPS-TEST-391911/MLOPS-REGISTRY_USERNAME and EUROPE-WEST6-DOCKER_PKG_DEV/MLOPS-TEST-391911/MLOPS-REGISTRY_PASSWORD environment variables. - ✅ Built docker image europe-west6-docker.pkg.dev/mlops-test-391911/mlops-registry/mlops-classifier:dbdf5b923413970ed7cd31cc5da22455 - 🔼 Pushing image europe-west6-docker.pkg.dev/mlops-test-391911/mlops-registry/mlops-classifier:dbdf5b923413970ed7cd31cc5da22455 to -europe-west6-docker.pkg.dev/mlops-test-391911/mlops-registry + 🛠 Building docker image +europe-west6-docker.pkg.dev/mlops-workshop-396007/mlops-registry/mlops-classifier:890 +9b3c8feeeef6ff4e4cdbf3a2fa251... +2023-08-15 11:58:06,396 [WARNING] mlem.contrib.docker.base: Skipped logging in to remote registry at host europe-west6-docker.pkg.dev/mlops-workshop-396007/mlops-registry because no credentials given. You could specify credentials as EUROPE-WEST6-DOCKER_PKG_DEV/MLOPS-WORKSHOP-396007/MLOPS-REGISTRY_USERNAME and EUROPE-WEST6-DOCKER_PKG_DEV/MLOPS-WORKSHOP-396007/MLOPS-REGISTRY_PASSWORD environment variables. 
+ ✅ Built docker image +europe-west6-docker.pkg.dev/mlops-workshop-396007/mlops-registry/mlops-classifier:890 +9b3c8feeeef6ff4e4cdbf3a2fa251 + 🔼 Pushing image +europe-west6-docker.pkg.dev/mlops-workshop-396007/mlops-registry/mlops-classifier:890 +9b3c8feeeef6ff4e4cdbf3a2fa251 to +europe-west6-docker.pkg.dev/mlops-workshop-396007/mlops-registry ✅ Pushed image -europe-west6-docker.pkg.dev/mlops-test-391911/mlops-registry/mlops-classifier:dbdf -5b923413970ed7cd31cc5da22455 to -europe-west6-docker.pkg.dev/mlops-test-391911/mlops-registry +europe-west6-docker.pkg.dev/mlops-workshop-396007/mlops-registry/mlops-classifier:890 +9b3c8feeeef6ff4e4cdbf3a2fa251 to +europe-west6-docker.pkg.dev/mlops-workshop-396007/mlops-registry namespace created. status='{'conditions': None, 'phase': 'Active'}' deployment created. status='{'available_replicas': None, 'collision_count': None, @@ -784,11 +810,14 @@ Changes to be committed: new file: service_classifier.mlem ``` -### Commit the changes to Git +### Commit the changes to DVC and Git -Commit the changes to Git. +Commit the changes to DVC and Git. ```sh title="Execute the following command(s) in a terminal" +# Push the model to DVC +dvc push + # Commit the changes git commit -m "MLEM can deploy the model with FastAPI on Kubernetes" diff --git a/docs/part-3-serve-and-deploy-the-model/chapter-14-continuous-deployment-of-the-model-with-mlem-and-the-cicd-pipeline.md b/docs/part-3-serve-and-deploy-the-model/chapter-14-continuous-deployment-of-the-model-with-mlem-and-the-cicd-pipeline.md index b7d48529..0c58d987 100644 --- a/docs/part-3-serve-and-deploy-the-model/chapter-14-continuous-deployment-of-the-model-with-mlem-and-the-cicd-pipeline.md +++ b/docs/part-3-serve-and-deploy-the-model/chapter-14-continuous-deployment-of-the-model-with-mlem-and-the-cicd-pipeline.md @@ -1,14 +1,5 @@ # Chapter 14: Continuous deployment of the model with MLEM and the CI/CD pipeline -??? info "You want to take over from this chapter? Collapse this section and follow the instructions below." - - !!! warning - - It might be easier to start from the previous chapter(s). Only follow this - section if you are comfortable with the content of the previous chapter(s). - - Work in progress. - ## Introduction In this chapter, you will deploy the model to the Kubernetes cluster with the @@ -23,7 +14,7 @@ In this chapter, you will learn how to: 1. Grant access to the container registry on the cloud provider 2. Store the cloud provider credentials in the CI/CD configuration 3. Create the CI/CD pipeline for deploying the model to the Kubernetes cluster -4. Push the CI/CD pipeline configuration file to [Git](../tools.md) +4. Push the CI/CD pipeline configuration file to [:simple-git: Git](../tools.md) 5. Visualize the execution of the CI/CD pipeline The following diagram illustrates control flow of the experiment at the end of @@ -179,7 +170,7 @@ but this time for MLEM. # Set the Artifact Registry permissions for the Google Service Account gcloud projects add-iam-policy-binding $GCP_PROJECT_ID \ --member="serviceAccount:mlem-service-account@${GCP_PROJECT_ID}.iam.gserviceaccount.com" \ - --role="roles/storage.objectAdmin" + --role="roles/storage.objectAdmin" \ --role="roles/artifactregistry.createOnPushWriter" # Set the Kubernetes Cluster permissions for the Google Service Account @@ -236,15 +227,19 @@ Depending on the CI/CD platform you are using, the process will be different. Google Cloud as `base64`. It allows to hide the secret in GitLab CI logs as a security measure. - !!! 
tip + === ":simple-linux: Linux & :simple-windows: Windows" - If on Linux, you can use the command - `base64 -w 0 -i ~/.config/gcloud/mlem-google-service-account-key.json`. + ```sh title="Execute the following command(s) in a terminal" + # Encode the Google Service Account key to base64 + base64 -w 0 -i ~/.config/gcloud/mlem-google-service-account-key.json + ``` - ```sh title="Execute the following command(s) in a terminal" - # Encode the Google Service Account key to base64 - base64 -i ~/.config/gcloud/mlem-google-service-account-key.json - ``` + === ":simple-apple: macOS" + + ```sh title="Execute the following command(s) in a terminal" + # Encode the Google Service Account key to base64 + base64 -i ~/.config/gcloud/mlem-google-service-account-key.json + ``` **Store the Google Service Account key as a CI/CD variable** @@ -300,8 +295,11 @@ following steps will be performed: === ":simple-github: GitHub" - At the root level of your Git repository, create a new GitHub Workflow - configuration file `.github/workflows/mlops-deploy.yml`. + At the root level of your Git repository, create a new GitHub workflow + configuration file `.github/workflows/mlops-deploy.yml`. Replace + `` with your own name (ex: `mlops-kubernetes`). Replace + `` with your own zone (ex: `europe-west6-a` for Zurich, + Switzerland). Take some time to understand the deploy job and its steps. @@ -327,7 +325,7 @@ following steps will be performed: python-version: '3.10' cache: 'pip' - name: Install dependencies - run: pip install -r requirements-freeze.txt + run: pip install --requirement requirements-freeze.txt - name: Login to Google Cloud uses: 'google-github-actions/auth@v1' with: @@ -335,8 +333,8 @@ following steps will be performed: - name: Get Google Cloud's Kubernetes credentials uses: 'google-github-actions/get-gke-credentials@v1' with: - cluster_name: 'mlops-kubernetes' - location: 'europe-west6-a' + cluster_name: '' + location: '' - name: Deploy the model run: mlem deployment run --load service_classifier --model model ``` @@ -354,11 +352,11 @@ following steps will be performed: branches: - main - # Runs on pull requests - pull_request: + # Runs on pull requests + pull_request: - # Allows you to run this workflow manually from the Actions tab - workflow_dispatch: + # Allows you to run this workflow manually from the Actions tab + workflow_dispatch: jobs: train-and-report: @@ -446,6 +444,7 @@ following steps will be performed: needs: train-and-report name: Call Deploy uses: ./.github/workflows/mlops-deploy.yml + secrets: inherit ``` Check the differences with Git to validate the changes. @@ -458,12 +457,302 @@ following steps will be performed: The output should be similar to this: ```diff - TODO + diff --git a/.github/workflows/mlops.yml b/.github/workflows/mlops.yml + index f40cb93..26e25f9 100644 + --- a/.github/workflows/mlops.yml + +++ b/.github/workflows/mlops.yml + @@ -91,3 +91,10 @@ jobs: + + # Publish the CML report + cml comment update --target=pr --publish report.md + + + + deploy: + + # Runs on main branch only + + if: github.ref == 'refs/heads/main' + + needs: train-and-report + + name: Call Deploy + + uses: ./.github/workflows/mlops-deploy.yml + + secrets: inherit ``` === ":simple-gitlab: GitLab" - _Work in progress._ + In order to execute commands on the Kubernetes cluster, an agent must be set up + on the cluster. + + **Create the agent configuration file** + + Create a new empty file named `.gitlab/agents/k8s-agent/config.yaml` at the root + of the repository. 
+ + This file is empty and only serves to enable Kubernetes integration with GitLab. + + Commit the changes to Git. + + ```sh title="Execute the following command(s) in a terminal" + # Add the file + git add .gitlab/agents/k8s-agent/config.yaml + + # Commit the changes + git commit -m "Enable Kubernetes integration with GitLab" + + # Push the changes + git push + ``` + + **Register the agent with GitLab** + + On GitLab, in the left sidebar, go to **Operate > Kubernetes clusters**. Click + on **Connect a cluster**. Select the **k8s-agent** configuration file in the + list. Click **Register**. A modal opens. + + In the modal, a command to register the GitLab Kubernetes agent is displayed. + + The command should look like this: + + ```sh + helm repo add gitlab https://charts.gitlab.io + helm repo update + helm upgrade --install XXX gitlab/gitlab-agent \ + --namespace XXX \ + --create-namespace \ + --set image.tag=XXX \ + --set config.token=XXX \ + --set config.kasAddress=XXX + ``` + + This command must be executed on the Google Cloud Kubernetes cluster. + + **Install the agent on the Kubernetes cluster** + + Copy and paste the command GitLab displays in your terminal. This should install + the GitLab agent on the Kubernetes cluster. + + The output should look like this: + + ``` + "gitlab" has been added to your repositories + Hang tight while we grab the latest from your chart repositories... + ...Successfully got an update from the "gitlab" chart repository + Update Complete. ⎈Happy Helming!⎈ + Release "k8s-agent" does not exist. Installing it now. + NAME: k8s-agent + LAST DEPLOYED: Tue Aug 15 13:59:01 2023 + NAMESPACE: gitlab-agent-k8s-agent + STATUS: deployed + REVISION: 1 + TEST SUITE: None + NOTES: + Thank you for installing gitlab-agent. + + Your release is named k8s-agent. + ``` + + Once the command was executed on the Kubernetes cluster, you can close the + model. + + Refresh the page and you should see the agent has successfully connected to + GitLab. + + **Update the CI/CD pipeline configuration file** + + Update the `.gitlab-ci.yml` file to add a new stage to deploy the model on the + Kubernetes cluster. + + Take some time to understand the deploy job and its steps. + + ```yaml title=".gitlab-ci.yml" hl_lines="4 100-130" + stages: + - train + - report + - deploy + + variables: + # Change pip's cache directory to be inside the project directory since we can + # only cache local items. 
+ PIP_CACHE_DIR: "$CI_PROJECT_DIR/.cache/pip" + # https://dvc.org/doc/user-guide/troubleshooting?tab=GitLab-CI-CD#git-shallow + GIT_DEPTH: "0" + # Set the path to Google Service Account key for DVC - https://dvc.org/doc/command-reference/remote/add#google-cloud-storage + GOOGLE_APPLICATION_CREDENTIALS: "${CI_PROJECT_DIR}/google-service-account-key.json" + # Environment variable for CML + REPO_TOKEN: $GITLAB_PAT + + train: + stage: train + image: python:3.10 + rules: + - if: $CI_COMMIT_BRANCH == "main" + - if: $CI_PIPELINE_SOURCE == "merge_request_event" + cache: + paths: + # Pip's cache doesn't store the Python packages + # https://pip.pypa.io/en/stable/reference/pip_install/#caching + - .cache/pip + - .venv/ + before_script: + # Set the Google Service Account key + - echo "${DVC_GCP_SERVICE_ACCOUNT_KEY}" | base64 -d > $GOOGLE_APPLICATION_CREDENTIALS + # Create the virtual environment for caching + - python3 -m venv .venv + - source .venv/bin/activate + # Install dependencies + - pip install --requirement requirements-freeze.txt + script: + # Run the experiment + - dvc repro --pull --allow-missing + + report: + stage: report + image: iterativeai/cml:0-dvc3-base1 + needs: + - train + rules: + - if: $CI_PIPELINE_SOURCE == "merge_request_event" + before_script: + # Set the Google Service Account key + - echo "${DVC_GCP_SERVICE_ACCOUNT_KEY}" | base64 -d > $GOOGLE_APPLICATION_CREDENTIALS + script: + - | + # Fetch the experiment changes + dvc pull + + # Fetch all other Git branches + git fetch --depth=1 origin main:main + + # Add title to the report + echo "# Experiment Report (${CI_COMMIT_SHA})" >> report.md + + # Compare parameters to main branch + echo "## Params workflow vs. main" >> report.md + dvc params diff main --md >> report.md + + # Compare metrics to main branch + echo "## Metrics workflow vs. 
main" >> report.md + dvc metrics diff main --md >> report.md + + # Compare plots (images) to main branch + dvc plots diff main + + # Create plots + echo "## Plots" >> report.md + + # Create training history plot + echo "### Training History" >> report.md + echo "#### main" >> report.md + echo '![](./dvc_plots/static/main_evaluation_plots_training_history.png "Training History")' >> report.md + echo "#### workspace" >> report.md + echo '![](./dvc_plots/static/workspace_evaluation_plots_training_history.png "Training History")' >> report.md + + # Create predictions preview + echo "### Predictions Preview" >> report.md + echo "#### main" >> report.md + echo '![](./dvc_plots/static/main_evaluation_plots_pred_preview.png "Predictions Preview")' >> report.md + echo "#### workspace" >> report.md + echo '![](./dvc_plots/static/workspace_evaluation_plots_pred_preview.png "Predictions Preview")' >> report.md + + # Create confusion matrix + echo "### Confusion Matrix" >> report.md + echo "#### main" >> report.md + echo '![](./dvc_plots/static/main_evaluation_plots_confusion_matrix.png "Confusion Matrix")' >> report.md + echo "#### workspace" >> report.md + echo '![](./dvc_plots/static/workspace_evaluation_plots_confusion_matrix.png "Confusion Matrix")' >> report.md + + # Publish the CML report + cml comment update --target=pr --publish report.md + + deploy: + stage: deploy + image: python:3.10 + rules: + - if: $CI_COMMIT_BRANCH == "main" + cache: + paths: + # Pip's cache doesn't store the Python packages + # https://pip.pypa.io/en/stable/reference/pip_install/#caching + - .cache/pip + - .venv/ + before_script: + # Install Kubernetes + - export KUBERNETES_VERSION=$(curl -L -s https://dl.k8s.io/release/stable.txt) + - curl -LO -s "https://dl.k8s.io/release/${KUBERNETES_VERSION}/bin/linux/amd64/kubectl" + - curl -LO -s "https://dl.k8s.io/${KUBERNETES_VERSION}/bin/linux/amd64/kubectl.sha256" + - echo "$(cat kubectl.sha256) kubectl" | sha256sum --check + - install -o root -g root -m 0755 kubectl /usr/local/bin/kubectl + # Switch to the right Kubernetes context + - kubectl config use-context ${CI_PROJECT_PATH}:k8s-agent + - export KUBERNETES_CONFIGURATION=$(cat $KUBECONFIG) + # Set the Google Service Account key + - echo "${MLEM_GCP_SERVICE_ACCOUNT_KEY}" | base64 -d > $GOOGLE_APPLICATION_CREDENTIALS + # Create the virtual environment for caching + - python3 -m venv .venv + - source .venv/bin/activate + # Install dependencies + - pip install --requirement requirements-freeze.txt + script: + # Deploy the model + - mlem deployment run --load service_classifier --model model + ``` + + Check the differences with Git to validate the changes. 
+ + ```sh title="Execute the following command(s) in a terminal" + # Show the differences with Git + git diff .gitlab-ci.yml + ``` + + The output should be similar to this: + + ```diff + diff --git a/.gitlab-ci.yml b/.gitlab-ci.yml + index 722c708..ed9e228 100644 + --- a/.gitlab-ci.yml + +++ b/.gitlab-ci.yml + @@ -1,6 +1,7 @@ + stages: + - train + - report + + - deploy + + variables: + # Change pip's cache directory to be inside the project directory since we can + @@ -95,3 +96,35 @@ report: + + # Publish the CML report + cml comment update --target=pr --publish report.md + + + +deploy: + + stage: deploy + + image: python:3.10 + + rules: + + - if: $CI_COMMIT_BRANCH == "main" + + cache: + + paths: + + # Pip's cache doesn't store the Python packages + + # https://pip.pypa.io/en/stable/reference/pip_install/#caching + + - .cache/pip + + - .venv/ + + before_script: + + # Install Kubernetes + + - export KUBERNETES_VERSION=$(curl -L -s https://dl.k8s.io/release/stable.txt) + + - curl -LO -s "https://dl.k8s.io/release/${KUBERNETES_VERSION}/bin/linux/amd64/kubectl" + + - curl -LO -s "https://dl.k8s.io/${KUBERNETES_VERSION}/bin/linux/amd64/kubectl.sha256" + + - echo "$(cat kubectl.sha256) kubectl" | sha256sum --check + + - install -o root -g root -m 0755 kubectl /usr/local/bin/kubectl + + # Switch to the right Kubernetes context + + - kubectl config use-context ${CI_PROJECT_PATH}:k8s-agent + + - export KUBERNETES_CONFIGURATION=$(cat $KUBECONFIG) + + # Set the Google Service Account key + + - echo "${MLEM_GCP_SERVICE_ACCOUNT_KEY}" | base64 -d > $GOOGLE_APPLICATION_CREDENTIALS + + # Create the virtual environment for caching + + - python3 -m venv .venv + + - source .venv/bin/activate + + # Install dependencies + + - pip install --requirement requirements-freeze.txt + + script: + + # Deploy the model + + - mlem deployment run --load service_classifier --model model + ``` ### Check the changes @@ -479,15 +768,28 @@ git status The output should look like this. -``` -On branch main -Your branch is up to date with 'origin/main'. +=== ":simple-github: GitHub" -Changes to be committed: -(use "git restore --staged ..." to unstage) - modified: .github/workflows/mlops.yml - new file: .github/workflows/mlops-deploy.yml -``` + ``` + On branch main + Your branch is up to date with 'origin/main'. + + Changes to be committed: + (use "git restore --staged ..." to unstage) + modified: .github/workflows/mlops.yml + new file: .github/workflows/mlops-deploy.yml + ``` + +=== ":simple-gitlab: GitLab" + + ``` + On branch main + Your branch is up to date with 'origin/main'. + + Changes to be committed: + (use "git restore --staged ..." to unstage) + modified: .gitlab-ci.yml + ``` ### Commit the changes to Git @@ -510,21 +812,37 @@ latest version is consistently available on the Kubernetes server for use. === ":simple-github: GitHub" - In the **Actions** tab, if you click on the **Call Deploy** > **deploy** - pipeline, you should see the following output for the `Deploy the model` step: + In the **Actions** tab, click on the **Call Deploy** > **deploy**. - ``` - > mlem deployment run --load service_classifier --model model +=== ":simple-gitlab: GitLab" - ⏳️ Loading model from model.mlem - ⏳️ Loading deployment from service_classifier.mlem - ``` + You can see the pipeline running on the **CI/CD > Pipelines** page. Check the + `deploy` job: - Note that since the model has not changed, MLEM has not re-deployed the model. +The output should look like this. 
-=== ":simple-gitlab: GitLab" +``` +> mlem deployment run --load service_classifier --model model - _Work in progress._ +⏳️ Loading model from model.mlem +⏳️ Loading deployment from service_classifier.mlem +``` + +Note that since the model has not changed, MLEM has not re-deployed the model. + +??? bug "Having a `NameError: name 'UUID' is not defined` error? Read this!" + + The `NameError: name 'UUID' is not defined` error is a known issue with MLEM: + . You can disable MLEM telemetry + with the following command: + + ```sh + # Disable telemetry + mlem config set core.no_analytics True + ``` + + This will update the `mlem.yaml` file to disable telemetry, solving the + mentioned issue. ## State of the MLOps process @@ -551,3 +869,9 @@ latest version is consistently available on the Kubernetes server for use. You can now safely continue to the next chapter of this guide concluding your journey and the next things you could do with your model. + +## Sources + +Highly inspired by: + +- [_Installing the agent for Kubernetes_ - gitlab.com](https://docs.gitlab.com/ee/user/clusters/agent/install/) diff --git a/docs/part-3-serve-and-deploy-the-model/chapter-15-train-the-model-on-a-kubernetes-pod-with-cml.md b/docs/part-3-serve-and-deploy-the-model/chapter-15-train-the-model-on-a-kubernetes-pod-with-cml.md index fc0b9c3f..d2a03799 100644 --- a/docs/part-3-serve-and-deploy-the-model/chapter-15-train-the-model-on-a-kubernetes-pod-with-cml.md +++ b/docs/part-3-serve-and-deploy-the-model/chapter-15-train-the-model-on-a-kubernetes-pod-with-cml.md @@ -1,13 +1,9 @@ # Chapter 15: Train the model on a Kubernetes pod with CML -??? info "You want to take over from this chapter? Collapse this section and follow the instructions below." +!!! warning "This is a work in progress" - !!! warning - - It might be easier to start from the previous chapter(s). Only follow this - section if you are comfortable with the content of the previous chapter(s). - - Work in progress. + This chapter is a work in progress. Please check back later for updates. Thank + you! ## Introduction @@ -302,8 +298,6 @@ Depending on the CI/CD platform you are using, the process will be different. of the Google Service Account key file as its value. Save the variable by selecting **Add secret**. - TODO: Create new GitHub PAT for CML. - === ":simple-gitlab: GitLab" Store the output as a CI/CD Variable by going to **Settings > CI/CD** from the @@ -357,7 +351,9 @@ you'll be able to start the training of the model on the node with the GPU. Create a new variable named `CML_PAT` with the value of the Personal Access Token as its value. Save the variable by selecting **Add secret**. - Update the `.github/workflows/mlops.yml` file. + Update the `.github/workflows/mlops.yml` file. Replace `` with + your own name (ex: `mlops-kubernetes`). Replace `` with your + own zone (ex: `europe-west6-a` for Zurich, Switzerland). ```yaml title=".github/workflows/mlops.yml" hl_lines="15-18 21-51 54-56" name: MLOps @@ -392,11 +388,11 @@ you'll be able to start the training of the model on the node with the GPU. - name: Get Google Cloud's Kubernetes credentials uses: 'google-github-actions/get-gke-credentials@v1' with: - cluster_name: 'mlops-kubernetes' - location: 'europe-west6-a' + cluster_name: '' + location: '' - uses: iterative/setup-cml@v1 with: - version: '0.19.0' + version: '0.19.1' - name: Initialize runner on Kubernetes env: REPO_TOKEN: ${{ secrets.CML_PAT }} @@ -412,11 +408,10 @@ you'll be able to start the training of the model on the node with the GPU. 
--cloud-kubernetes-node-selector="gpu=true" \ --single - train: + train-and-report: permissions: write-all needs: setup-runner runs-on: [self-hosted, cml-runner] - timeout-minutes: 50400 # 35 days steps: - name: Checkout repository uses: actions/checkout@v3 @@ -424,7 +419,7 @@ you'll be able to start the training of the model on the node with the GPU. uses: actions/setup-python@v4 with: python-version: '3.10' - cache: 'pip' + cache: pip - name: Install dependencies run: pip install --requirement requirements-freeze.txt - name: Login to Google Cloud @@ -433,41 +428,19 @@ you'll be able to start the training of the model on the node with the GPU. credentials_json: '${{ secrets.DVC_GCP_SERVICE_ACCOUNT_KEY }}' - name: Train model run: dvc repro --pull --allow-missing - # After the experiment is done we update the dvc.lock and push the - # changes with dvc. This allows dvc to cache the experiment results - # and use them locally and remotely on pipelines without running the - # experiment again. - - name: Commit changes in dvc.lock - uses: stefanzweifel/git-auto-commit-action@v4 - with: - commit_message: Commit changes in dvc.lock - file_pattern: dvc.lock - - name: Push experiment results to DVC remote storage - run: dvc push - - report: - permissions: write-all - needs: train - if: github.event_name == 'pull_request' - runs-on: ubuntu-latest - steps: - - name: Checkout repository - uses: actions/checkout@v3 + # Node is required to run CML + - name: Setup Node + if: github.event_name == 'pull_request' + uses: actions/setup-node@v3 with: - ref: ${{ github.event.pull_request.head.sha }} - - name: Login to Google Cloud - uses: 'google-github-actions/auth@v1' - with: - credentials_json: '${{ secrets.DVC_GCP_SERVICE_ACCOUNT_KEY }}' - - name: Setup DVC - uses: iterative/setup-dvc@v1 - with: - version: '3.2.2' + node-version: '16' - name: Setup CML + if: github.event_name == 'pull_request' uses: iterative/setup-cml@v1 with: version: '0.19.1' - name: Create CML report + if: github.event_name == 'pull_request' env: REPO_TOKEN: ${{ secrets.GITHUB_TOKEN }} run: | @@ -516,10 +489,12 @@ you'll be able to start the training of the model on the node with the GPU. cml comment update --target=pr --publish report.md deploy: + # Runs on main branch only if: github.ref == 'refs/heads/main' - needs: train + needs: train-and-report name: Call Deploy uses: ./.github/workflows/mlops-deploy.yml + secrets: inherit ``` Check the differences with Git to validate the changes. @@ -532,7 +507,60 @@ you'll be able to start the training of the model on the node with the GPU. 
     The output should be similar to this:
 
     ```diff
-    + TODO
+    diff --git a/.github/workflows/mlops.yml b/.github/workflows/mlops.yml
+    index 30bbce8..5d4a6dd 100644
+    --- a/.github/workflows/mlops.yml
+    +++ b/.github/workflows/mlops.yml
+    @@ -12,10 +12,48 @@ on:
+       # Allows you to run this workflow manually from the Actions tab
+       workflow_dispatch:
+
+    +# Allow the creation and usage of self-hosted runners
+    +permissions:
+    +  contents: read
+    +  id-token: write
+    +
+     jobs:
+    +  setup-runner:
+    +    runs-on: ubuntu-latest
+    +    steps:
+    +      - name: Checkout repository
+    +        uses: actions/checkout@v3
+    +      - name: Login to Google Cloud
+    +        uses: 'google-github-actions/auth@v1'
+    +        with:
+    +          credentials_json: '${{ secrets.CML_GCP_SERVICE_ACCOUNT_KEY }}'
+    +      - name: Get Google Cloud's Kubernetes credentials
+    +        uses: 'google-github-actions/get-gke-credentials@v1'
+    +        with:
+    +          cluster_name: '<my cluster name>'
+    +          location: '<my cluster zone>'
+    +      - uses: iterative/setup-cml@v1
+    +        with:
+    +          version: '0.19.1'
+    +      - name: Initialize runner on Kubernetes
+    +        env:
+    +          REPO_TOKEN: ${{ secrets.CML_PAT }}
+    +        run: |
+    +          export KUBERNETES_CONFIGURATION=$(cat $KUBECONFIG)
+    +          # https://cml.dev/doc/ref/runner
+    +          # https://registry.terraform.io/providers/iterative/iterative/latest/docs/resources/task#machine-type
+    +          # https://registry.terraform.io/providers/iterative/iterative/latest/docs/resources/task#{cpu}-{memory}
+    +          cml runner \
+    +            --labels="cml-runner" \
+    +            --cloud="kubernetes" \
+    +            --cloud-type="1-2000" \
+    +            --cloud-kubernetes-node-selector="gpu=true" \
+    +            --single
+    +
+       train-and-report:
+         permissions: write-all
+    -    runs-on: ubuntu-latest
+    +    needs: setup-runner
+    +    runs-on: [self-hosted, cml-runner]
+         steps:
+           - name: Checkout repository
+             uses: actions/checkout@v3
     ```
 
     Take some time to understand the changes made to the file.
@@ -785,10 +813,10 @@ you'll be able to start the training of the model on the node with the GPU.
 
 ### Check the results
 
-On GitLab, you can see the pipeline running on the **CI/CD > Pipelines** page.
-
 On GitHub, you can see the pipeline running on the **Actions** page.
 
+On GitLab, you can see the pipeline running on the **CI/CD > Pipelines** page.
+
 The pod should be created on the Kubernetes Cluster.
 
 === ":simple-googlecloud: Google Cloud"
@@ -817,8 +845,8 @@ This chapter is done, you can check the summary.
 
 ## Summary
 
-Congrats! You now can train your model on on a custom infrastructure with custom
-hardware for specific use-cases.
+Congratulations! You can now train your model on a custom infrastructure with
+custom hardware for specific use cases.
 
 In this chapter, you have successfully:
diff --git a/docs/part-3-serve-and-deploy-the-model/introduction.md b/docs/part-3-serve-and-deploy-the-model/introduction.md
index 26f0963b..4986a8a2 100644
--- a/docs/part-3-serve-and-deploy-the-model/introduction.md
+++ b/docs/part-3-serve-and-deploy-the-model/introduction.md
@@ -27,8 +27,9 @@ those described in the
 - A [:simple-github: GitHub](https://github.com) or a
   [:simple-gitlab: GitLab](https://gitlab.com) account
 - A [:simple-googlecloud: Google Cloud](https://cloud.google.com) account
-- [:simple-docker: Docker](https://www.docker.com/) to set up and manage the
-  container registry
+- [:simple-docker: Docker](https://www.docker.com/) must be installed to set up
+  and manage the container registry
+- [:simple-helm: Helm](https://helm.sh/) must be installed if using GitLab
 
 ??? info "Using another cloud provider? Read this!"
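
Before tackling the next chapters, it can be worth confirming that the tools listed in the
prerequisites above are actually available on your machine. Below is a minimal sketch of such a
check, assuming Docker, Helm and `kubectl` are already installed and that the credentials for the
Kubernetes cluster created earlier in this part have been configured:

```sh title="Execute the following command(s) in a terminal"
# Check that Docker is installed and the daemon is reachable
docker --version

# Check that Helm is installed (only required for the GitLab setup)
helm version

# Optionally, confirm that kubectl can reach the cluster created earlier
kubectl get nodes
```

If any of these commands fails, revisit the corresponding installation step before continuing.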
diff --git a/docs/part-4-labeling-the-data-and-retrain/chapter-16-setup-label-studio.md b/docs/part-4-labeling-the-data-and-retrain/chapter-16-setup-label-studio.md
index de85baf8..f5d0279c 100644
--- a/docs/part-4-labeling-the-data-and-retrain/chapter-16-setup-label-studio.md
+++ b/docs/part-4-labeling-the-data-and-retrain/chapter-16-setup-label-studio.md
@@ -1,14 +1,5 @@
 # Chapter 16: Setup Label Studio
 
-??? info "You want to take over from this chapter? Collapse this section and follow the instructions below."
-
-    !!! warning
-
-        It might be easier to start from the previous chapter(s). Only follow this
-        section if you are comfortable with the content of the previous chapter(s).
-
-    Work in progress.
-
 !!! warning "This is a work in progress"
 
     This chapter is a work in progress. Please check back later for updates. Thank
diff --git a/docs/part-4-labeling-the-data-and-retrain/chapter-17-import-existing-data-to-label-studio.md b/docs/part-4-labeling-the-data-and-retrain/chapter-17-import-existing-data-to-label-studio.md
index 0daea9bb..65236775 100644
--- a/docs/part-4-labeling-the-data-and-retrain/chapter-17-import-existing-data-to-label-studio.md
+++ b/docs/part-4-labeling-the-data-and-retrain/chapter-17-import-existing-data-to-label-studio.md
@@ -1,14 +1,5 @@
 # Chapter 17: Import existing data to Label Studio
 
-??? info "You want to take over from this chapter? Collapse this section and follow the instructions below."
-
-    !!! warning
-
-        It might be easier to start from the previous chapter(s). Only follow this
-        section if you are comfortable with the content of the previous chapter(s).
-
-    Work in progress.
-
 !!! warning "This is a work in progress"
 
     This chapter is a work in progress. Please check back later for updates. Thank
diff --git a/docs/part-4-labeling-the-data-and-retrain/chapter-18-label-new-data-with-label-studio.md b/docs/part-4-labeling-the-data-and-retrain/chapter-18-label-new-data-with-label-studio.md
index 56640e45..f073d7ae 100644
--- a/docs/part-4-labeling-the-data-and-retrain/chapter-18-label-new-data-with-label-studio.md
+++ b/docs/part-4-labeling-the-data-and-retrain/chapter-18-label-new-data-with-label-studio.md
@@ -1,14 +1,5 @@
 # Chapter 18: Label new data with Label Studio
 
-??? info "You want to take over from this chapter? Collapse this section and follow the instructions below."
-
-    !!! warning
-
-        It might be easier to start from the previous chapter(s). Only follow this
-        section if you are comfortable with the content of the previous chapter(s).
-
-    Work in progress.
-
 !!! warning "This is a work in progress"
 
     This chapter is a work in progress. Please check back later for updates. Thank
diff --git a/docs/part-4-labeling-the-data-and-retrain/chapter-19-retrain-the-model-from-new-data-with-dvc-sync.md b/docs/part-4-labeling-the-data-and-retrain/chapter-19-retrain-the-model-from-new-data-with-dvc-sync.md
index 2a6df052..d20256da 100644
--- a/docs/part-4-labeling-the-data-and-retrain/chapter-19-retrain-the-model-from-new-data-with-dvc-sync.md
+++ b/docs/part-4-labeling-the-data-and-retrain/chapter-19-retrain-the-model-from-new-data-with-dvc-sync.md
@@ -1,14 +1,5 @@
 # Chapter 19: Retrain the model from new data with DVC Sync
 
-??? info "You want to take over from this chapter? Collapse this section and follow the instructions below."
-
-    !!! warning
-
-        It might be easier to start from the previous chapter(s). Only follow this
-        section if you are comfortable with the content of the previous chapter(s).
-
-    Work in progress.
-
warning "This is a work in progress" This chapter is a work in progress. Please check back later for updates. Thank diff --git a/docs/part-4-labeling-the-data-and-retrain/chapter-20-link-the-model-to-label-studio-and-get-predictions.md b/docs/part-4-labeling-the-data-and-retrain/chapter-20-link-the-model-to-label-studio-and-get-predictions.md index 797a194a..47b30356 100644 --- a/docs/part-4-labeling-the-data-and-retrain/chapter-20-link-the-model-to-label-studio-and-get-predictions.md +++ b/docs/part-4-labeling-the-data-and-retrain/chapter-20-link-the-model-to-label-studio-and-get-predictions.md @@ -1,14 +1,5 @@ # Chapter 20: Link the model to Label Studio and get predictions -??? info "You want to take over from this chapter? Collapse this section and follow the instructions below." - - !!! warning - - It might be easier to start from the previous chapter(s). Only follow this - section if you are comfortable with the content of the previous chapter(s). - - Work in progress. - !!! warning "This is a work in progress" This chapter is a work in progress. Please check back later for updates. Thank