[Docs] Add Change log and other edits (mlrun#2906)
jillnogold authored Jan 16, 2023
1 parent f3438a3 commit 02236e0
Showing 12 changed files with 387 additions and 24 deletions.
4 changes: 3 additions & 1 deletion .gitignore
@@ -27,4 +27,6 @@ tests/system/env.yml
# pyenv file for working with several python versions
.python-version
*.bak
docs/contributing.md
docs/CONTRIBUTING.md
docs/tutorial/colab/01-mlrun-basics-colab.ipynb

Binary file modified docs/_static/images/marketplace-ui.png
337 changes: 337 additions & 0 deletions docs/change-log/_index.md

Large diffs are not rendered by default.

8 changes: 7 additions & 1 deletion docs/contents.rst
@@ -38,4 +38,10 @@ Table of Contents
genindex
api/index
cli
glossary
glossary

.. toctree::
:maxdepth: 1
:caption: Change log

change-log/_index
13 changes: 9 additions & 4 deletions docs/data-prep/ingest-data-fs.md
@@ -32,12 +32,12 @@ also general limitations in [Attribute name restrictions](https://www.iguazio.co

## Inferring data

There are two types of inferring:
There are two types of infer options:
- Metadata/schema: This is responsible for describing the dataset and generating its meta-data, such as deducing the
data-types of the features and listing the entities that are involved. Options belonging to this type are
`Entities`, `Features` and `Index`. The `InferOptions` class has the `InferOptions.schema()` function which returns a value
containing all the options of this type.
- Stats/preview: This related to calculating statistics and generating a preview of the actual data in the dataset.
- Stats/preview: This relates to calculating statistics and generating a preview of the actual data in the dataset.
Options of this type are `Stats`, `Histogram` and `Preview`.

The `InferOptions` class has the following values:<br>
@@ -54,7 +54,6 @@ The `InferOptions class` basically translates to a value that can be a combinati

When simultaneously ingesting data and requesting infer options, part of the data might be ingested twice: once for inferring metadata/stats and once for the actual ingest. This is normal behavior.
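
For example, a minimal sketch of passing infer options to an ingest call. The `quotes_set` feature set and `quotes_df` DataFrame are hypothetical placeholders, and the bitwise-OR combination assumes the option values are flags, as the text above implies:

```python
import mlrun.feature_store as fstore
from mlrun.feature_store import InferOptions

# Infer only the schema-type options (Entities, Features, Index):
fstore.ingest(quotes_set, quotes_df, infer_options=InferOptions.schema())

# Combine options of both types into a single value:
fstore.ingest(
    quotes_set, quotes_df,
    infer_options=InferOptions.schema() | InferOptions.Stats,
)
```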


## Ingest data locally

Use a Feature Set to create the basic feature-set definition and then an ingest method to run a simple ingestion "locally" in the Jupyter Notebook pod.
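
A minimal sketch of this flow (assuming a pandas DataFrame `stocks_df` already loaded in the notebook):

```python
import mlrun.feature_store as fstore

# Basic feature-set definition, keyed on the "ticker" entity:
stocks_set = fstore.FeatureSet("stocks", entities=[fstore.Entity("ticker")])

# Run a simple ingestion "locally", i.e. in the notebook process:
fstore.ingest(stocks_set, stocks_df)
```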
@@ -137,14 +136,20 @@ When defining a source, it maps to nuclio event triggers. <br>
You can also create a custom `source` to access various databases or data sources.

## Target stores

By default, the feature sets are saved in parquet and the Iguazio NoSQL DB ({py:class}`~mlrun.datastore.NoSqlTarget`). <br>
The parquet file is ideal for fetching large sets of data for training, while the key-value store is ideal for an online application since it supports low-latency data retrieval based on key access.

```{admonition} Note
When working with the Iguazio MLOps platform, the default feature set storage location is under the "Projects" container, in the `<project name>/fs/..` folder.
The default location can be modified in mlrun config or specified per ingest operation. The parquet/csv files can be stored in NFS, S3, Azure blob storage, Redis, and on Iguazio DB/FS.
```
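
A sketch of overriding the default targets for a single ingest operation (reusing the hypothetical `stocks_set` and `stocks_df` from the sketch above):

```python
import mlrun.feature_store as fstore
from mlrun.datastore.targets import NoSqlTarget, ParquetTarget

# Write the results of this ingest to an explicit list of targets
# instead of the defaults:
fstore.ingest(stocks_set, stocks_df, targets=[ParquetTarget(), NoSqlTarget()])
```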
### Redis target store

### Redis target store

```{admonition} Tech preview
```

In MLRun, the Redis online target is called `RedisNoSqlTarget`. The functionality of the `RedisNoSqlTarget` is identical to the `NoSqlTarget` except for:
- The `RedisNoSqlTarget` does not support the Spark engine (it supports only the storey engine).
- The `RedisNoSqlTarget` accepts a path parameter in the form `<redis|rediss>://[<username>]:[<password>]@<host>[:port]`<br>
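A minimal connection sketch (the host and password are hypothetical placeholders):

```python
from mlrun.datastore.targets import RedisNoSqlTarget

# Online target backed by Redis; the path follows the
# <redis|rediss>://[<username>]:[<password>]@<host>[:port] form:
target = RedisNoSqlTarget(path="redis://:mypassword@redis.default.svc:6379")
```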
6 changes: 1 addition & 5 deletions docs/data-prep/ingesting_data.md
@@ -85,11 +85,7 @@ facilities for executing pySpark code using a Spark service (which can be deploy
as part of an Iguazio system) or through submitting the processing task to Spark-operator. The following page provides
additional details and code-samples:

<!---
TODO - add this once we have Spark service documentation.
1. [Spark service](???) - **do we have a page for this? Are we documenting it?**
-->
1. [Spark operator](../runtimes/spark-operator.html)
- [Spark operator](../runtimes/spark-operator.html)

In a similar manner, Dask can be used for parallel processing of the data. To read data as a Dask `DataFrame`, use the
following code:
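
A minimal sketch of that pattern (the data path is a hypothetical placeholder):

```python
import dask.dataframe as dd
import mlrun

# Read the source data as a Dask DataFrame instead of pandas:
data_item = mlrun.get_dataitem("s3://my-bucket/data/transactions.csv")
df = data_item.as_df(df_module=dd)
```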
6 changes: 5 additions & 1 deletion docs/ecosystem.md
@@ -66,4 +66,8 @@ This section lists the data stores, development tools, services, platforms, etc.
- Jenkins
- Github Actions
- Gitlab CI/CD
- KFP
- KFP

## Browser

MLRun runs on Chrome and Firefox.
1 change: 1 addition & 0 deletions docs/feature-store/using-spark-engine.md
@@ -1,3 +1,4 @@
(ingest-features-spark)=
# Ingest features with Spark

The feature store supports using Spark for ingesting, transforming, and writing results to data targets. When
12 changes: 6 additions & 6 deletions docs/index.md
@@ -5,9 +5,10 @@
MLRun is an open MLOps platform for quickly building and managing continuous ML applications across their lifecycle. MLRun integrates into your development and CI/CD environment and automates the delivery of production data, ML pipelines, and online applications. MLRun significantly reduces engineering efforts, time to production, and computation resources.
With MLRun, you can choose any IDE on your local machine or on the cloud. MLRun breaks the silos between data, ML, software, and DevOps/MLOps teams, enabling collaboration and fast continuous improvements.
Get started with MLRun **{ref}`Tutorials and examples <tutorial>`**, **{ref}`Installation and setup guide <install-setup-guide>`**, or read about **{ref}`architecture`**.
Get started with MLRun **{ref}`Tutorials and examples <tutorial>`**, **{ref}`Installation and setup guide <install-setup-guide>`**,
This page explains how MLRun addresses the [**MLOps tasks**](#mlops-tasks) and presents the [**MLRun core components**](#core-components).
This page explains how MLRun addresses the [**MLOps tasks**](#mlops-tasks), and presents the [**MLRun core components**](#core-components).
See the supported data stores, development tools, services, platforms, etc., supported by MLRun's open architecture in **{ref}`ecosystem`**.
@@ -49,7 +50,6 @@ See the supported data stores, development tools, services, platforms, etc., sup
```
````
`````
The [**MLOps development workflow**](./mlops-dev-flow.html) section describes the different tasks and stages in detail.
MLRun can be used to automate and orchestrate all the different tasks or just specific tasks (and integrate them with what you have already deployed).
@@ -79,7 +79,7 @@ In addition, the MLRun [**Feature store**](./feature-store/feature-store.html) a
{octicon}`mortar-board` **Docs:**
{bdg-link-info}`Feature store <./feature-store/feature-store.html>`
{bdg-link-info}`Data & artifacts <./concepts/data-feature-store.html>`
{bdg-link-info}`Data & artifacts <./concepts/data.html>`
, {octicon}`code-square` **Tutorials:**
{bdg-link-primary}`quick start <./tutorial/01-mlrun-basics.html>`
{bdg-link-primary}`Feature store <./feature-store/basic-demo.html>`
@@ -106,7 +106,7 @@ MLRun rapidly deploys and manages production-grade real-time or batch applicatio
{octicon}`mortar-board` **Docs:**
{bdg-link-info}`Realtime pipelines <./serving/serving-graph.html>`
{bdg-link-info}`Batch inference <./concepts/TBD.html>`
{bdg-link-info}`Batch inference <./deployment/batch_inference.html>`
, {octicon}`code-square` **Tutorials:**
{bdg-link-primary}`Realtime serving <./tutorial/03-model-serving.html>`
{bdg-link-primary}`Batch inference <./tutorial/07-batch-infer.html>`
@@ -124,7 +124,7 @@ Observability is built into the different MLRun objects (data, functions, jobs,
, {octicon}`code-square` **Tutorials:**
{bdg-link-primary}`Model monitoring & drift detection <./tutorial/05-model-monitoring.html>`

`````

<a id="core-components"></a>
## MLRun core components
18 changes: 15 additions & 3 deletions docs/monitoring/initial-setup-configuration.ipynb
@@ -8,6 +8,7 @@
}
},
"source": [
"(enable-model-monitoring)=\n",
"# Enable model monitoring\n",
"\n",
"```{note}\n",
@@ -16,17 +17,28 @@
"\n",
"To see tracking results, model monitoring needs to be enabled in each model.\n",
"\n",
"To enable model monitoring, include `serving_fn.set_tracking()` in the model server.\n",
"\n",
"To utilize drift measurement, supply the train set in the training step.\n",
"\n",
"**In this section**\n",
"- [Enabling model monitoring](#enabling-model-monitoring)\n",
"- [Model monitoring demo](#model-monitoring-demo)\n",
" - [Deploy model servers](#deploy-model-servers)\n",
" - [Simulating requests](#simulating-requests)\n",
"\n",
"## Enabling model monitoring\n",
"\n",
"Model activities can be tracked into a real-time stream and time-series DB. The monitoring data\n",
"is used to create real-time dashboards and track model accuracy and drift. \n",
"To set the tracking stream options, specify the following function spec attributes:\n",
" \n",
" `fn.set_tracking(stream_path, batch, sample)`\n",
" \n",
"- **stream_path** &mdash; the v3io stream path (e.g. `v3io:///users/..`)\n",
"- **sample** &mdash; optional, sample every N requests\n",
"- **batch** &mdash; optional, send micro-batches every N requests\n",
" \n",
"## Model monitoring demo\n",
"Use the following code blocks to test and explore model monitoring."
"Use the following code to test and explore model monitoring."
]
},
{
4 changes: 2 additions & 2 deletions docs/monitoring/model-monitoring-deployment.ipynb
@@ -80,8 +80,8 @@
"* **Model** &mdash; user defined name for the model\n",
"* **Labels** &mdash; user configurable tags that are searchable\n",
"* **Uptime** &mdash; first request for production data\n",
"* **Last Prediction **&mdash; most recent request for production data\n",
"* **Error Count **&mdash; includes prediction process errors such as operational issues (For example, a function in a failed state), as well as data processing errors\n",
"* **Last Prediction** &mdash; most recent request for production data\n",
"* **Error Count** &mdash; includes prediction process errors such as operational issues (For example, a function in a failed state), as well as data processing errors\n",
"(For example, invalid timestamps, request ids, type mismatches etc.)\n",
"* **Drift** &mdash; indication of drift status (no drift (green), possible drift (yellow), drift detected (red))\n",
"* **Accuracy** &mdash; a numeric value representing the accuracy of model predictions (N/A)\n",
Expand Down
2 changes: 1 addition & 1 deletion docs/runtimes/images.md
@@ -26,5 +26,5 @@ These characteristics are great when you’re working in a POC or development en

### Working with images in production
For production you should create your own images to ensure that the image is fixed.
- Pin the image tag, e.g. `image="mlrun/mlrun:1.2.0"`. This maintains the image tag at 1.1.0 even when the client is upgraded. Otherwise, an upgrade of the client would also upgrade the image. (If you specify an external (not MLRun images) docker image, like python, the result is the docker/k8s default behavior, which defaults to `latest` when the tag is not provided.)
- Pin the image tag, e.g. `image="mlrun/mlrun:1.2.0"`. This maintains the image tag at the version you specified, even when the client is upgraded. Otherwise, an upgrade of the client would also upgrade the image. (If you specify an external (not MLRun images) docker image, like python, the result is the docker/k8s default behavior, which defaults to `latest` when the tag is not provided.)
- Pin the versions of requirements, again to avoid breakages, e.g. `pandas==1.4.0`. (If you only specify the package name, e.g. pandas, then pip/conda (python's package managers) just pick up the latest version.)
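
A sketch combining both practices (the function name and source file are hypothetical):

```python
import mlrun

# Pin the image tag and the package versions the function depends on:
fn = mlrun.code_to_function(
    name="trainer",
    filename="trainer.py",
    kind="job",
    image="mlrun/mlrun:1.2.0",        # pinned image tag
    requirements=["pandas==1.4.0"],   # pinned requirement version
)
```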
