[Docs] Add handler decorator (mlrun#2493)

Penumbra69 · Oct 22, 2022 · d4cd9bd
1 parent 85c0eda

Showing 25 changed files with 183 additions and 56 deletions.
diff --git a/docs/concepts/decorators-and-auto-logging.md b/docs/concepts/decorators-and-auto-logging.md
@@ -0,0 +1,126 @@
+(decorators-and-auto-logging)=
+# Decorators and auto-logging
+
+While it is possible to log results and artifacts using {ref}`the MLRun execution context<mlrun-execution-context>`, it is often more convenient to use the {py:func}`mlrun.handler` decorator.
+
+## Basic example
+
+Assume you have the following code in `train.py`:
+
+``` python
+import pandas as pd
+from sklearn.svm import SVC
+
+def train_and_predict(train_data,
+                      predict_input,
+                      label_column='label'):
+
+    x = train_data.drop(label_column, axis=1)
+    y = train_data[label_column]
+
+    clf = SVC()
+    clf.fit(x, y)
+
+    return list(clf.predict(predict_input))
+```
+
+With the `mlrun.handler` decorator, the body of the Python function does not change, and the inputs and outputs are logged automatically. The resulting code is as follows:
+
+``` python
+import pandas as pd
+from sklearn.svm import SVC
+import mlrun
+
+@mlrun.handler(labels={'framework':'scikit-learn'},
+               outputs=['prediction:dataset'],
+               inputs={"train_data": pd.DataFrame,
+                       "predict_input": pd.DataFrame})
+def train_and_predict(train_data,
+                      predict_input,
+                      label_column='label'):
+
+    x = train_data.drop(label_column, axis=1)
+    y = train_data[label_column]
+
+    clf = SVC()
+    clf.fit(x, y)
+
+    return list(clf.predict(predict_input))
+```
+
+To run the code, use the following example:
+
+``` python
+import mlrun
+project = mlrun.get_or_create_project("mlrun-example", context="./", user_project=True)
+
+trainer = project.set_function("train.py", name="train_and_predict", kind="job", image="mlrun/mlrun", handler="train_and_predict")
+
+trainer_run = project.run_function(
+    "train_and_predict", 
+    inputs={"train_data": mlrun.get_sample_path('data/iris/iris_dataset.csv'),
+            "predict_input": mlrun.get_sample_path('data/iris/iris_to_predict.csv')
+           }
+)
+```
+
+The outcome is a run with:
+1. A label with key "framework" and value "scikit-learn".
+2. Two inputs, "train_data" and "predict_input", each parsed as a pandas DataFrame.
+3. An artifact called "prediction" of type "dataset". The dataset contains the return value (in this case, the prediction result).
+
+## Labels
+
+The decorator gives you the option to set labels for the run. The `labels` parameter is a dictionary that maps each label key to its value.
+
+## Input type parsing
+
+The `mlrun.handler` decorator can also parse the input types from the function's type hints, if they are specified. A definition equivalent to the basic example is as follows:
+
+``` python
+@mlrun.handler(labels={'framework':'scikit-learn'},
+               outputs=['prediction:dataset'])
+def train_and_predict(train_data: pd.DataFrame,
+                      predict_input: pd.DataFrame,
+                      label_column='label'):
+
+    ...
+```
+
+> **Note:** If an input has no type hint, the decorator assumes that its type is {py:class}`mlrun.datastore.DataItem`. If you specify `inputs=False`, all the run inputs are assumed to be of type `mlrun.datastore.DataItem`. You can also pass a dictionary to `inputs`, where each key is the name of an input and each value is its type.
+
+## Logging return values as artifacts
+
+If you specify the `outputs` parameter, the return values will be logged as the run artifacts. `outputs` expects a list; the length of the list must match the number of returned values.
+
+The simplest option is to specify a list of strings. Each string contains the name of the artifact. You can also specify the artifact type by adding a colon after the artifact name followed by the type (`'name:artifact_type'`). The following are valid artifact types:
+
+- dataset
+- directory
+- file
+- object
+- plot
+- result
+
+If you use only the name without the type, the following mapping is used:
+
+| Python type              | Artifact type |
+|--------------------------|---------------|
+| pandas.DataFrame         | Dataset       |
+| pandas.Series            | Dataset       |
+| numpy.ndarray            | Dataset       |
+| dict                     | Result        |
+| list                     | Result        |
+| tuple                    | Result        |
+| str                      | Result        |
+| int                      | Result        |
+| float                    | Result        |
+| bytes                    | Object        |
+| bytearray                | Object        |
+| matplotlib.pyplot.Figure | Plot          |
+| plotly.graph_objs.Figure | Plot          |
+| bokeh.plotting.Figure    | Plot          |
+
+
+Another option is to specify a tuple in the form of `(name, artifact_type)` or `(name, artifact_type, arguments)`. Refer to {py:func}`mlrun.handler` for more details.
+
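
Reviewer note on the new page: the `'name:artifact_type'` rule and the default type mapping can be sketched in plain Python. This is a minimal illustration of the documented behavior, not MLRun's implementation. The helper `resolve_output_spec` and both constants are hypothetical names, and the mapping covers only the built-in types from the table so that the sketch needs no third-party packages.

```python
from typing import Any, Tuple

# Artifact types listed in the new documentation page.
VALID_ARTIFACT_TYPES = {"dataset", "directory", "file", "object", "plot", "result"}

# Subset of the documented default mapping, restricted to built-in types.
# (Hypothetical constant; pandas, numpy and plotting types are omitted.)
DEFAULT_TYPE_BY_PYTHON_TYPE = {
    dict: "result",
    list: "result",
    tuple: "result",
    str: "result",
    int: "result",
    float: "result",
    bytes: "object",
    bytearray: "object",
}


def resolve_output_spec(spec: str, value: Any) -> Tuple[str, str]:
    """Hypothetical helper: return (artifact_name, artifact_type) for one output."""
    # Split "name:artifact_type"; explicit_type is "" when no colon is present.
    name, _, explicit_type = spec.partition(":")
    if explicit_type:
        if explicit_type not in VALID_ARTIFACT_TYPES:
            raise ValueError(f"Unknown artifact type: {explicit_type!r}")
        return name, explicit_type
    # No explicit type: fall back to the type of the returned value.
    try:
        return name, DEFAULT_TYPE_BY_PYTHON_TYPE[type(value)]
    except KeyError:
        raise ValueError(
            f"No default artifact type for {type(value).__name__}"
        ) from None


print(resolve_output_spec("prediction:dataset", [0, 1, 2]))  # explicit type wins
print(resolve_output_spec("accuracy", 0.93))                 # float -> result
print(resolve_output_spec("raw_model", b"\x00\x01"))         # bytes -> object
```

In this sketch an explicit type always takes precedence over the value's Python type, which matches the basic example where a returned `list` is logged as a `dataset` because the output is declared as `'prediction:dataset'`.
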
diff --git a/docs/concepts/runs-workflows.md b/docs/concepts/runs-workflows.md
@@ -7,8 +7,9 @@
 ```{toctree}
 :maxdepth: 1
 
-../concepts/mlrun-execution-context
-../concepts/submitting-tasks-jobs-to-functions
-../concepts/workflow-overview
-../concepts/scheduled-jobs
+mlrun-execution-context
+decorators-and-auto-logging
+submitting-tasks-jobs-to-functions
+workflow-overview
+scheduled-jobs
 ```
diff --git a/mlrun/frameworks/_common/model_handler.py b/mlrun/frameworks/_common/model_handler.py
@@ -473,7 +473,7 @@ def save(
         Save the handled model at the given output path.
 
         :param output_path:  The full path to the directory to save the handled model at. If not given, the context
-                             stored will be used to save the model in the defaulted artifacts location.
+                             stored will be used to save the model in the default artifacts location.
 
         :return The saved model artifacts dictionary if context is available and None otherwise.
 
@@ -517,8 +517,8 @@ def to_onnx(self, model_name: str = None, optimize: bool = True, **kwargs):
 
         :param model_name: The name to give to the converted ONNX model. If not given the default name will be the
                            stored model name with the suffix '_onnx'.
-        :param optimize:   Whether to optimize the ONNX model using 'onnxoptimizer' before saving the model. Defaulted
-                           to True.
+        :param optimize:   Whether to optimize the ONNX model using 'onnxoptimizer' before saving the model. Default:
+                           True.
 
         :return: The converted ONNX model (onnx.ModelProto).
         """

diff --git a/mlrun/frameworks/_ml_common/plan.py b/mlrun/frameworks/_ml_common/plan.py
@@ -53,7 +53,7 @@ def __init__(self, need_probabilities: bool = False):
         Initialize a new ML plan.
 
         :param need_probabilities: Whether this plan will need the predictions return from 'model.predict()' or
-                                   'model.predict_proba()'. True means predict_proba and False predict. Defaulted to
+                                   'model.predict_proba()'. True means predict_proba and False predict. Default:
                                    False.
         """
         self._need_probabilities = need_probabilities

diff --git a/mlrun/frameworks/_ml_common/plans/calibration_curve_plan.py b/mlrun/frameworks/_ml_common/plans/calibration_curve_plan.py
@@ -47,7 +47,7 @@ def __init__(
                           proper probability.
         :param n_bins:    Number of bins to discretize the [0, 1] interval.
         :param strategy:  Strategy used to define the widths of the bins. Can be one of {‘uniform’, ‘quantile’}.
-                          Defaulted to "uniform".
+                          Default: "uniform".
         """
         # Store the parameters:
         self._normalize = normalize

diff --git a/mlrun/frameworks/lgbm/__init__.py b/mlrun/frameworks/lgbm/__init__.py
@@ -275,7 +275,7 @@ def apply_mlrun(
     :param parameters:               Parameters to log with the model.
     :param extra_data:               Extra data to log with the model.
     :param auto_log:                 Whether to apply MLRun's auto logging on the model. Auto logging will add the
-                                     default artifacts and metrics to the lists of artifacts and metrics. Defaulted to
+                                     default artifacts and metrics to the lists of artifacts and metrics. Default:
                                      True.
     :param mlrun_logging_callback_kwargs: Key word arguments for the MLRun callback. For further information see the
                                      documentation of the class 'MLRunLoggingCallback'. Note that 'context' is already

diff --git a/mlrun/frameworks/lgbm/callbacks/callback.py b/mlrun/frameworks/lgbm/callbacks/callback.py
@@ -24,7 +24,7 @@ class Callback(ABC):
 
     There are two configurable class properties:
 
-    * order: int = 10 - The priority of the callback to be called first. Lower value means higher priority. Defaulted to
+    * order: int = 10 - The priority of the callback to be called first. Lower value means higher priority. Default:
       10.
     * before_iteration: bool = False - Whether to call this callback before each iteration or after. Default: after
       (False).
@@ -75,7 +75,7 @@ def __init__(self, order: int = 10, before_iteration: bool = False):
         Initialize a new callback to use in LightGBM's training.
 
         :param order:            The priority of the callback to be called first. Lower value means higher priority.
-                                 Defaulted to 10.
+                                 Default: 10.
         :param before_iteration: Whether to call this callback before each iteration or after. Default: after
                                  (False).
         """

diff --git a/mlrun/frameworks/lgbm/callbacks/mlrun_logging_callback.py b/mlrun/frameworks/lgbm/callbacks/mlrun_logging_callback.py
@@ -53,7 +53,7 @@ def __init__(
                                         the `params` dictionary.
         :param logging_frequency:       Per how many iterations to write the logs to MLRun (create the plots and log
                                         them and the results to MLRun). Too low frequency may slow the training time.
-                                        Defaulted to 100.
+                                        Default: 100.
         """
         super(MLRunLoggingCallback, self).__init__(
             dynamic_hyperparameters=dynamic_hyperparameters,

diff --git a/mlrun/frameworks/lgbm/model_handler.py b/mlrun/frameworks/lgbm/model_handler.py
@@ -120,7 +120,7 @@ def __init__(
                                          model.
         :param context:                  MLRun context to work with for logging the model.
         :param model_format:             The format to use for saving and loading the model. Should be passed as a
-                                         member of the class 'LGBMModelHandler.ModelFormats'. Defaulted to
+                                         member of the class 'LGBMModelHandler.ModelFormats'. Default:
                                          'LGBMModelHandler.ModelFormats.PKL'.
 
         :raise MLRunInvalidArgumentError: In case one of the given parameters are invalid.
@@ -189,7 +189,7 @@ def save(self, output_path: str = None, **kwargs):
         logged and returned as artifacts.
 
         :param output_path: The full path to the directory to save the handled model at. If not given, the context
-                            stored will be used to save the model in the defaulted artifacts location.
+                            stored will be used to save the model in the default artifacts location.
 
         :return The saved model additional artifacts (if needed) dictionary if context is available and None otherwise.
         """
@@ -229,7 +229,7 @@ def to_onnx(
         :param model_name:          The name to give to the converted ONNX model. If not given the default name will be
                                     the stored model name with the suffix '_onnx'.
         :param optimize:            Whether to optimize the ONNX model using 'onnxoptimizer' before saving the model.
-                                    Defaulted to True.
+                                    Default: True.
         :param input_sample:        An inputs sample with the names and data types of the inputs of the model.
         :param log:                 In order to log the ONNX model, pass True. If None, the model will be logged if this
                                     handler has a MLRun context set. Default: None.

diff --git a/mlrun/frameworks/onnx/model_handler.py b/mlrun/frameworks/onnx/model_handler.py
@@ -77,7 +77,7 @@ def save(
         logged and returned as artifacts.
 
         :param output_path: The full path to the directory to save the handled model at. If not given, the context
-                            stored will be used to save the model in the defaulted artifacts location.
+                            stored will be used to save the model in the default artifacts location.
 
         :return The saved model additional artifacts (if needed) dictionary if context is available and None otherwise.
         """
@@ -110,8 +110,8 @@ def optimize(self, optimizations: List[str] = None, fixed_point: bool = False):
         Use ONNX optimizer to optimize the ONNX model. The optimizations supported can be seen by calling
         'onnxoptimizer.get_available_passes()'
 
-        :param optimizations: List of possible optimizations. If None, all of the optimizations will be used. Defaulted
-                              to None.
+        :param optimizations: List of possible optimizations. If None, all of the optimizations will be used. Default:
+                              None.
         :param fixed_point:   Optimize the weights using fixed point. Default: False.
         """
         # Set the ONNX optimizations list:

diff --git a/mlrun/frameworks/onnx/model_server.py b/mlrun/frameworks/onnx/model_server.py
@@ -72,7 +72,7 @@ def __init__(
                                         ),
                                         'CPUExecutionProvider'
                                     ]
-                                    Defaulted to None - will prefer CUDA Execution Provider over CPU Execution Provider.
+                                    Default: None - will prefer CUDA Execution Provider over CPU Execution Provider.
         :param protocol:            -
         :param class_args:          -
         """

diff --git a/mlrun/frameworks/pytorch/__init__.py b/mlrun/frameworks/pytorch/__init__.py
@@ -70,20 +70,20 @@ def train(
     :param scheduler_step_frequency:    The frequency in which to step the given scheduler. Can be equal to one of the
                                         strings 'epoch' (for at the end of every epoch) and 'batch' (for at the end of
                                         every batch), or an integer that specify per how many iterations to step or a
-                                        float percentage (0.0 < x < 1.0) for per x / iterations to step. Defaulted to
+                                        float percentage (0.0 < x < 1.0) for per x / iterations to step. Default:
                                         'epoch'.
     :param epochs:                      Amount of epochs to perform. Default: a single epoch.
     :param training_iterations:         Amount of iterations (batches) to perform on each epoch's training. If 'None'
                                         the entire training set will be used.
     :param validation_iterations:       Amount of iterations (batches) to perform on each epoch's validation. If 'None'
                                         the entire validation set will be used.
     :param callbacks_list:              The callbacks to use on this run.
-    :param use_cuda:                    Whether or not to use cuda. Only relevant if cuda is available. Defaulted to
+    :param use_cuda:                    Whether or not to use cuda. Only relevant if cuda is available. Default:
                                         True.
-    :param use_horovod:                 Whether or not to use horovod - a distributed training framework. Defaulted to
+    :param use_horovod:                 Whether or not to use horovod - a distributed training framework. Default:
                                         False.
-    :param auto_log:                    Whether or not to apply auto-logging (to both MLRun and Tensorboard). Defaulted
-                                        to True. IF True, the custom objects are not optional.
+    :param auto_log:                    Whether or not to apply auto-logging (to both MLRun and Tensorboard). Default:
+                                        True. If True, the custom objects are not optional.
     :param model_name:                  The model name to use for storing the model artifact. If not given, the model's
                                         class name will be used.
     :param modules_map:                 A dictionary of all the modules required for loading the model. Each key is a
@@ -234,7 +234,7 @@ def evaluate(
                                      dataset will be used.
     :param callbacks_list:           The callbacks to use on this run.
     :param use_cuda:                 Whether or not to use cuda. Only relevant if cuda is available. Default: True.
-    :param use_horovod:              Whether or not to use horovod - a distributed training framework. Defaulted to
+    :param use_horovod:              Whether or not to use horovod - a distributed training framework. Default:
                                      False.
     :param auto_log:                 Whether or not to apply auto-logging to MLRun. Default: True.
     :param model_name:               The model name to use for storing the model artifact. If not given, the model's

diff --git a/mlrun/frameworks/pytorch/callbacks/tensorboard_logging_callback.py b/mlrun/frameworks/pytorch/callbacks/tensorboard_logging_callback.py
@@ -281,7 +281,7 @@ def __init__(
                                         epoch, the weights names should be passed here. Note that each name given will
                                         be searched as 'if <NAME> in <WEIGHT_NAME>' so a simple module name will be
                                         enough to catch his weights. A boolean value can be passed to track all weights.
-                                        Defaulted to False.
+                                        Default: False.
         :param statistics_functions:    A list of statistics functions to calculate at the end of each epoch on the
                                         tracked weights. Only relevant if weights are being tracked. The functions in
                                         the list must accept one Parameter (or Tensor) and return a float (or float