[Docs] Add handler decorator (mlrun#2493)
gilad-shaham authored Oct 22, 2022
1 parent 85c0eda commit d4cd9bd
Showing 25 changed files with 183 additions and 56 deletions.
126 changes: 126 additions & 0 deletions docs/concepts/decorators-and-auto-logging.md
@@ -0,0 +1,126 @@
(decorators-and-auto-logging)=
# Decorators and auto-logging

While it is possible to log results and artifacts using {ref}`the MLRun execution context<mlrun-execution-context>`, it is often more convenient to use the {py:func}`mlrun.handler` decorator.

## Basic example

Assume you have the following code in `train.py`:

``` python
import pandas as pd
from sklearn.svm import SVC

def train_and_predict(train_data,
                      predict_input,
                      label_column='label'):

    x = train_data.drop(label_column, axis=1)
    y = train_data[label_column]

    clf = SVC()
    clf.fit(x, y)

    return list(clf.predict(predict_input))
```

With the `mlrun.handler` decorator, the body of the Python function does not change, and the inputs and outputs are logged automatically. The resulting code is as follows:

``` python
import pandas as pd
from sklearn.svm import SVC
import mlrun

@mlrun.handler(labels={'framework':'scikit-learn'},
               outputs=['prediction:dataset'],
               inputs={"train_data": pd.DataFrame,
                       "predict_input": pd.DataFrame})
def train_and_predict(train_data,
                      predict_input,
                      label_column='label'):

    x = train_data.drop(label_column, axis=1)
    y = train_data[label_column]

    clf = SVC()
    clf.fit(x, y)

    return list(clf.predict(predict_input))
```

To run the code, use the following example:

``` python
import mlrun
project = mlrun.get_or_create_project("mlrun-example", context="./", user_project=True)

trainer = project.set_function("train.py", name="train_and_predict", kind="job", image="mlrun/mlrun", handler="train_and_predict")

trainer_run = project.run_function(
    "train_and_predict",
    inputs={"train_data": mlrun.get_sample_path('data/iris/iris_dataset.csv'),
            "predict_input": mlrun.get_sample_path('data/iris/iris_to_predict.csv')
    }
)
```

The outcome is a run with:
1. A label with key "framework" and value "scikit-learn".
2. Two inputs, "train_data" and "predict_input", each parsed into a pandas DataFrame.
3. An artifact called "prediction" of type "dataset". The contents of the dataset will be the return value (in this case the prediction result).

## Labels

The decorator gives you the option to set labels for the run. The `labels` parameter is a dictionary with keys and values to set for the labels.

## Input type parsing

The `mlrun.handler` decorator can also take the input types from the function's type hints, if they are specified. The following definition is equivalent to the one in the basic example:

``` python
@mlrun.handler(labels={'framework':'scikit-learn'},
               outputs=['prediction:dataset'])
def train_and_predict(train_data: pd.DataFrame,
                      predict_input: pd.DataFrame,
                      label_column='label'):

    ...
```
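
To make the idea of type-hint-driven parsing more concrete, here is a minimal, dependency-free sketch of a decorator that converts wrapped inputs according to the function's annotations. It is only an illustration under stated assumptions: `FakeDataItem` and `parse_inputs` are hypothetical names invented for this example, and this is not how `mlrun.handler` is implemented.

``` python
import functools
import inspect


class FakeDataItem:
    """Hypothetical stand-in for a data item; it only holds a raw value."""

    def __init__(self, raw):
        self.raw = raw


def parse_inputs(func):
    """Toy decorator: convert FakeDataItem arguments using the type hints."""
    signature = inspect.signature(func)

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        bound = signature.bind(*args, **kwargs)
        for name, value in list(bound.arguments.items()):
            if not isinstance(value, FakeDataItem):
                continue  # plain parameters are passed through untouched
            hint = signature.parameters[name].annotation
            if hint is inspect.Parameter.empty or hint is FakeDataItem:
                continue  # no type hint: keep the data item as-is
            bound.arguments[name] = hint(value.raw)  # e.g. float("2.5")
        return func(*bound.args, **bound.kwargs)

    return wrapper


@parse_inputs
def scale(values: list, factor: float, source):
    # 'source' has no type hint, so it arrives as a FakeDataItem.
    return [v * factor for v in values], source
```

In this sketch, hinted parameters are converted before the function body runs, while the parameter without a hint keeps its wrapper type, which mirrors the behavior described in the note below.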

> **Note:** If an input does not have a type hint, the decorator assumes that the parameter type is {py:class}`mlrun.datastore.DataItem`. If you specify `inputs=False`, all the run inputs are assumed to be of type `mlrun.datastore.DataItem`. You also have the option to specify a dictionary where each key is the name of the input and the value is the type.

## Logging return values as artifacts

If you specify the `outputs` parameter, the return values will be logged as the run artifacts. `outputs` expects a list; the length of the list must match the number of returned values.

The simplest option is to specify a list of strings. Each string contains the name of the artifact. You can also specify the artifact type by adding a colon after the artifact name followed by the type (`'name:artifact_type'`). The following are valid artifact types:

- dataset
- directory
- file
- object
- plot
- result
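
As a rough illustration of the `'name:artifact_type'` convention, the sketch below splits such a string and checks the type against the list above. `parse_output_spec` and `VALID_ARTIFACT_TYPES` are hypothetical names used only for this example; this is not MLRun's own parser.

``` python
# Artifact types taken from the list above.
VALID_ARTIFACT_TYPES = {"dataset", "directory", "file", "object", "plot", "result"}


def parse_output_spec(spec):
    """Split 'name' or 'name:artifact_type' into (name, artifact_type or None)."""
    name, _, artifact_type = spec.partition(":")
    if not artifact_type:
        return name, None  # no explicit type: it has to be inferred from the value
    if artifact_type not in VALID_ARTIFACT_TYPES:
        raise ValueError(f"unknown artifact type: {artifact_type!r}")
    return name, artifact_type


print(parse_output_spec("prediction:dataset"))  # ('prediction', 'dataset')
print(parse_output_spec("accuracy"))            # ('accuracy', None)
```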

If you use only the name without the type, the following mapping is used:

| Python type | Artifact type |
|--------------------------|---------------|
| pandas.DataFrame | Dataset |
| pandas.Series | Dataset |
| numpy.ndarray | Dataset |
| dict | Result |
| list | Result |
| tuple | Result |
| str | Result |
| int | Result |
| float | Result |
| bytes | Object |
| bytearray | Object |
| matplotlib.pyplot.Figure | Plot |
| plotly.graph_objs.Figure | Plot |
| bokeh.plotting.Figure | Plot |
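
The built-in rows of this table can be read as a simple type lookup. The following sketch is an illustration only, not MLRun's implementation: `default_artifact_type` is a hypothetical helper, the third-party rows (pandas, numpy, and the plotting libraries) are left out to keep it dependency-free, and returning `None` for unlisted types is a choice made for this sketch.

``` python
def default_artifact_type(value):
    """Return the artifact type that the table lists for built-in Python types."""
    if isinstance(value, (dict, list, tuple, str, int, float)):
        return "result"
    if isinstance(value, (bytes, bytearray)):
        return "object"
    # Unlisted types: this sketch gives up here; MLRun's behavior may differ.
    return None
```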


Another option is to specify a tuple in the form of `(name, artifact_type)` or `(name, artifact_type, arguments)`. Refer to the {py:func}`mlrun.handler` for more details.

9 changes: 5 additions & 4 deletions docs/concepts/runs-workflows.md
@@ -7,8 +7,9 @@
```{toctree}
:maxdepth: 1
../concepts/mlrun-execution-context
../concepts/submitting-tasks-jobs-to-functions
../concepts/workflow-overview
../concepts/scheduled-jobs
mlrun-execution-context
decorators-and-auto-logging
submitting-tasks-jobs-to-functions
workflow-overview
scheduled-jobs
```
6 changes: 3 additions & 3 deletions mlrun/frameworks/_common/model_handler.py
@@ -473,7 +473,7 @@ def save(
Save the handled model at the given output path.
:param output_path: The full path to the directory to save the handled model at. If not given, the context
stored will be used to save the model in the defaulted artifacts location.
stored will be used to save the model in the default artifacts location.
:return The saved model artifacts dictionary if context is available and None otherwise.
@@ -517,8 +517,8 @@ def to_onnx(self, model_name: str = None, optimize: bool = True, **kwargs):
:param model_name: The name to give to the converted ONNX model. If not given the default name will be the
stored model name with the suffix '_onnx'.
:param optimize: Whether to optimize the ONNX model using 'onnxoptimizer' before saving the model. Defaulted
to True.
:param optimize: Whether to optimize the ONNX model using 'onnxoptimizer' before saving the model. Default:
True.
:return: The converted ONNX model (onnx.ModelProto).
"""
2 changes: 1 addition & 1 deletion mlrun/frameworks/_ml_common/plan.py
@@ -53,7 +53,7 @@ def __init__(self, need_probabilities: bool = False):
Initialize a new ML plan.
:param need_probabilities: Whether this plan will need the predictions return from 'model.predict()' or
'model.predict_proba()'. True means predict_proba and False predict. Defaulted to
'model.predict_proba()'. True means predict_proba and False predict. Default:
False.
"""
self._need_probabilities = need_probabilities
Original file line number Diff line number Diff line change
@@ -47,7 +47,7 @@ def __init__(
proper probability.
:param n_bins: Number of bins to discretize the [0, 1] interval.
:param strategy: Strategy used to define the widths of the bins. Can be on of {‘uniform’, ‘quantile’}.
Defaulted to "uniform".
Default: "uniform".
"""
# Store the parameters:
self._normalize = normalize
2 changes: 1 addition & 1 deletion mlrun/frameworks/lgbm/__init__.py
@@ -275,7 +275,7 @@ def apply_mlrun(
:param parameters: Parameters to log with the model.
:param extra_data: Extra data to log with the model.
:param auto_log: Whether to apply MLRun's auto logging on the model. Auto logging will add the
default artifacts and metrics to the lists of artifacts and metrics. Defaulted to
default artifacts and metrics to the lists of artifacts and metrics. Default:
True.
:param mlrun_logging_callback_kwargs: Key word arguments for the MLRun callback. For further information see the
documentation of the class 'MLRunLoggingCallback'. Note that 'context' is already
4 changes: 2 additions & 2 deletions mlrun/frameworks/lgbm/callbacks/callback.py
@@ -24,7 +24,7 @@ class Callback(ABC):
There are two configurable class properties:
* order: int = 10 - The priority of the callback to be called first. Lower value means higher priority. Defaulted to
* order: int = 10 - The priority of the callback to be called first. Lower value means higher priority. Default:
10.
* before_iteration: bool = False - Whether to call this callback before each iteration or after. Default: after
(False).
@@ -75,7 +75,7 @@ def __init__(self, order: int = 10, before_iteration: bool = False):
Initialize a new callback to use in LightGBM's training.
:param order: The priority of the callback to be called first. Lower value means higher priority.
Defaulted to 10.
Default: 10.
:param before_iteration: Whether to call this callback before each iteration or after. Default: after
(False).
"""
2 changes: 1 addition & 1 deletion mlrun/frameworks/lgbm/callbacks/mlrun_logging_callback.py
@@ -53,7 +53,7 @@ def __init__(
the `params` dictionary.
:param logging_frequency: Per how many iterations to write the logs to MLRun (create the plots and log
them and the results to MLRun). Two low frequency may slow the training time.
Defaulted to 100.
Default: 100.
"""
super(MLRunLoggingCallback, self).__init__(
dynamic_hyperparameters=dynamic_hyperparameters,
6 changes: 3 additions & 3 deletions mlrun/frameworks/lgbm/model_handler.py
@@ -120,7 +120,7 @@ def __init__(
model.
:param context: MLRun context to work with for logging the model.
:param model_format: The format to use for saving and loading the model. Should be passed as a
member of the class 'LGBMModelHandler.ModelFormats'. Defaulted to
member of the class 'LGBMModelHandler.ModelFormats'. Default:
'LGBMModelHandler.ModelFormats.PKL'.
:raise MLRunInvalidArgumentError: In case one of the given parameters are invalid.
@@ -189,7 +189,7 @@ def save(self, output_path: str = None, **kwargs):
logged and returned as artifacts.
:param output_path: The full path to the directory to save the handled model at. If not given, the context
stored will be used to save the model in the defaulted artifacts location.
stored will be used to save the model in the default artifacts location.
:return The saved model additional artifacts (if needed) dictionary if context is available and None otherwise.
"""
@@ -229,7 +229,7 @@ def to_onnx(
:param model_name: The name to give to the converted ONNX model. If not given the default name will be
the stored model name with the suffix '_onnx'.
:param optimize: Whether to optimize the ONNX model using 'onnxoptimizer' before saving the model.
Defaulted to True.
Default: True.
:param input_sample: An inputs sample with the names and data types of the inputs of the model.
:param log: In order to log the ONNX model, pass True. If None, the model will be logged if this
handler has a MLRun context set. Default: None.
6 changes: 3 additions & 3 deletions mlrun/frameworks/onnx/model_handler.py
@@ -77,7 +77,7 @@ def save(
logged and returned as artifacts.
:param output_path: The full path to the directory to save the handled model at. If not given, the context
stored will be used to save the model in the defaulted artifacts location.
stored will be used to save the model in the default artifacts location.
:return The saved model additional artifacts (if needed) dictionary if context is available and None otherwise.
"""
@@ -110,8 +110,8 @@ def optimize(self, optimizations: List[str] = None, fixed_point: bool = False):
Use ONNX optimizer to optimize the ONNX model. The optimizations supported can be seen by calling
'onnxoptimizer.get_available_passes()'
:param optimizations: List of possible optimizations. If None, all of the optimizations will be used. Defaulted
to None.
:param optimizations: List of possible optimizations. If None, all of the optimizations will be used. Default:
None.
:param fixed_point: Optimize the weights using fixed point. Default: False.
"""
# Set the ONNX optimizations list:
2 changes: 1 addition & 1 deletion mlrun/frameworks/onnx/model_server.py
@@ -72,7 +72,7 @@ def __init__(
),
'CPUExecutionProvider'
]
Defaulted to None - will prefer CUDA Execution Provider over CPU Execution Provider.
Default: None - will prefer CUDA Execution Provider over CPU Execution Provider.
:param protocol: -
:param class_args: -
"""
12 changes: 6 additions & 6 deletions mlrun/frameworks/pytorch/__init__.py
@@ -70,20 +70,20 @@ def train(
:param scheduler_step_frequency: The frequency in which to step the given scheduler. Can be equal to one of the
strings 'epoch' (for at the end of every epoch) and 'batch' (for at the end of
every batch), or an integer that specify per how many iterations to step or a
float percentage (0.0 < x < 1.0) for per x / iterations to step. Defaulted to
float percentage (0.0 < x < 1.0) for per x / iterations to step. Default:
'epoch'.
:param epochs: Amount of epochs to perform. Default: a single epoch.
:param training_iterations: Amount of iterations (batches) to perform on each epoch's training. If 'None'
the entire training set will be used.
:param validation_iterations: Amount of iterations (batches) to perform on each epoch's validation. If 'None'
the entire validation set will be used.
:param callbacks_list: The callbacks to use on this run.
:param use_cuda: Whether or not to use cuda. Only relevant if cuda is available. Defaulted to
:param use_cuda: Whether or not to use cuda. Only relevant if cuda is available. Default:
True.
:param use_horovod: Whether or not to use horovod - a distributed training framework. Defaulted to
:param use_horovod: Whether or not to use horovod - a distributed training framework. Default:
False.
:param auto_log: Whether or not to apply auto-logging (to both MLRun and Tensorboard). Defaulted
to True. IF True, the custom objects are not optional.
:param auto_log: Whether or not to apply auto-logging (to both MLRun and Tensorboard). Default:
True. IF True, the custom objects are not optional.
:param model_name: The model name to use for storing the model artifact. If not given, the model's
class name will be used.
:param modules_map: A dictionary of all the modules required for loading the model. Each key is a
@@ -234,7 +234,7 @@ def evaluate(
dataset will be used.
:param callbacks_list: The callbacks to use on this run.
:param use_cuda: Whether or not to use cuda. Only relevant if cuda is available. Default: True.
:param use_horovod: Whether or not to use horovod - a distributed training framework. Defaulted to
:param use_horovod: Whether or not to use horovod - a distributed training framework. Default:
False.
:param auto_log: Whether or not to apply auto-logging to MLRun. Default: True.
:param model_name: The model name to use for storing the model artifact. If not given, the model's
Original file line number Diff line number Diff line change
@@ -281,7 +281,7 @@ def __init__(
epoch, the weights names should be passed here. Note that each name given will
be searched as 'if <NAME> in <WEIGHT_NAME>' so a simple module name will be
enough to catch his weights. A boolean value can be passed to track all weights.
Defaulted to False.
Default: False.
:param statistics_functions: A list of statistics functions to calculate at the end of each epoch on the
tracked weights. Only relevant if weights are being tracked. The functions in
the list must accept one Parameter (or Tensor) and return a float (or float