
Commit

Merge pull request #1109 from alan-turing-institute/cheatsheet-and-workflow-doc-updates

Update cheatsheet and workflow docs
ablaom authored Apr 24, 2024
2 parents 08205f7 + e28494e commit ddb2142
Showing 2 changed files with 40 additions and 59 deletions.
52 changes: 38 additions & 14 deletions docs/src/common_mlj_workflows.md
@@ -1,10 +1,30 @@
# Common MLJ Workflows

This demo assumes you have certain packages in your active [package
environment](https://docs.julialang.org/en/v1/stdlib/Pkg/). To activate a new environment,
"MyNewEnv", with just these packages, do this in a new REPL session:

```julia
using Pkg
Pkg.activate("MyNewEnv")
Pkg.add(["MLJ", "RDatasets", "DataFrames", "MLJDecisionTreeInterface",
"MLJMultivariateStatsInterface", "NearestNeighborModels", "MLJGLMInterface",
"Plots"])
```

The following loads MLJ and shows its current version (you can also use
`Pkg.status()`):

```@example workflows
using MLJ
MLJ_VERSION
```

## Data ingestion

```@setup workflows
# to avoid RDatasets as a doc dependency:
using MLJ; color_off()
color_off()
import DataFrames
channing = (Sex = rand(["Male","Female"], 462),
Entry = rand(Int, 462),
@@ -14,6 +34,7 @@ channing = (Sex = rand(["Male","Female"], 462),
coerce!(channing, :Sex => Multiclass)
```


```julia
import RDatasets
channing = RDatasets.dataset("boot", "channing")
@@ -37,7 +58,7 @@ schema(channing)

Horizontally splitting data and shuffling rows.

Here `y` is the `:Exit` column and `X` everything else:
Here `y` is the `:Exit` column and `X` a table with everything else:

```@example workflows
y, X = unpack(channing, ==(:Exit), rng=123);
@@ -61,7 +82,7 @@ schema(X)
Fixing wrong scientific types in `X`:

```@example workflows
X = coerce(X, :Exit=>Continuous, :Entry=>Continuous, :Cens=>Multiclass)
X = coerce(X, :Exit=>Continuous, :Entry=>Continuous, :Cens=>Multiclass);
schema(X)
```

@@ -114,7 +135,7 @@ ms[6]
```

```@example workflows
models("Tree");
models("Tree")
```

A more refined search:
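A sketch of one such refined query (non-authoritative example; assumes the `X` and `y` from earlier sections are defined — `matching` filters models by data compatibility):

```julia
# Find models compatible with the data X, y that also make
# probabilistic predictions:
models() do m
    matching(m, X, y) && m.prediction_type == :probabilistic
end
```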
@@ -150,7 +171,10 @@ nothing # hide

## Instantiating a model

*Reference:* [Getting Started](@ref), [Loading Model Code](@ref)

Assumes `MLJDecisionTreeInterface` is in your environment. Otherwise, try interactive
loading with `@iload`:
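For instance, a sketch of a hypothetical interactive session (`@iload` prompts before installing a missing interface package):

```julia
using MLJ
Tree = @iload DecisionTreeClassifier  # answer the prompt to install, if asked
tree = Tree(max_depth=3)
```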

```@example workflows
Tree = @load DecisionTreeClassifier pkg=DecisionTree
@@ -169,9 +193,8 @@ tree.max_depth = 4

*Reference:* [Evaluating Model Performance](evaluating_model_performance.md)


```@example workflows
X, y = @load_boston
X, y = @load_boston # a table and a vector
KNN = @load KNNRegressor
knn = KNN()
evaluate(knn, X, y,
@@ -181,7 +204,8 @@ evaluate(knn, X, y,

Note `RootMeanSquaredError()` has alias `rms` and `LPLoss(1)` has aliases `l1`, `mae`.
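For example (a sketch, assuming the `knn`, `X`, `y` defined above; `rms` and `mae` are the exported alias instances):

```julia
evaluate(knn, X, y,
         resampling=CV(nfolds=5),
         measures=[rms, mae])
```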

Do `measures()` to list all losses and scores and their aliases.
Do `measures()` to list all losses and scores and their aliases, or refer to the
StatisticalMeasures.jl [docs](https://juliaai.github.io/StatisticalMeasures.jl/dev/).


## Basic fit/evaluate/predict by hand:
@@ -197,7 +221,6 @@ schema(crabs)
```@example workflows
y, X = unpack(crabs, ==(:sp), !in([:index, :sex]); rng=123)
Tree = @load DecisionTreeClassifier pkg=DecisionTree
tree = Tree(max_depth=2) # hide
```
@@ -225,8 +248,6 @@ LogLoss(tol=1e-4)(yhat, y[test])

Note `LogLoss()` has aliases `log_loss` and `cross_entropy`.
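That is, assuming the `yhat` and `y[test]` from the previous step, the following calls are equivalent (sketch):

```julia
LogLoss()(yhat, y[test])
log_loss(yhat, y[test])       # same measure, via alias
cross_entropy(yhat, y[test])  # ditto
```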

Run `measures()` to list all losses and scores and their aliases ("instances").

Predict on the new data set:

```@example workflows
@@ -316,7 +337,7 @@ report(mach)
Load data:

```@example workflows
X, y = @load_iris
X, y = @load_iris # a table and a vector
train, test = partition(eachindex(y), 0.97, shuffle=true, rng=123)
```

@@ -435,15 +456,15 @@ plot(mach)

![](img/workflows_tuning_plot.png)

Predicting on new data using the optimized model:
Predicting on new data using the optimized model trained on all data:

```@example workflows
predict(mach, Xnew)
```

## Constructing linear pipelines

*Reference:* [Composing Models](composing_models.md)
*Reference:* [Linear Pipelines](@ref)

Constructing a linear (unbranching) pipeline with a *learned* target
transformation/inverse transformation:
@@ -452,6 +473,9 @@ transformation/inverse transformation:
X, y = @load_reduced_ames
KNN = @load KNNRegressor
knn_with_target = TransformedTargetModel(model=KNN(K=3), transformer=Standardizer())
```

```@example workflows
pipe = (X -> coerce(X, :age=>Continuous)) |> OneHotEncoder() |> knn_with_target
```
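The pipeline behaves like any other model. A sketch of training and predicting, under the assumption that `X` and `y` are the reduced Ames data loaded above:

```julia
mach = machine(pipe, X, y)
fit!(mach, rows=1:800)        # train on the first 800 rows
predict(mach, rows=801:810)   # predict on held-out rows
```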

47 changes: 2 additions & 45 deletions docs/src/mlj_cheatsheet.md
Original file line number Diff line number Diff line change
@@ -293,49 +293,6 @@ Concatenation:

`pipe1 |> pipe2` or `model |> pipe` or `pipe |> model`, etc
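A minimal sketch (assumes the built-in `Standardizer` and `OneHotEncoder` transformers, and some previously instantiated supervised model `tree`):

```julia
pipe1 = Standardizer() |> OneHotEncoder()  # unsupervised pipeline
pipe2 = pipe1 |> tree                      # supervised pipeline
```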

## Advanced model composition techniques

## Define a supervised learning network:

`Xs = source(X)`
`ys = source(y)`

... define further nodal machines and nodes ...

`yhat = predict(knn_machine, W, ys)` (final node)


## Exporting a learning network as a stand-alone model:

Supervised, with final node `yhat` returning point predictions:

```julia
@from_network machine(Deterministic(), Xs, ys; predict=yhat) begin
    mutable struct Composite
        reducer=network_pca
        regressor=network_knn
    end
end
```

Here `network_pca` and `network_knn` are models appearing in the
learning network.

Supervised, with `yhat` final node returning probabilistic predictions:

```julia
@from_network machine(Probabilistic(), Xs, ys; predict=yhat) begin
    mutable struct Composite
        reducer=network_pca
        classifier=network_tree
    end
end
```

Unsupervised, with final node `Xout`:

```julia
@from_network machine(Unsupervised(), Xs; transform=Xout) begin
    mutable struct Composite
        reducer1=network_pca
        reducer2=clusterer
    end
end
```
See the [Composing Models](@ref) section of the MLJ manual.
