
Commit

Merge pull request #1109 from alan-turing-institute/cheatsheet-and-workflow-doc-updates

Update cheatsheet and workflow docs
ablaom authored Apr 24, 2024
2 parents 08205f7 + e28494e commit ddb2142
Showing 2 changed files with 40 additions and 59 deletions.
52 changes: 38 additions & 14 deletions docs/src/common_mlj_workflows.md
@@ -1,10 +1,30 @@
# Common MLJ Workflows

This demo assumes you have certain packages in your active [package
environment](https://docs.julialang.org/en/v1/stdlib/Pkg/). To activate a new environment,
"MyNewEnv", with just these packages, do this in a new REPL session:

```julia
using Pkg
Pkg.activate("MyNewEnv")
Pkg.add(["MLJ", "RDatasets", "DataFrames", "MLJDecisionTreeInterface",
"MLJMultivariateStatsInterface", "NearestNeighborModels", "MLJGLMInterface",
"Plots"])
```

The following loads MLJ and shows its current version (you can also use
`Pkg.status()`):

```@example workflows
using MLJ
MLJ_VERSION
```

## Data ingestion

```@setup workflows
# to avoid RDatasets as a doc dependency:
using MLJ; color_off()
color_off()
import DataFrames
channing = (Sex = rand(["Male","Female"], 462),
Entry = rand(Int, 462),
@@ -14,6 +34,7 @@ channing = (Sex = rand(["Male","Female"], 462),
coerce!(channing, :Sex => Multiclass)
```


```julia
import RDatasets
channing = RDatasets.dataset("boot", "channing")
@@ -37,7 +58,7 @@ schema(channing)

Horizontally splitting data and shuffling rows.

Here `y` is the `:Exit` column and `X` everything else:
Here `y` is the `:Exit` column and `X` a table with everything else:

```@example workflows
y, X = unpack(channing, ==(:Exit), rng=123);
@@ -61,7 +82,7 @@ schema(X)
Fixing wrong scientific types in `X`:

```@example workflows
X = coerce(X, :Exit=>Continuous, :Entry=>Continuous, :Cens=>Multiclass)
X = coerce(X, :Exit=>Continuous, :Entry=>Continuous, :Cens=>Multiclass);
schema(X)
```

@@ -114,7 +135,7 @@ ms[6]
```

```@example workflows
models("Tree");
models("Tree")
```

A more refined search:
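A sketch of one such refined query (non-authoritative example; assumes the `X` and `y` from earlier sections are defined — `matching` filters models by data compatibility):

```julia
# Find models compatible with the data X, y that also make
# probabilistic predictions:
models() do m
    matching(m, X, y) && m.prediction_type == :probabilistic
end
```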
@@ -150,7 +171,10 @@ nothing # hide

## Instantiating a model

*Reference:* [Getting Started](@ref), [Loading Model Code](@ref)

Assumes `MLJDecisionTreeInterface` is in your environment. Otherwise, try interactive
loading with `@iload`:
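For instance, a sketch of a hypothetical interactive session (`@iload` prompts before installing a missing interface package):

```julia
using MLJ
Tree = @iload DecisionTreeClassifier  # answer the prompt to install, if asked
tree = Tree(max_depth=3)
```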

```@example workflows
Tree = @load DecisionTreeClassifier pkg=DecisionTree
@@ -169,9 +193,8 @@ tree.max_depth = 4

*Reference:* [Evaluating Model Performance](evaluating_model_performance.md)


```@example workflows
X, y = @load_boston
X, y = @load_boston # a table and a vector
KNN = @load KNNRegressor
knn = KNN()
evaluate(knn, X, y,
@@ -181,7 +204,8 @@ evaluate(knn, X, y,

Note `RootMeanSquaredError()` has alias `rms` and `LPLoss(1)` has aliases `l1`, `mae`.
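For example (a sketch, assuming the `knn`, `X`, `y` defined above; `rms` and `mae` are the exported alias instances):

```julia
evaluate(knn, X, y,
         resampling=CV(nfolds=5),
         measures=[rms, mae])
```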

Do `measures()` to list all losses and scores and their aliases.
Do `measures()` to list all losses and scores and their aliases, or refer to the
StatisticalMeasures.jl [docs](https://juliaai.github.io/StatisticalMeasures.jl/dev/).


## Basic fit/evaluate/predict by hand:
@@ -197,7 +221,6 @@ schema(crabs)
```@example workflows
y, X = unpack(crabs, ==(:sp), !in([:index, :sex]); rng=123)
Tree = @load DecisionTreeClassifier pkg=DecisionTree
tree = Tree(max_depth=2) # hide
```
@@ -225,8 +248,6 @@ LogLoss(tol=1e-4)(yhat, y[test])

Note `LogLoss()` has aliases `log_loss` and `cross_entropy`.
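That is, assuming the `yhat` and `y[test]` from the previous step, the following calls are equivalent (sketch):

```julia
LogLoss()(yhat, y[test])
log_loss(yhat, y[test])       # same measure, via alias
cross_entropy(yhat, y[test])  # ditto
```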

Run `measures()` to list all losses and scores and their aliases ("instances").

Predict on the new data set:

```@example workflows
@@ -316,7 +337,7 @@ report(mach)
Load data:

```@example workflows
X, y = @load_iris
X, y = @load_iris # a table and a vector
train, test = partition(eachindex(y), 0.97, shuffle=true, rng=123)
```

@@ -435,15 +456,15 @@ plot(mach)

![](img/workflows_tuning_plot.png)

Predicting on new data using the optimized model:
Predicting on new data using the optimized model trained on all data:

```@example workflows
predict(mach, Xnew)
```

## Constructing linear pipelines

*Reference:* [Composing Models](composing_models.md)
*Reference:* [Linear Pipelines](@ref)

Constructing a linear (unbranching) pipeline with a *learned* target
transformation/inverse transformation:
@@ -452,6 +473,9 @@ transformation/inverse transformation:
X, y = @load_reduced_ames
KNN = @load KNNRegressor
knn_with_target = TransformedTargetModel(model=KNN(K=3), transformer=Standardizer())
```

```@example workflows
pipe = (X -> coerce(X, :age=>Continuous)) |> OneHotEncoder() |> knn_with_target
```
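The pipeline behaves like any other model. A sketch of training and predicting, under the assumption that `X` and `y` are the reduced Ames data loaded above:

```julia
mach = machine(pipe, X, y)
fit!(mach, rows=1:800)        # train on the first 800 rows
predict(mach, rows=801:810)   # predict on held-out rows
```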

47 changes: 2 additions & 45 deletions docs/src/mlj_cheatsheet.md
Original file line number Diff line number Diff line change
@@ -293,49 +293,6 @@ Concatenation:

`pipe1 |> pipe2` or `model |> pipe` or `pipe |> model`, etc
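A minimal sketch (assumes the built-in `Standardizer` and `OneHotEncoder` transformers, and some previously instantiated supervised model `tree`):

```julia
pipe1 = Standardizer() |> OneHotEncoder()  # unsupervised pipeline
pipe2 = pipe1 |> tree                      # supervised pipeline
```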

## Advanced model composition techniques

## Define a supervised learning network:

`Xs = source(X)`
`ys = source(y)`

... define further nodal machines and nodes ...

`yhat = predict(knn_machine, W, ys)` (final node)


## Exporting a learning network as a stand-alone model:

Supervised, with final node `yhat` returning point predictions:

```julia
@from_network machine(Deterministic(), Xs, ys; predict=yhat) begin
    mutable struct Composite
        reducer=network_pca
        regressor=network_knn
    end
end
```

Here `network_pca` and `network_knn` are models appearing in the
learning network.

Supervised, with `yhat` final node returning probabilistic predictions:

```julia
@from_network machine(Probabilistic(), Xs, ys; predict=yhat) begin
    mutable struct Composite
        reducer=network_pca
        classifier=network_tree
    end
end
```

Unsupervised, with final node `Xout`:

```julia
@from_network machine(Unsupervised(), Xs; transform=Xout) begin
    mutable struct Composite
        reducer1=network_pca
        reducer2=clusterer
    end
end
```
See the [Composing Models](@ref) section of the MLJ manual.
