
Commit

Merge pull request #1107 from abhro/patch-1
Use repl language tag for sample
ablaom authored May 19, 2024
2 parents bd08451 + 650ebbd commit 2745563
Showing 25 changed files with 309 additions and 316 deletions.
24 changes: 11 additions & 13 deletions docs/src/about_mlj.md
100755 → 100644
@@ -1,6 +1,6 @@
# About MLJ

MLJ (Machine Learning in Julia) is a toolbox written in Julia
providing a common interface and meta-algorithms for selecting,
tuning, evaluating, composing and comparing [over 180 machine learning
models](@ref model_list) written in Julia and other languages. In
@@ -22,8 +22,7 @@ The first code snippet below creates a new Julia environment
[Installation](@ref) for more on creating a Julia environment for use
with MLJ.

Julia installation instructions are [here](https://julialang.org/downloads/).

```julia
using Pkg
@@ -44,7 +43,7 @@ Loading and instantiating a gradient tree-boosting model:
using MLJ
Booster = @load EvoTreeRegressor # loads code defining a model type
booster = Booster(max_depth=2) # specify hyper-parameter at construction
booster.nrounds = 50 # or mutate afterwards
```

This model is an example of an iterative model. As it stands, the
@@ -92,7 +91,7 @@ it "self-tuning":
```julia
self_tuning_pipe = TunedModel(model=pipe,
tuning=RandomSearch(),
                              ranges=max_depth_range,
resampling=CV(nfolds=3, rng=456),
measure=l1,
acceleration=CPUThreads(),
@@ -105,12 +104,12 @@ Loading a selection of features and labels from the Ames
House Price dataset:

```julia
X, y = @load_reduced_ames
```
Evaluating the "self-tuning" pipeline model's performance using 5-fold
cross-validation (implies multiple layers of nested resampling):

```julia
```julia-repl
julia> evaluate(self_tuning_pipe, X, y,
measures=[l1, l2],
resampling=CV(nfolds=5, rng=123),
@@ -155,8 +154,7 @@ Extract:

* Consistent interface to handle probabilistic predictions.

* Extensible [tuning interface](https://github.com/JuliaAI/MLJTuning.jl),
to support a growing number of optimization strategies, and designed
to play well with model composition.

@@ -229,19 +227,19 @@ installed in a new
[environment](https://julialang.github.io/Pkg.jl/v1/environments/) to
avoid package conflicts. You can do this with

```julia
```julia-repl
julia> using Pkg; Pkg.activate("my_MLJ_env", shared=true)
```

Installing MLJ is also done with the package manager:

```julia
```julia-repl
julia> Pkg.add("MLJ")
```

**Optional:** To test your installation, run

```julia
```julia-repl
julia> Pkg.test("MLJ")
```

@@ -252,7 +250,7 @@ environment to make model-specific code available. This
happens automatically when you use MLJ's interactive load command
`@iload`, as in

```julia
```julia-repl
julia> Tree = @iload DecisionTreeClassifier # load type
julia> tree = Tree() # instance
```
2 changes: 1 addition & 1 deletion docs/src/adding_models_for_general_use.md
100755 → 100644
@@ -5,4 +5,4 @@ suitable for addition to the MLJ Model Registry, consult the [MLJModelInterface.
documentation](https://juliaai.github.io/MLJModelInterface.jl/dev/).

For quick-and-dirty user-defined models see [Simple User Defined
Models](simple_user_defined_models.md).
Empty file modified docs/src/api.md
100755 → 100644
Empty file.
57 changes: 27 additions & 30 deletions docs/src/common_mlj_workflows.md
@@ -23,31 +23,27 @@ MLJ_VERSION
## Data ingestion

```@setup workflows
# to avoid RDatasets as a doc dependency:
# to avoid RDatasets as a doc dependency, generate synthetic data with
# similar parameters, with the first four rows mimicking the original dataset
# for display purposes
color_off()
import DataFrames
channing = (Sex = rand(["Male","Female"], 462),
Entry = rand(Int, 462),
Exit = rand(Int, 462),
Time = rand(Int, 462),
Cens = rand(Int, 462)) |> DataFrames.DataFrame
channing = (Sex = [repeat(["Male"], 4)..., rand(["Male","Female"], 458)...],
Entry = Int32[782, 1020, 856, 915, rand(733:1140, 458)...],
Exit = Int32[909, 1128, 969, 957, rand(777:1207, 458)...],
Time = Int32[127, 108, 113, 42, rand(0:137, 458)...],
Cens = Int32[1, 1, 1, 1, rand(0:1, 458)...]) |> DataFrames.DataFrame
coerce!(channing, :Sex => Multiclass)
```


```julia
import RDatasets
channing = RDatasets.dataset("boot", "channing")
```

julia> first(channing, 4)
4×5 DataFrame
Row │ Sex Entry Exit Time Cens
│ Cat Int32 Int32 Int32 Int32
─────┼──────────────────────────────────
1 │ Male 782 909 127 1
2 │ Male 1020 1128 108 1
3 │ Male 856 969 113 1
4 │ Male 915 957 42 1
```@example workflows
first(channing, 4) |> pretty
```

Inspecting metadata, including column scientific types:
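A minimal sketch of such an inspection (assuming the `channing` table defined above):

```julia
schema(channing)    # column names, scientific types and machine types
```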
@@ -61,17 +57,17 @@ Horizontally splitting data and shuffling rows.
Here `y` is the `:Exit` column and `X` a table with everything else:

```@example workflows
y, X = unpack(channing, ==(:Exit), rng=123)
nothing # hide
```

Here `y` is the `:Exit` column and `X` everything else except `:Time`:

```@example workflows
y, X = unpack(channing,
              ==(:Exit),
              !=(:Time);
              rng=123);
scitype(y)
```

@@ -115,7 +111,7 @@ nothing # hide
Or, if already horizontally split:

```@example workflows
(Xtrain, Xtest), (ytrain, ytest) = partition((X, y), 0.6, multi=true, rng=123)
```


@@ -171,7 +167,7 @@ nothing # hide

## Instantiating a model

*Reference:* [Getting Started](@ref), [Loading Model Code](@ref)

Assumes `MLJDecisionTreeClassifier` is in your environment. Otherwise, try interactive
loading with `@iload`:
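A sketch of the interactive alternative (`@iload` resolves and loads the model type interactively):

```julia
Tree = @iload DecisionTreeClassifier           # interactively load the model type
tree = Tree(min_samples_split=5, max_depth=4)  # then instantiate as usual
```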
@@ -183,7 +179,7 @@ tree = Tree(min_samples_split=5, max_depth=4)

or

```@julia
```julia
tree = (@load DecisionTreeClassifier)()
tree.min_samples_split = 5
tree.max_depth = 4
@@ -208,7 +204,7 @@ Do `measures()` to list all losses and scores and their aliases, or refer to the
StatisticalMeasures.jl [docs](https://juliaai.github.io/StatisticalMeasures.jl/dev/).


## Basic fit/evaluate/predict by hand

*Reference:* [Getting Started](index.md), [Machines](machines.md),
[Evaluating Model Performance](evaluating_model_performance.md), [Performance Measures](performance_measures.md)
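As a rough sketch, the by-hand pattern looks like this (the names `model`, `X` and `y` are placeholders for a probabilistic classifier and its data, not necessarily those used elsewhere on this page):

```julia
mach = machine(model, X, y)                          # bind the model to data
train, test = partition(eachindex(y), 0.7, rng=123)  # train/test row indices
fit!(mach, rows=train)                               # train on the training rows only
yhat = predict(mach, rows=test)                      # probabilistic predictions
log_loss(yhat, y[test])                              # evaluate with a measure
```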
@@ -251,7 +247,7 @@ Note `LogLoss()` has aliases `log_loss` and `cross_entropy`.
Predict on the new data set:

```@example workflows
Xnew = (FL = rand(3), RW = rand(3), CL = rand(3), CW = rand(3), BD = rand(3))
predict(mach, Xnew) # a vector of distributions
```
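For point predictions rather than distributions, `predict_mode` can be used (a brief sketch):

```julia
predict_mode(mach, Xnew)   # the most probable class for each new observation
```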

@@ -379,8 +375,8 @@ z = transform(mach, y);

*Reference:* [Tuning Models](tuning_models.md)

```@example workflows
X, y = @load_iris; nothing # hide
```@setup workflows
X, y = @load_iris
```

Define a model with nested hyperparameters:
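One way to obtain nested hyperparameters, shown here only as a sketch (not necessarily the model used in this section), is to wrap an atomic model in an ensemble; a tuning range can then reference the nested field:

```julia
tree = (@load DecisionTreeClassifier pkg=DecisionTree verbosity=0)()
forest = EnsembleModel(model=tree, n=300)                  # `tree`'s hyperparameters are now nested
r = range(forest, :(model.max_depth), lower=2, upper=10)   # a range over a nested hyperparameter
```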
@@ -502,7 +498,7 @@ Tree = @load DecisionTreeRegressor pkg=DecisionTree verbosity=0
tree_with_target = TransformedTargetModel(model=Tree(),
transformer=y -> log.(y),
inverse = z -> exp.(z))
pipe2 = (X -> coerce(X, :age=>Continuous)) |> OneHotEncoder() |> tree_with_target
nothing # hide
```

@@ -538,7 +534,8 @@ curve = learning_curve(mach,

```julia
using Plots
plot(curve.parameter_values, curve.measurements,
    xlab=curve.parameter_name, xscale=curve.parameter_scale)
```

![](img/workflows_learning_curve.png)
@@ -558,7 +555,7 @@ curve = learning_curve(mach,

```julia
plot(curve.parameter_values, curve.measurements,
    xlab=curve.parameter_name, xscale=curve.parameter_scale)
```

![](img/workflows_learning_curves.png)
17 changes: 8 additions & 9 deletions docs/src/controlling_iterative_models.md
@@ -98,7 +98,7 @@ control | description
[`TimeLimit`](@ref EarlyStopping.TimeLimit)`(t=0.5)` | Stop after `t` hours | yes
[`NumberLimit`](@ref EarlyStopping.NumberLimit)`(n=100)` | Stop after `n` applications of the control | yes
[`NumberSinceBest`](@ref EarlyStopping.NumberSinceBest)`(n=6)` | Stop when best loss occurred `n` control applications ago | yes
[`InvalidValue`](@ref IterationControl.InvalidValue)() | Stop when `NaN`, `Inf` or `-Inf` loss/training loss encountered | yes
[`Threshold`](@ref EarlyStopping.Threshold)`(value=0.0)` | Stop when `loss < value` | yes
[`GL`](@ref EarlyStopping.GL)`(alpha=2.0)` | † Stop after the "generalization loss (GL)" exceeds `alpha` | yes
[`PQ`](@ref EarlyStopping.PQ)`(alpha=0.75, k=5)` | † Stop after "progress-modified GL" exceeds `alpha` | yes
@@ -109,15 +109,15 @@ control | description
[`Error`](@ref IterationControl.Error)`(predicate; f="")` | Log to `Error` the value of `f` or `f(mach)`, if `predicate(mach)` holds and then stop | yes
[`Callback`](@ref IterationControl.Callback)`(f=mach->nothing)`| Call `f(mach)` | yes
[`WithNumberDo`](@ref IterationControl.WithNumberDo)`(f=n->@info(n))` | Call `f(n + 1)` where `n` is the number of complete control cycles so far | yes
[`WithIterationsDo`](@ref MLJIteration.WithIterationsDo)`(f=i->@info("iterations: $i"))` | Call `f(i)`, where `i` is total number of iterations | yes
[`WithLossDo`](@ref IterationControl.WithLossDo)`(f=x->@info("loss: $x"))` | Call `f(loss)` where `loss` is the current loss | yes
[`WithTrainingLossesDo`](@ref IterationControl.WithTrainingLossesDo)`(f=v->@info(v))` | Call `f(v)` where `v` is the current batch of training losses | yes
[`WithEvaluationDo`](@ref MLJIteration.WithEvaluationDo)`(f=e->@info("evaluation: $e"))` | Call `f(e)` where `e` is the current performance evaluation object | yes
[`WithFittedParamsDo`](@ref MLJIteration.WithFittedParamsDo)`(f=fp->@info("fitted_params: $fp"))` | Call `f(fp)` where `fp` is fitted parameters of training machine | yes
[`WithReportDo`](@ref MLJIteration.WithReportDo)`(f=e->@info("report: $e"))` | Call `f(r)` where `r` is the training machine report | yes
[`WithModelDo`](@ref MLJIteration.WithModelDo)`(f=m->@info("model: $m"))` | Call `f(m)` where `m` is the model, which may be mutated by `f` | yes
[`WithMachineDo`](@ref MLJIteration.WithMachineDo)`(f=mach->@info("report: $mach"))` | Call `f(mach)` where `mach` is the training machine in its current state | yes
[`Save`](@ref MLJIteration.Save)`(filename="machine.jls")` | Save current training machine to `machine1.jls`, `machine2.jls`, etc | yes

> Table 1. Atomic controls. Some advanced options are omitted.
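Controls like these are applied by wrapping an iterative model using `IteratedModel`. A minimal sketch, assuming some iterative regression model `booster` and data `X`, `y` (illustrative names only):

```julia
iterated_booster = IteratedModel(model=booster,
                                 resampling=Holdout(fraction_train=0.8),
                                 measure=l1,
                                 controls=[Step(2), NumberLimit(100), InvalidValue()],
                                 retrain=true)
mach = machine(iterated_booster, X, y)
fit!(mach)   # iterates until one of the controls triggers a stop
```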
@@ -253,7 +250,6 @@ In the code, `wrapper` is an object that wraps the training machine
in this example).

```julia

import IterationControl # or MLJ.IterationControl

struct IterateFromList
15 changes: 7 additions & 8 deletions docs/src/evaluating_model_performance.md
@@ -27,7 +27,7 @@ using MLJ
X = (a=rand(12), b=rand(12), c=rand(12));
y = X.a + 2X.b + 0.05*rand(12);
model = (@load RidgeRegressor pkg=MultivariateStats verbosity=0)()
cv = CV(nfolds=3)
evaluate(model, X, y, resampling=cv, measure=l2, verbosity=0)
```

@@ -51,8 +51,8 @@ Multiple measures are specified as a vector:
evaluate!(
mach,
resampling=cv,
    measures=[l1, rms, rmslp1],
    verbosity=0,
)
```

@@ -70,7 +70,7 @@ evaluate!(
mach,
resampling=CV(nfolds=3),
measure=[l2, rsquared],
    weights=weights,
)
```

@@ -91,12 +91,12 @@ fold1 = 1:6; fold2 = 7:12;
evaluate!(
mach,
resampling = [(fold1, fold2), (fold2, fold1)],
    measures=[l1, l2],
    verbosity=0,
)
```

Or the user can define their own re-usable `ResamplingStrategy` objects; see [Custom
resampling strategies](@ref) below.
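For example, here is a bare-bones sketch of a custom strategy (the type name, the split rule, and the machine `mach` are purely illustrative):

```julia
import MLJBase

struct EvenOdd <: MLJBase.ResamplingStrategy end

function MLJBase.train_test_pairs(::EvenOdd, rows)
    train = rows[1:2:end]                  # odd-position rows for training
    test  = rows[2:2:end]                  # even-position rows for testing
    return [(train, test), (test, train)]  # two folds, with roles swapped
end

evaluate!(mach, resampling=EvenOdd(), measures=[l1, l2], verbosity=0)
```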


@@ -170,4 +170,3 @@ function train_test_pairs(holdout::Holdout, rows)
return [(train, test),]
end
```

Empty file modified docs/src/frequently_asked_questions.md
100755 → 100644
Empty file.
12 changes: 6 additions & 6 deletions docs/src/getting_started.md
@@ -5,14 +5,14 @@ For an outline of MLJ's **goals** and **features**, see

This page introduces some MLJ basics, assuming some familiarity with
machine learning. For a complete list of other MLJ learning resources,
see [Learning MLJ](@ref).

MLJ collects together the functionality provided by multiple packages. To learn how to
install components separately, run `using MLJ; @doc MLJ`.

This section introduces only the most basic MLJ operations and
concepts. It assumes MLJ has been successfully installed. See
[Installation](@ref) if this is not the case.


```@setup doda
@@ -31,7 +31,7 @@ column vectors:
```@repl doda
using MLJ
iris = load_iris();
selectrows(iris, 1:3) |> pretty
schema(iris)
```

@@ -114,8 +114,8 @@ computing the mode of each prediction):
```@repl doda
evaluate(tree, X, y,
resampling=CV(shuffle=true),
         measures=[log_loss, accuracy],
         verbosity=0)
```

Under the hood, `evaluate` calls lower level functions `predict` or
@@ -260,7 +257,7 @@ evaluate!(mach, resampling=Holdout(fraction_train=0.7),
Changing a hyperparameter and re-evaluating:

```@repl doda
tree.max_depth = 3;
evaluate!(mach, resampling=Holdout(fraction_train=0.7),
measures=[log_loss, accuracy],
verbosity=0)