Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Multithreaded support for apply_forest_proba (Issue #209) #210

Open
wants to merge 2 commits into
base: dev
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
33 changes: 22 additions & 11 deletions src/classification/main.jl
Original file line number Diff line number Diff line change
Expand Up @@ -23,12 +23,19 @@ end

# Applies `row_fun(X_row)::AbstractVector` to each row in X
# and returns a matrix containing the resulting vectors, stacked vertically
function stack_function_results(row_fun::Function, X::AbstractMatrix)
function stack_function_results(row_fun::Function, X::AbstractMatrix;
use_multithreading = false)
N = size(X, 1)
N_cols = length(row_fun(X[1, :])) # gets the number of columns
out = Array{Float64}(undef, N, N_cols)
for i in 1:N
out[i, :] = row_fun(X[i, :])
if use_multithreading
for i in 1:N
out[i, :] = row_fun(X[i, :])
end
else
for i in 1:N
out[i, :] = row_fun(X[i, :])
end
Comment on lines +32 to +38
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cases are the same?

end
return out
end
Expand Down Expand Up @@ -329,10 +336,10 @@ function apply_tree_proba(
return apply_tree_proba(tree.right, features, labels)
end
end
apply_tree_proba(tree::Root{S, T}, features::AbstractMatrix{S}, labels) where {S, T} =
apply_tree_proba(tree.node, features, labels)
apply_tree_proba(tree::LeafOrNode{S, T}, features::AbstractMatrix{S}, labels) where {S, T} =
stack_function_results(row->apply_tree_proba(tree, row, labels), features)
apply_tree_proba(tree::Root{S, T}, features::AbstractMatrix{S}, labels; use_multithreading = false) where {S, T} =
apply_tree_proba(tree.node, features, labels, use_multithreading = use_multithreading)
apply_tree_proba(tree::LeafOrNode{S, T}, features::AbstractMatrix{S}, labels; use_multithreading = false) where {S, T} =
stack_function_results(row->apply_tree_proba(tree, row, labels), features, use_multithreading = use_multithreading)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
stack_function_results(row->apply_tree_proba(tree, row, labels), features, use_multithreading = use_multithreading)
stack_function_results(row->apply_tree_proba(tree, row, labels), features; use_multithreading)

Lower bound is set to 1.6 so no need to repeat the keyword name


function build_forest(
labels :: AbstractVector{T},
Expand Down Expand Up @@ -488,10 +495,12 @@ end
apply_forest_proba(
forest::Ensemble{S, T},
features::AbstractMatrix{S},
labels
labels;
use_multithreading = false
) where {S, T} =
stack_function_results(row->apply_forest_proba(forest, row, labels),
features)
features,
use_multithreading = use_multithreading)

function build_adaboost_stumps(
labels :: AbstractVector{T},
Expand Down Expand Up @@ -597,10 +606,12 @@ function apply_adaboost_stumps_proba(
stumps::Ensemble{S, T},
coeffs::AbstractVector{Float64},
features::AbstractMatrix{S},
labels::AbstractVector{T}
labels::AbstractVector{T};
use_multithreading = false
) where {S, T}
stack_function_results(
row->apply_adaboost_stumps_proba(stumps, coeffs, row, labels),
features
features,
use_multithreading = use_multithreading
)
end
6 changes: 6 additions & 0 deletions test/classification/iris.jl
Original file line number Diff line number Diff line change
Expand Up @@ -16,6 +16,8 @@ cm = confusion_matrix(labels, preds)
@test depth(model) == 1
probs = apply_tree_proba(model, features, classes)
@test reshape(sum(probs, dims=2), n) ≈ ones(n)
probs_m = apply_tree_proba(model, features, classes, use_multithreading=true)
Copy link
Member

@rikhuijzer rikhuijzer Jan 15, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Although there isn't a format style guide for this repository, consistent use of spaces around keyword argument equal signs seems like a good start. Here at line 19 are no spaces and at 33 and 59 there are. MLJ style is no spaces around keyword arguments equals I think.

Same holds for using the semicolon to separate the arguments from the keyword arguments. At some places in this PR it is done and and some not. Here, it's generally advised to use semicolons because they improve clarity.

@test reshape(sum(probs_m, dims=2), n) ≈ ones(n)

# train full-tree classifier (over-fit)
model = build_tree(labels, features)
Expand All @@ -28,6 +30,8 @@ cm = confusion_matrix(labels, preds)
print_tree(model)
probs = apply_tree_proba(model, features, classes)
@test reshape(sum(probs, dims=2), n) ≈ ones(n)
probs_m = apply_tree_proba(model, features, classes, use_multithreading = true)
@test reshape(sum(probs_m, dims=2), n) ≈ ones(n)
i1 = impurity_importance(model)
s1 = split_importance(model)

Expand All @@ -52,6 +56,8 @@ cm = confusion_matrix(labels, preds)
@test 0.95 < cm.accuracy < 1.0
probs = apply_tree_proba(model, features, classes)
@test reshape(sum(probs, dims=2), n) ≈ ones(n)
probs_m = apply_tree_proba(model, features, classes, use_multithreading = true)
@test reshape(sum(probs_m, dims=2), n) ≈ ones(n)

# prune tree to a stump, 2 leaves
pruning_purity = 0.5
Expand Down