Corner case: Training a classifier with only one class present #22
Closed by #29
It turns out this is not resolved. Here's a MWE:

```julia
import CatBoost
using MLJBase

Xtrain = MLJBase.table(rand(3, 2))
Xtest = MLJBase.table(rand(2, 2))
y = categorical(["yes", "yes", "yes", "no", "no"])
ytrain = y[1:3]

model = CatBoost.MLJCatBoostInterface.CatBoostClassifier()
mach = machine(model, Xtrain, ytrain) |> MLJBase.fit!
MLJBase.predict(mach, Xtest)

# ERROR: ArgumentError: CategoricalArray only supports AbstractString, AbstractChar and Number element types (got element type CategoricalArrays.CategoricalValue{String, UInt32})
# Stacktrace:
#  [1] check_supported_eltype(::Type{CategoricalArrays.CategoricalValue{…}}, ::Type{CategoricalArrays.CategoricalValue{…}})
#    @ CategoricalArrays ~/.julia/packages/CategoricalArrays/0yLZN/src/array.jl:29
#  [2] fixtype(A::Vector{CategoricalArrays.CategoricalValue{String, UInt32}})
#    @ CategoricalArrays ~/.julia/packages/CategoricalArrays/0yLZN/src/array.jl:45
#  [3] #categorical#65
#    @ ~/.julia/packages/CategoricalArrays/0yLZN/src/array.jl:1028 [inlined]
#  [4] categorical
#    @ ~/.julia/packages/CategoricalArrays/0yLZN/src/array.jl:1022 [inlined]
#  [5] categorical(::MLJModelInterface.FullInterface, a::Vector{CategoricalArrays.CategoricalValue{…}}; kw::@Kwargs{})
#    @ MLJBase ~/.julia/packages/MLJBase/7nGJF/src/interface/data_utils.jl:5
#  [6] categorical(a::Vector{CategoricalArrays.CategoricalValue{String, UInt32}}; kw::@Kwargs{})
#    @ MLJModelInterface ~/.julia/packages/MLJModelInterface/y9x5A/src/data_utils.jl:22
#  [7] predict(mlj_model::CatBoost.MLJCatBoostInterface.CatBoostClassifier, fitresult::@NamedTuple{…}, X_pool::PythonCall.Core.Py)
#    @ CatBoost.MLJCatBoostInterface ~/.julia/packages/CatBoost/8tf8r/src/mlj_catboostclassifier.jl:106
#  [8] predict(mach::Machine{…}, Xraw::Tables.MatrixTable{…})
#    @ MLJBase ~/.julia/packages/MLJBase/7nGJF/src/operations.jl:135
#  [9] top-level scope
#    @ REPL[152]:1
# Some type information was truncated. Use `show(err)` to see complete types.
```
This was happening because `predict` ended up calling `categorical` on a vector that already contains `CategoricalValue`s, which `CategoricalArrays` rejects (see the stacktrace above). The variant below works in the current version, but there `y` is not the correct type (a plain `Vector{String}` rather than a categorical vector). Both versions should work in the latest commit.

```julia
import CatBoost
using MLJBase

Xtrain = MLJBase.table(rand(3, 2))
Xtest = MLJBase.table(rand(2, 2))
y = ["yes", "yes", "yes", "no", "no"]
ytrain = y[1:3]

model = CatBoost.MLJCatBoostInterface.CatBoostClassifier()
mach = machine(model, Xtrain, ytrain) |> MLJBase.fit!
MLJBase.predict(mach, Xtest)
```
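For reference, the reason the first (categorical) version carries enough information to handle this corner case is that subsetting a `CategoricalArray` preserves the full pool of levels, even when only one class actually appears in the subset. A small illustration:

```julia
using CategoricalArrays

y = categorical(["yes", "yes", "yes", "no", "no"])
ytrain = y[1:3]      # contains only "yes" observations ...

levels(ytrain)       # ... yet the pool still records both classes: ["no", "yes"]
unique(ytrain)       # only "yes" was actually observed
```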
Rather than complaining when only one class is seen, wouldn't a more robust approach be to predict the single class seen with probability one? At least in the MLJ interface, where we track the complete target pool, this shouldn't pose any issues.
One hits this corner case when subsampling binary data with a small training set, unbalanced data, and/or lots of folds.
See also #20 (comment).
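For concreteness, here is a minimal sketch of what the suggestion above could look like, built on `MLJBase.classes` and `MLJBase.UnivariateFinite`. This is not the package's actual implementation, and the helper name `single_class_predictions` is hypothetical:

```julia
using MLJBase

# Sketch: build probabilistic predictions when `ytrain` contains only one observed class.
# `classes(x)` returns every class in the pool of the categorical value `x`.
function single_class_predictions(ytrain, n_test)
    seen = first(ytrain)                              # the only class present in ytrain
    pool = MLJBase.classes(seen)                      # all classes in the pool, not just the seen one
    probs = [c == seen ? 1.0 : 0.0 for c in pool]     # probability one on the seen class
    d = MLJBase.UnivariateFinite(pool, probs)         # distribution over the full pool
    return fill(d, n_test)                            # same distribution for every test row
end

# e.g. single_class_predictions(ytrain, 2) for the 2-row Xtest in the MWE above
```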