Corner case: Training a classifier with only one class present #22

ablaom · 2023-02-15T23:16:53Z

Rather than complaining when only one class is seen, wouldn't a more robust approach be to predict the single class seen with probability one? At least, in the MLJ interface, where we track the complete target pool, this shouldn't pose any issues.
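As a rough illustration of the suggestion above, here is a minimal sketch using only CategoricalArrays. It is not code from CatBoost.jl or MLJ; the variable names (pool_levels, seen, probs) are made up for the example. It relies on the fact that subsetting a categorical vector keeps the full pool, so the unseen class is still known.

```julia
using CategoricalArrays

y = categorical(["yes", "yes", "yes", "no", "no"])
ytrain = y[1:3]                      # only "yes" is observed in this subsample

# Subsetting keeps the complete pool, so both levels are still known.
pool_levels = levels(ytrain)

# The single class actually present in the training target
# (`only` throws if more than one class is present).
seen = only(unique(ytrain))

# Probability one for the seen class, zero for every other level in the pool.
probs = [Float64(l == seen) for l in pool_levels]
```

A real implementation would presumably wrap such probabilities in whatever probabilistic prediction type the interface expects; that step is left out here.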

One hits this corner case when subsampling binary data with a small training set, unbalanced data, and/or lots of folds.

See also #20 (comment).

tylerjthomas9 · 2024-02-01T17:54:26Z

Closed by #29

ablaom · 2024-12-16T01:05:14Z

It turns out this is not resolved. Here's an MWE:

import CatBoost
using MLJBase

Xtrain = MLJBase.table(rand(3, 2))
Xtest = MLJBase.table(rand(2, 2))
y = categorical(["yes", "yes", "yes", "no", "no"])
ytrain = y[1:3]


model = CatBoost.MLJCatBoostInterface.CatBoostClassifier()
mach = machine(model, Xtrain, ytrain) |> MLJBase.fit!
MLJBase.predict(mach, Xtest)

# ERROR: ArgumentError: CategoricalArray only supports AbstractString, AbstractChar and Number element types (got element type CategoricalArrays.CategoricalValue{String, UInt32})
# Stacktrace:
#  [1] check_supported_eltype(::Type{CategoricalArrays.CategoricalValue{…}}, ::Type{CategoricalArrays.CategoricalValue{…}})                                    
#    @ CategoricalArrays ~/.julia/packages/CategoricalArrays/0yLZN/src/array.jl:29
#  [2] fixtype(A::Vector{CategoricalArrays.CategoricalValue{String, UInt32}})
#    @ CategoricalArrays ~/.julia/packages/CategoricalArrays/0yLZN/src/array.jl:45
#  [3] #categorical#65
#    @ ~/.julia/packages/CategoricalArrays/0yLZN/src/array.jl:1028 [inlined]
#  [4] categorical
#    @ ~/.julia/packages/CategoricalArrays/0yLZN/src/array.jl:1022 [inlined]
#  [5] categorical(::MLJModelInterface.FullInterface, a::Vector{CategoricalArrays.CategoricalValue{…}}; kw::@Kwargs{})                         
#    @ MLJBase ~/.julia/packages/MLJBase/7nGJF/src/interface/data_utils.jl:5
#  [6] categorical(a::Vector{CategoricalArrays.CategoricalValue{String, UInt32}}; kw::@Kwargs{})
#    @ MLJModelInterface ~/.julia/packages/MLJModelInterface/y9x5A/src/data_utils.jl:22                                                                
#  [7] predict(mlj_model::CatBoost.MLJCatBoostInterface.CatBoostClassifier, fitresult::@NamedTuple{…}, X_pool::PythonCall.Core.Py)                          
#    @ CatBoost.MLJCatBoostInterface ~/.julia/packages/CatBoost/8tf8r/src/mlj_catboostclassifier.jl:106                                                   
#  [8] predict(mach::Machine{…}, Xraw::Tables.MatrixTable{…})
#    @ MLJBase ~/.julia/packages/MLJBase/7nGJF/src/operations.jl:135
#  [9] top-level scope
#    @ REPL[152]:1
# Some type information was truncated. Use `show(err)` to see complete types.

tylerjthomas9 · 2024-12-25T19:32:53Z

This was happening because the y_first value passed to MMI.fit is sometimes a raw value (an integer, a string, and so on) and sometimes a CategoricalValue. In your example above, it is a CategoricalValue. I added a quick commit to #42 that ensures y_first is always a CategoricalValue. Let me know what you think.
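For readers following along, one way such a normalization could look is sketched below. This is an assumption for illustration only, not the code in #42; the helper name as_categorical_value is hypothetical. It passes a CategoricalValue through untouched (avoiding the categorical call on already-categorical elements that fails in the stack trace above) and wraps a raw value otherwise.

```julia
using CategoricalArrays

# Hypothetical helper, not part of CatBoost.jl:
# already a CategoricalValue, so return it unchanged (its pool is preserved).
as_categorical_value(v::CategoricalValue) = v

# Raw value (string, integer, ...): wrap it in a one-element categorical
# vector and take the element, which is a CategoricalValue.
as_categorical_value(v) = categorical([v])[1]

y = categorical(["yes", "yes", "yes", "no", "no"])

from_categorical = as_categorical_value(y[1])   # keeps the two-level pool of y
from_raw = as_categorical_value("yes")          # pool contains only "yes"
```

Note that in the raw case the resulting pool only contains the value itself, so any information about unseen classes is unavailable on that path.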

The following works in the current version, but y is not the correct type (it is a plain vector of strings rather than a categorical vector). Both versions should work in the latest commit.

import CatBoost
using MLJBase

Xtrain = MLJBase.table(rand(3, 2))
Xtest = MLJBase.table(rand(2, 2))
y = ["yes", "yes", "yes", "no", "no"]
ytrain = y[1:3]


model = CatBoost.MLJCatBoostInterface.CatBoostClassifier()
mach = machine(model, Xtrain, ytrain) |> MLJBase.fit!
MLJBase.predict(mach, Xtest)

This was referenced Feb 15, 2023

Add CatBoostClassifier/CatBoostRegressor Docs & Fix badge links #20

Closed

Fix classifiers that cannot handle only one single target class JuliaAI/MLJScikitLearnInterface.jl#48

Closed

tylerjthomas9 mentioned this issue Jan 8, 2024

Add support for training classifiers with one class present #29

Merged

tylerjthomas9 closed this as completed Feb 1, 2024

ablaom mentioned this issue Feb 1, 2024

Reinstate CatBoost integraton test JuliaAI/MLJ.jl#1092

Open

ablaom reopened this Dec 16, 2024

ablaom mentioned this issue Dec 16, 2024

Bad update implementation is not being caught JuliaAI/MLJTestIntegration.jl#46

Open
