Corner case: Training a classifier with only one class present #22

Open
ablaom opened this issue Feb 15, 2023 · 3 comments

Comments

@ablaom
Member

ablaom commented Feb 15, 2023

Rather than complaining when only one class is seen, wouldn't a more robust approach be to predict the single class seen with probability one? At least, in the MLJ interface, where we track the complete target pool, this shouldn't pose any issues.

One hits this corner case when subsampling binary data with a small training set, unbalanced data, and/or lots of folds.

See also #20 (comment).

@tylerjthomas9
Collaborator

Closed by #29

@ablaom
Member Author

ablaom commented Dec 16, 2024

It turns out this is not resolved. Here's an MWE:

```julia
import CatBoost
using MLJBase

Xtrain = MLJBase.table(rand(3, 2))
Xtest = MLJBase.table(rand(2, 2))
y = categorical(["yes", "yes", "yes", "no", "no"])
ytrain = y[1:3]

model = CatBoost.MLJCatBoostInterface.CatBoostClassifier()
mach = machine(model, Xtrain, ytrain) |> MLJBase.fit!
MLJBase.predict(mach, Xtest)

# ERROR: ArgumentError: CategoricalArray only supports AbstractString, AbstractChar and Number element types (got element type CategoricalArrays.CategoricalValue{String, UInt32})
# Stacktrace:
#  [1] check_supported_eltype(::Type{CategoricalArrays.CategoricalValue{…}}, ::Type{CategoricalArrays.CategoricalValue{…}})
#    @ CategoricalArrays ~/.julia/packages/CategoricalArrays/0yLZN/src/array.jl:29
#  [2] fixtype(A::Vector{CategoricalArrays.CategoricalValue{String, UInt32}})
#    @ CategoricalArrays ~/.julia/packages/CategoricalArrays/0yLZN/src/array.jl:45
#  [3] #categorical#65
#    @ ~/.julia/packages/CategoricalArrays/0yLZN/src/array.jl:1028 [inlined]
#  [4] categorical
#    @ ~/.julia/packages/CategoricalArrays/0yLZN/src/array.jl:1022 [inlined]
#  [5] categorical(::MLJModelInterface.FullInterface, a::Vector{CategoricalArrays.CategoricalValue{…}}; kw::@Kwargs{})
#    @ MLJBase ~/.julia/packages/MLJBase/7nGJF/src/interface/data_utils.jl:5
#  [6] categorical(a::Vector{CategoricalArrays.CategoricalValue{String, UInt32}}; kw::@Kwargs{})
#    @ MLJModelInterface ~/.julia/packages/MLJModelInterface/y9x5A/src/data_utils.jl:22
#  [7] predict(mlj_model::CatBoost.MLJCatBoostInterface.CatBoostClassifier, fitresult::@NamedTuple{…}, X_pool::PythonCall.Core.Py)
#    @ CatBoost.MLJCatBoostInterface ~/.julia/packages/CatBoost/8tf8r/src/mlj_catboostclassifier.jl:106
#  [8] predict(mach::Machine{…}, Xraw::Tables.MatrixTable{…})
#    @ MLJBase ~/.julia/packages/MLJBase/7nGJF/src/operations.jl:135
#  [9] top-level scope
#    @ REPL[152]:1
# Some type information was truncated. Use `show(err)` to see complete types.
```

@tylerjthomas9
Collaborator

This was happening because the y_first value passed to MMI.fit is sometimes an int/string/..., and sometimes a CategoricalValue. In your example above, it is a CategoricalValue. I added a quick commit to #42 that ensures y_first is in the CategoricalValue format. Let me know what you think.

The following works in the current version, even though y is not the correct type (a plain vector of strings rather than a categorical vector). Both versions should work in the latest commit.

```julia
import CatBoost
using MLJBase

Xtrain = MLJBase.table(rand(3, 2))
Xtest = MLJBase.table(rand(2, 2))
y = ["yes", "yes", "yes", "no", "no"]
ytrain = y[1:3]

model = CatBoost.MLJCatBoostInterface.CatBoostClassifier()
mach = machine(model, Xtrain, ytrain) |> MLJBase.fit!
MLJBase.predict(mach, Xtest)
```
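
For readers following along, here is a minimal sketch of the kind of normalization described above, using only CategoricalArrays. The helper name `to_categorical_value` is hypothetical and is not taken from CatBoost.jl or from #42; it only illustrates returning a CategoricalValue whether the first target element arrives as a raw label or is already categorical, instead of calling `categorical` on a vector of CategoricalValue elements, which is what raised the ArgumentError in the MWE.

```julia
using CategoricalArrays

# Hypothetical helper (an assumption for illustration, not CatBoost.jl code):
# pass a CategoricalValue through unchanged, and wrap a raw label in one.
to_categorical_value(v::CategoricalValue) = v
to_categorical_value(v) = categorical([v])[1]

y = categorical(["yes", "yes", "yes", "no", "no"])
ytrain = y[1:3]  # only "yes" occurs here

y_first = to_categorical_value(ytrain[1])  # already a CategoricalValue
raw_first = to_categorical_value("yes")    # built from a raw String

# Indexing a categorical vector keeps the full pool, so both classes are
# still known even though only one of them appears in ytrain.
@show levels(ytrain)
@show typeof(y_first)
@show typeof(raw_first)
```

Note that a value wrapped from a raw label carries a pool containing only that label, so in the raw-vector case the unseen class is not recoverable from `y_first` alone.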
