
Add CatBoostClassifier/CatBoostRegressor Docs & Fix badge links #20

Closed
wants to merge 5 commits into from

Conversation

tylerjthomas9
Collaborator

I added documentation for the Python CatBoostClassifier/CatBoostRegressor models. Additionally, the beacon-biosignals links were updated to JuliaAI, so hopefully all the badges will work again.

#19

@tylerjthomas9
Collaborator Author

tylerjthomas9 commented Feb 13, 2023

@ablaom The MLJTestInterface tests are failing because the data is being split down the middle, so only one label is being passed to the CatBoostClassifier in the binary case (using the MLJTestInterface.make_binary() data). This seems to be a new addition to MLJTestInterface. Should I shuffle the data on the CatBoost.jl side to avoid this, or is this an MLJTestInterface change?

JuliaAI/MLJTestInterface.jl@702c7b9
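The failure mode described above can be illustrated with a minimal sketch (in Python for brevity; the data and split here are hypothetical, not MLJTestInterface's actual code): when a binary target is ordered by class, a middle split leaves only one class in the training half, and shuffling before splitting restores both.

```python
import random

# Hypothetical ordered binary target: all "a" labels first, then all "b".
labels = ["a"] * 50 + ["b"] * 50

# Splitting down the middle: the training half sees only one class.
train = labels[:50]
assert set(train) == {"a"}

# Shuffling before the split puts both classes in each half.
random.seed(0)
shuffled = labels[:]
random.shuffle(shuffled)
assert set(shuffled[:50]) == {"a", "b"}
```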

@ablaom
Member

ablaom commented Feb 14, 2023

Ah. Thanks for reporting this. Ideally, binary classifiers should be able to handle a level not manifest in the training target (but present in the pool), though I understand this can be a pain to fix. One could predict the seen class every time, with probability one, if probabilistic (and skip the call to fit the core model). Although this ordinarily isn't a problem on larger datasets, I would encourage you to post an issue to have this remedied at some point. In the meantime, one could either mark the test as broken (probably better practice) or hack a fix as you suggest.

Some other models failed in this way and we fixed them a little while back. CatBoost slipped through on the old version of the tests (not sure why), but the even stricter tests just released are catching the issue. By the way, I tested that the stricter tests are okay for all registered models, so I am inclined to keep the new stricter tests, as I still believe they are correct.
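The fallback suggested above (predict the seen class with probability one and skip fitting the core model) could be sketched as follows. This is a hypothetical Python illustration, not CatBoost.jl's actual implementation; the wrapper class and its method names are invented for the example.

```python
import numpy as np

class SingleClassFallbackClassifier:
    """Hypothetical wrapper: if the training target contains only one
    class, skip fitting the core model and always predict that class
    with probability one."""

    def __init__(self, core_model):
        self.core_model = core_model
        self.single_class_ = None

    def fit(self, X, y):
        classes = np.unique(y)
        if len(classes) == 1:
            # Only one level manifest in the training target:
            # remember it and skip the call to fit the core model.
            self.single_class_ = classes[0]
        else:
            self.single_class_ = None
            self.core_model.fit(X, y)
        return self

    def predict_proba(self, X):
        if self.single_class_ is not None:
            # Probability one for the single seen class, every row.
            return np.ones((len(X), 1))
        return self.core_model.predict_proba(X)

    def predict(self, X):
        if self.single_class_ is not None:
            return np.full(len(X), self.single_class_)
        return self.core_model.predict(X)
```

A Julia/MLJ version would follow the same shape: inspect the levels of the training target in `fit`, and short-circuit `predict` when only one level was seen.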

@codecov-commenter

Codecov Report

Merging #20 (4159f20) into main (bde0968) will increase coverage by 3.95%.
The diff coverage is 100.00%.


@@            Coverage Diff             @@
##             main      #20      +/-   ##
==========================================
+ Coverage   73.61%   77.57%   +3.95%     
==========================================
  Files           6        6              
  Lines         163      165       +2     
==========================================
+ Hits          120      128       +8     
+ Misses         43       37       -6     
Impacted Files Coverage Δ
src/MLJCatBoostInterface.jl 79.66% <ø> (+10.16%) ⬆️
src/CatBoost.jl 100.00% <100.00%> (ø)


@tylerjthomas9
Collaborator Author

tylerjthomas9 commented Feb 14, 2023

I just added a temporary fix to ensure the tests work across the platforms. Setting the test to broken didn't fully fix it, so I assume the Python exception is breaking the model fit and not being caught. Here is the test table when setting the test to broken.

Test Summary:             | Pass  Error  Broken  Total   Time
MLJ Interface             |    1      1       1      3  53.6s
  MLJ Examples            |                       None   9.8s
  CatBoostClassifier      |                       None   0.4s
  CatBoostRegressor       |                       None   0.1s
  generic interface tests |    1      1       1      3  43.3s
    CatBoostRegressor     |    1                     1  31.5s
    CatBoostClassifier    |           1       1      2  11.8s

@ablaom
Member

ablaom commented Feb 15, 2023

I just added a temporary fix to ensure the tests work across the platforms.

Perhaps @ericphanson will want to comment / review. I've looked over the PR and it looks otherwise good, thanks.

In any case, I suggest posting an issue to revert the test when #22 is addressed, or you can add a comment there.

Member

@ablaom ablaom left a comment


Good to go.

@ablaom
Member

ablaom commented Apr 5, 2023

I think a real maintainer should merge this. @ericphanson or @tylerjthomas9 ?

@tylerjthomas9
Collaborator Author

I don't have merge permissions. However, this still looks good to me. I am struggling to think of a good way to implement the single-class case for the classifier (#22), but that can be tackled later.

@ericphanson
Collaborator

Sounds like this was included in #23. Thanks both!

@ericphanson ericphanson closed this Apr 5, 2023