Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add __sklearn_tags__ method to all transformers to be compatible with sklearn pipeline #831

Closed
Morgan-Sell opened this issue Jan 6, 2025 · 9 comments · Fixed by #833
Closed

Comments

@Morgan-Sell
Copy link
Collaborator

Describe the bug
In scikit-learn v1.7, all transformers/estimators, used in a pipeline, must have a __sklearn_tags__ method.

__sklearn_tags__ is used to provide metadata about the estimator, such as whether it supports multi-output, requires fitted parameters, etc.

For more information, go to this page of the scikit-learndocs and search for "sklearn_tags". It's located under the "Estimator Tags" section.

To Reproduce
Use any feature-engine transformer in a sklean pipeline.

Expected behavior
Proper execution of sklearn pipeline that uses feature-engine transformers.

Screenshots
Here's the warning that's raised:

 FutureWarning: The CategoricalImputer or classes from which it inherits use `_get_tags` and `_more_tags`. Please define the `__sklearn_tags__` method, or inherit from `sklearn.base.BaseEstimator` and/or other appropriate mixins such as `sklearn.base.TransformerMixin`, `sklearn.base.ClassifierMixin`, `sklearn.base.RegressorMixin`, and `sklearn.base.OutlierMixin`. From scikit-learn 1.7, not defining `__sklearn_tags__` will raise an error.

Desktop (please complete the following information):

  • OS: Mac
  • Browser: Chrome
  • Version: Latest feature-engine version

Additional context
N/A

@ClaudioSalvatoreArcidiacono
Copy link
Contributor

Just met this issue as well. A temporary fix that solved it for me was to downgrade sklearn version to 1.5.2

pip install "scikit-learn==1.5.2"

@Morgan-Sell
Copy link
Collaborator Author

Morgan-Sell commented Jan 6, 2025 via email

@ClaudioSalvatoreArcidiacono
Copy link
Contributor

sklearn v1.6 also works. It seems like sklearn_tags was introduced in
that version, but didn't make the method a requirement.

Mh... I am not sure about it.

The following code does not work for me with feature_engine==1.8.2 and scikit-learn==1.6.0, but it does when I downgrade scikit-learn to 1.5.2.

I have not checked with other feature-engine transformers. Perhaps CategoricalImputer still works with scikit-learn v1.6?

from feature_engine.encoding import WoEEncoder
import numpy as np
import pandas as pd

X = np.array([["dog"] * 20 + ["cat"] * 30 + ["snake"] * 38], dtype=object).T
y = [0] * 15 + [1] * 5 + [0] * 15 + [1] * 15 + [0] * 20 + [1] * 18

X = pd.DataFrame({"col": X[:, 0]})
y = pd.Series(y, name="y")
X = X.sample(frac=1, random_state=42).reset_index(drop=True)

enc = WoEEncoder()
enc.fit(X, y).transform(X)

@apavlo89
Copy link

downgrading doesn't fix it for some reason. Please fixxxxxxxxxxxxxxxxxxxxxxxxxxxxxx

@solegalli
Copy link
Collaborator

Hi all, thanks for raising the issue.

It is the typical end of year scikit-learn release, with breaking changes that inevitably breaks feature-engine and gives me a lot of headache.

I am on it!

@solegalli solegalli mentioned this issue Jan 15, 2025
5 tasks
@ClaudioSalvatoreArcidiacono
Copy link
Contributor

ClaudioSalvatoreArcidiacono commented Jan 15, 2025

hey @solegalli I was looking into the issue as well, let me know if you need some help.

From what I have understood the changes needed to support this new version are pretty major, in essence:

  1. In order to access super().__sklearn_tags__() you might need to change the inheritance order of some base classes, for example:
    class BaseSelector(BaseEstimator, TransformerMixin, GetFeatureNamesOutMixin):
    should become
    # notice how Mixins classes should be on the left of BaseEstimator
    class BaseSelector(TransformerMixin, GetFeatureNamesOutMixin, BaseEstimator):
  2. Pretty much all classes (not only the base classes) from feature_engine should implement the method __sklearn_tags__ alongside the _more_tags method (kept for backward compatibility). The following could be a generic implementation of __sklearn_tags__ that should fit every class:
        def __sklearn_tags__(self):
            tags = super().__sklearn_tags__()
            for key, value in self._more_tags().items():
                if hasattr(tags, key):
                    setattr(tags, key, value)
            return tags
  3. All of the input parameters checks that raise an exception in the __init__ and set_params should be moved to the fit method. Change introduced by this new estimator check, documented in here.

It might be interesting to see how other similar libraries like scikit-lego solved the issue (koaning/scikit-lego#726)

@solegalli
Copy link
Collaborator

Hi @ClaudioSalvatoreArcidiacono

Thank you! That's very useful. I already started changing the inheritance order and adding the sklearn tags to the classes.in #833

Our transformers fail many of the tests because of the design of feature-engine. So, the main task is to pass the tests correctly to the check_estimator and parametrize_with_tests.

sklearn does not permit checks in the init, but we do. I figured that it's more intuitive to fail directly when setting up the class than after fit. Not sure that was the correct decision, but we've made that decision, and for now, sticking to it is the simplest. So that test, is known to fail.

I've updated up to the encoding module. If you want to add to the PR by updating some of the remaining, please do. I am going from top to bottom, if you give it a start, maybe start from bottom to top, so we don't duplicate work.

Cheers!

@solegalli
Copy link
Collaborator

I forgot to mention that some of the tags in more_tags are for our own tests, so we can't implement this function:

    def __sklearn_tags__(self):
        tags = super().__sklearn_tags__()
        for key, value in self._more_tags().items():
            if hasattr(tags, key):
                setattr(tags, key, value)
        return tags

because some tags are not relevant for sklearn.

@ClaudioSalvatoreArcidiacono
Copy link
Contributor

I am aware of that, that's why I have added the statement:

if hasattr(tags, key):

In case the tag is not an attribute of the tags object it will not be assigned. This should cover the feature-engine specific tags.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants