Add __sklearn_tags__ method to all transformers to be compatible with sklearn pipeline #831

Morgan-Sell · 2025-01-06T14:04:28Z

Describe the bug
From scikit-learn v1.7, all transformers/estimators used in a pipeline must define a __sklearn_tags__ method.

__sklearn_tags__ is used to provide metadata about the estimator, such as whether it supports multi-output, requires fitted parameters, etc.

For more information, go to this page of the scikit-learn docs and search for "sklearn_tags". It's located under the "Estimator Tags" section.

To Reproduce
Use any feature-engine transformer in a sklearn pipeline.

Expected behavior
Proper execution of sklearn pipeline that uses feature-engine transformers.

Screenshots
Here's the warning that's raised:

 FutureWarning: The CategoricalImputer or classes from which it inherits use `_get_tags` and `_more_tags`. Please define the `__sklearn_tags__` method, or inherit from `sklearn.base.BaseEstimator` and/or other appropriate mixins such as `sklearn.base.TransformerMixin`, `sklearn.base.ClassifierMixin`, `sklearn.base.RegressorMixin`, and `sklearn.base.OutlierMixin`. From scikit-learn 1.7, not defining `__sklearn_tags__` will raise an error.

Desktop (please complete the following information):

OS: Mac
Browser: Chrome
Version: Latest feature-engine version

Additional context
N/A

ClaudioSalvatoreArcidiacono · 2025-01-06T15:14:22Z

Just met this issue as well. A temporary fix that solved it for me was to downgrade sklearn version to 1.5.2

pip install "scikit-learn==1.5.2"

Morgan-Sell · 2025-01-06T16:22:20Z

sklearn v1.6 also works. It seems __sklearn_tags__ was introduced in that version, but the method wasn't made a requirement yet.
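The version behaviour described so far can be summarised in a small helper. This is only a sketch based on the wording of the FutureWarning quoted above (the method is optional but warned about in 1.6, and required from 1.7). The names `parse_major_minor` and `tags_requirement` are hypothetical, and the mapping is an assumption rather than a guarantee for every transformer.

```python
import re


def parse_major_minor(version: str) -> tuple[int, int]:
    """Extract (major, minor) from a version string such as '1.6.0' or '1.7.dev0'."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"Cannot parse version: {version!r}")
    return int(match.group(1)), int(match.group(2))


def tags_requirement(sklearn_version: str) -> str:
    """Classify how a scikit-learn version treats __sklearn_tags__.

    Assumption (taken from the warning text): 1.6 warns, 1.7+ raises.
    """
    major_minor = parse_major_minor(sklearn_version)
    if major_minor >= (1, 7):
        return "required"
    if major_minor == (1, 6):
        return "warning"
    return "not used"


print(tags_requirement("1.5.2"))  # not used
print(tags_requirement("1.6.0"))  # warning
print(tags_requirement("1.7.0"))  # required
```

In practice the installed version could be read with `importlib.metadata.version("scikit-learn")` and passed to the helper.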

ClaudioSalvatoreArcidiacono · 2025-01-07T08:27:05Z

> sklearn v1.6 also works. It seems like __sklearn_tags__ was introduced in
> that version, but didn't make the method a requirement.

Mh... I am not sure about it.

The following code does not work for me with feature_engine==1.8.2 and scikit-learn==1.6.0, but it does when I downgrade scikit-learn to 1.5.2.

I have not checked with other feature-engine transformers. Perhaps CategoricalImputer still works with scikit-learn v1.6?

from feature_engine.encoding import WoEEncoder
import numpy as np
import pandas as pd

X = np.array([["dog"] * 20 + ["cat"] * 30 + ["snake"] * 38], dtype=object).T
y = [0] * 15 + [1] * 5 + [0] * 15 + [1] * 15 + [0] * 20 + [1] * 18

X = pd.DataFrame({"col": X[:, 0]})
y = pd.Series(y, name="y")
X = X.sample(frac=1, random_state=42).reset_index(drop=True)

enc = WoEEncoder()
enc.fit(X, y).transform(X)

apavlo89 · 2025-01-13T13:03:04Z

downgrading doesn't fix it for some reason. Please fixxxxxxxxxxxxxxxxxxxxxxxxxxxxxx

solegalli · 2025-01-15T13:01:45Z

Hi all, thanks for raising the issue.

It is the typical end-of-year scikit-learn release, with breaking changes that inevitably break feature-engine and give me a lot of headaches.

I am on it!

ClaudioSalvatoreArcidiacono · 2025-01-15T14:24:05Z

hey @solegalli I was looking into the issue as well, let me know if you need some help.

From what I have understood the changes needed to support this new version are pretty major, in essence:

In order to access super().__sklearn_tags__() you might need to change the inheritance order of some base classes, for example:

class BaseSelector(BaseEstimator, TransformerMixin, GetFeatureNamesOutMixin):

should become

# note that mixin classes should be to the left of BaseEstimator
class BaseSelector(TransformerMixin, GetFeatureNamesOutMixin, BaseEstimator):
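Why the order matters can be shown with plain Python and no scikit-learn import. The sketch below uses simplified stand-in classes (`Tags`, `BaseEstimator`, `TransformerMixin` here are hypothetical toy versions, not the real ones). It assumes that the base class creates the tags object without calling `super()`, while the mixin refines whatever the next class in the method resolution order returns.

```python
class Tags:
    """Stand-in for a tags object (hypothetical, heavily simplified)."""

    def __init__(self):
        self.is_transformer = False


class BaseEstimator:
    """Stand-in base class: builds the tags and does not call super()."""

    def __sklearn_tags__(self):
        return Tags()


class TransformerMixin:
    """Stand-in mixin: refines the tags returned by the next class in the MRO."""

    def __sklearn_tags__(self):
        tags = super().__sklearn_tags__()
        tags.is_transformer = True
        return tags


class MixinFirst(TransformerMixin, BaseEstimator):
    """Mixin on the left: its __sklearn_tags__ runs and delegates to the base."""


class BaseFirst(BaseEstimator, TransformerMixin):
    """Base on the left: its __sklearn_tags__ returns early, skipping the mixin."""


mixin_first_tags = MixinFirst().__sklearn_tags__()
base_first_tags = BaseFirst().__sklearn_tags__()

print(mixin_first_tags.is_transformer)  # True
print(base_first_tags.is_transformer)   # False
```

With the base class first, the mixin's contribution is silently lost, which is the failure mode the reordering is meant to avoid.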

Pretty much all classes (not only the base classes) from feature_engine should implement the method __sklearn_tags__ alongside the _more_tags method (kept for backward compatibility). The following could be a generic implementation of __sklearn_tags__ that should fit every class:

    def __sklearn_tags__(self):
        tags = super().__sklearn_tags__()
        for key, value in self._more_tags().items():
            if hasattr(tags, key):
                setattr(tags, key, value)
        return tags

All of the input parameter checks that raise an exception in __init__ and set_params should be moved to the fit method. This change is required by a new estimator check, documented here.
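The last point, deferring parameter validation to fit, can be sketched as follows. `MeanImputerSketch` and its `fill_strategy` parameter are hypothetical and exist only for illustration; this is not feature-engine code.

```python
import statistics


class MeanImputerSketch:
    """Hypothetical estimator illustrating where parameter validation lives."""

    def __init__(self, fill_strategy="mean"):
        # __init__ only stores the parameter as given; nothing is validated here.
        self.fill_strategy = fill_strategy

    def fit(self, values):
        # Validation is deferred to fit, so an invalid value surfaces at this point.
        if self.fill_strategy not in ("mean", "median"):
            raise ValueError(
                f"fill_strategy must be 'mean' or 'median', got {self.fill_strategy!r}"
            )
        if self.fill_strategy == "mean":
            self.fill_value_ = statistics.mean(values)
        else:
            self.fill_value_ = statistics.median(values)
        return self


# Constructing with an invalid value does not raise...
estimator = MeanImputerSketch(fill_strategy="mode")

# ...the error only appears when fit is called.
try:
    estimator.fit([1, 2, 3])
    raised_on_fit = False
except ValueError:
    raised_on_fit = True

print(raised_on_fit)  # True
```

The trade-off is that a misconfigured estimator fails later than it would with checks in `__init__`.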

It might be interesting to see how other similar libraries like scikit-lego solved the issue (koaning/scikit-lego#726)

solegalli · 2025-01-17T12:17:19Z

Hi @ClaudioSalvatoreArcidiacono

Thank you! That's very useful. I already started changing the inheritance order and adding the sklearn tags to the classes in #833.

Our transformers fail many of the tests because of the design of feature-engine. So, the main task is to pass the tests correctly to check_estimator and parametrize_with_checks.

sklearn does not permit checks in the init, but we do. I figured that it's more intuitive to fail directly when setting up the class than after fit. Not sure that was the correct decision, but we've made that decision, and for now, sticking to it is the simplest. So that test is known to fail.

I've updated everything up to the encoding module. If you want to add to the PR by updating some of the remaining modules, please do. I am going from top to bottom, so if you pick this up, maybe start from the bottom and work up, so we don't duplicate work.

Cheers!

solegalli · 2025-01-17T12:19:18Z

I forgot to mention that some of the tags in _more_tags are for our own tests, so we can't implement this function:

    def __sklearn_tags__(self):
        tags = super().__sklearn_tags__()
        for key, value in self._more_tags().items():
            if hasattr(tags, key):
                setattr(tags, key, value)
        return tags

because some tags are not relevant for sklearn.

ClaudioSalvatoreArcidiacono · 2025-01-20T12:23:02Z

I am aware of that, that's why I have added the statement:

if hasattr(tags, key):

In case the tag is not an attribute of the tags object it will not be assigned. This should cover the feature-engine specific tags.
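The effect of the guard can be illustrated with a stand-in tags object. `TagsSketch`, `InputTagsSketch`, `apply_more_tags`, and the `"variables"` key are hypothetical and used only for this sketch. The nesting of `allow_nan` under `input_tags` mirrors how scikit-learn 1.6 organises some tags, but the classes here are toy versions.

```python
from dataclasses import dataclass, field


@dataclass
class InputTagsSketch:
    """Stand-in for a nested group of input-related tags."""

    allow_nan: bool = False


@dataclass
class TagsSketch:
    """Stand-in for the top-level tags object."""

    requires_fit: bool = True
    non_deterministic: bool = False
    input_tags: InputTagsSketch = field(default_factory=InputTagsSketch)


def apply_more_tags(tags, more_tags):
    """Copy legacy tags onto `tags`, skipping keys it has no attribute for."""
    skipped = []
    for key, value in more_tags.items():
        if hasattr(tags, key):
            setattr(tags, key, value)
        else:
            skipped.append(key)
    return tags, skipped


legacy_tags = {
    "requires_fit": False,     # top-level attribute: copied
    "variables": "numerical",  # hypothetical library-specific key: skipped
    "allow_nan": True,         # nested under input_tags here: also skipped
}

tags, skipped = apply_more_tags(TagsSketch(), legacy_tags)

print(tags.requires_fit)          # False
print(skipped)                    # ['variables', 'allow_nan']
print(tags.input_tags.allow_nan)  # False
```

So the guard does filter out library-specific keys, but under these assumptions it also drops any legacy key that now lives on a nested object, which would need explicit mapping.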

solegalli mentioned this issue Jan 15, 2025

add sklearn_tags #833

Merged

This was referenced Jan 20, 2025

make time_series module compatible #834

Merged

make wrappers module compatible #835

Merged

make transformation module compatible with sklearn_tags #836

Merged

solegalli closed this as completed in #833 Jan 22, 2025
