XgBoost as nuisance model in DML learners (categorical features) #907
Comments
What error are you getting? We should be using the model as-is, however you instantiate it. One issue you might run into is that we cross-fit the model internally. If that column has rare values, an instance fitted on a subset of the data may never see one of those values and then encounter it when predicting on the held-out set, which leads to an error. If that's the issue, then you can provide your own cross-fitting folds.
Thanks for the response. In the example above, I get an error: xgboost treats this feature as numeric, even though it is categorical, when I call NonParamDML (with the xgboost learners as my models). But it all works fine when I train an xgboost model directly.
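The rare-value concern can be made concrete with a small sketch. It assumes scikit-learn is available, uses made-up data, and stratifies on the categorical column so that every training split contains every level. The variable names are hypothetical and this is only one way to build such folds.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

rng = np.random.default_rng(0)
n = 200

# Hypothetical categorical column, integer-coded, with one rare level (3).
cat = rng.choice([0, 1, 2], size=n)
cat[:4] = 3  # only four rows carry the rare level

# Stratify on the categorical column so each level is spread across folds.
skf = StratifiedKFold(n_splits=2, shuffle=True, random_state=0)
folds = list(skf.split(np.zeros(n), cat))

# Check that every training split sees every level of the column.
all_levels = set(np.unique(cat))
train_levels_ok = all(
    set(np.unique(cat[train_idx])) == all_levels for train_idx, _ in folds
)
print(train_levels_ok)
```

As far as I know, EconML's DML estimators take a `cv` argument that accepts an iterable of (train, test) index pairs, so a list like `folds` could be passed there. Treat that as an assumption to check against the version you have installed.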
This is a late follow-up, but I spent some time digging into this and can offer a solution. Hopefully this helps others who run into this. The crux of the problem is two-fold.
This leads to a problem when we try to fit our XGBoost model internally within EconML because:
So, your demo code works because we pass pandas dataframes directly to xgboost's fit method. It doesn't work in EconML because internally EconML passes numpy arrays to xgboost's fit method, which raises an error if the categorical features are passed as strings. To demonstrate, your xgboost demo code throws an error if you pass the numpy values directly instead of pandas dataframes.
However, you can make it work again by
Finally, reapplying this back to your sample code for EconML:
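A minimal sketch of the numpy situation described in this comment, not the commenter's original code. It assumes xgboost 1.7 or later, where the scikit-learn wrapper accepts a `feature_types` list, and it assumes categorical columns in a numpy array must be stored as non-negative integer codes. The data and the helper `fit_on_numpy` are hypothetical.

```python
import numpy as np

# Hypothetical data: one string-valued categorical column and one numeric column.
color = np.array(["red", "blue", "green", "blue", "red", "green"])
size = np.array([1.0, 2.5, 0.3, 4.1, 2.2, 3.3])
y = np.array([0.5, 1.5, 0.1, 2.0, 1.1, 1.7])

# Map each category to a non-negative integer code; `levels` keeps the lookup.
levels, codes = np.unique(color, return_inverse=True)

# A purely numeric array, which is the kind of input that survives a
# conversion from a dataframe to numpy.
X = np.column_stack([codes.astype(float), size])

# "c" marks a categorical column and "q" a quantitative one.
feature_types = ["c", "q"]


def fit_on_numpy(X, y, feature_types):
    """Fit an XGBRegressor on a plain numpy array with declared feature types."""
    # Imported lazily so the encoding above can be inspected without xgboost.
    from xgboost import XGBRegressor

    model = XGBRegressor(
        n_estimators=10,
        tree_method="hist",
        enable_categorical=True,
        feature_types=feature_types,  # assumption: available in xgboost >= 1.7
    )
    return model.fit(X, y)
```

Calling `fit_on_numpy(X, y, feature_types)` should then train without the dataframe dtype information, because the column types are declared on the model rather than inferred from the input.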
@fverac I tried your solution, but it does not work.
Hm, what exactly doesn't work? I just reconfirmed that the sample code I provided still works for me.
@fverac thanks for your reply. It's my fault. Your categorical variable is already a numeric type when you later declare it as categorical. My categorical variable starts out as a non-numeric type, and I still get an error when I declare it as categorical. So for my situation, the correct way is to first use an Encoder to convert it to a numeric type that econml can handle.
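The encode-first approach from this last comment might look roughly like the sketch below. It assumes scikit-learn's `OrdinalEncoder` as the encoder, xgboost 1.7 or later for `feature_types`, and that only `X` (no `W`) is passed to the estimator so that every model sees the same columns in the same order. The data, column names, and the helper `build_estimator` are hypothetical.

```python
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder

# Hypothetical features: "region" is a string-valued categorical column.
df = pd.DataFrame({
    "region": ["north", "south", "east", "south", "north", "east"],
    "age": [31, 45, 27, 52, 38, 41],
})
cat_cols = ["region"]

# Step 1: convert string categories to numeric codes before calling EconML.
encoder = OrdinalEncoder()
encoded = encoder.fit_transform(df[cat_cols])
X = df.copy()
for i, col in enumerate(cat_cols):
    X[col] = encoded[:, i]

# Step 2: tell xgboost which columns are categorical ("c") or numeric ("q").
feature_types = ["c" if col in cat_cols else "q" for col in X.columns]


def build_estimator(feature_types):
    """Build a NonParamDML estimator whose models declare their feature types."""
    # Imported lazily so the encoding step can be checked on its own.
    from econml.dml import NonParamDML
    from xgboost import XGBRegressor

    def make_model():
        return XGBRegressor(
            tree_method="hist",
            enable_categorical=True,
            feature_types=feature_types,  # assumption: xgboost >= 1.7
        )

    return NonParamDML(
        model_y=make_model(), model_t=make_model(), model_final=make_model()
    )
```

If controls `W` are also supplied, the nuisance models would see more columns than the final model, so the `feature_types` lists would need to differ per model. That detail is not covered by this sketch.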
Hi there,
I'm looking to use xgboost as my nuisance model in my DoubleML setup and use xgboost's own mechanism for encoding categorical features (rather than having to one-hot encode them myself).
I can do this quite easily in xgboost using the scikit-learn interface:
However, using the DML classes results in an error. The param that allows us to use xgboost's encoding of categorical features is enable_categorical=True, but it appears econml is not able to pass this information on to xgboost. Is there a way to get around this? I have some high-cardinality categorical features that I would rather not one-hot encode, which is why I have used xgboost as my nuisance model.
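For context, the direct xgboost usage described in this question could look something like the following. This is a sketch with made-up data, not the code originally posted. It relies on the pandas `category` dtype together with `enable_categorical=True`, and the helper `fit_direct` is hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical data with a string-valued categorical column.
X = pd.DataFrame({
    "color": ["red", "blue", "green", "blue", "red", "green"],
    "size": [1.0, 2.5, 0.3, 4.1, 2.2, 3.3],
})
y = np.array([0.5, 1.5, 0.1, 2.0, 1.1, 1.7])

# xgboost's native handling expects the pandas "category" dtype, not raw strings.
X["color"] = X["color"].astype("category")


def fit_direct(X, y):
    """Fit xgboost directly on a dataframe, keeping the category dtype."""
    # Imported lazily so the dataframe preparation can be checked on its own.
    from xgboost import XGBRegressor

    model = XGBRegressor(
        n_estimators=10, tree_method="hist", enable_categorical=True
    )
    return model.fit(X, y)
```

The dataframe carries the information about which columns are categorical. Once the data is converted to a numpy array, that dtype information is no longer attached to the input.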