Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update - xgboost to handle missing values #480

Merged
merged 10 commits into from
Dec 14, 2023
Merged

Conversation

wrigleyDan
Copy link
Contributor

This PR is an updated version of #452

Work was done by @lechipatrick, improvement spotted by @nathancday, improvement reviewed by @styrmis

With this change XGBoost can handle missing feature values in NaiveAdditiveDecisionTree.

@wrigleyDan wrigleyDan merged commit 39024f6 into main Dec 14, 2023
1 check passed
@styrmis
Copy link
Contributor

styrmis commented Jan 11, 2024

@wrigleyDan Thanks again for getting this merged—do we know when it might appear in a release? It looks like the last release was a few days before this was merged.

@wrigleyDan
Copy link
Contributor Author

@styrmis I wanted to release this together with upgrading the plugin to ES 8.11.x.
The latter proves to be less straight forward than other upgrades but we'll hopefully resolve that soon. If not we can cut a release for the latest improvements to not make that too dependent on the upgrade.

@styrmis
Copy link
Contributor

styrmis commented Jan 22, 2024

Thanks @wrigleyDan. We have been looking again at this issue of inference parity between the plugin and XGBoost and have discovered that as the NaiveAdditiveDecisionTree is a DenseLtrRanker, it fills in missing values with 0. As such the only way currently that we can achieve parity in terms of inference between training and Elasticsearch/production is to fill in missing values in the training data with 0 also.

I've opened an issue for this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants