Non-reproducible XGBoost ranking #497
On which ES / Java version did you observe this behaviour? Just to make sure it's not < Java 17 and you are not observing the behaviour that is described here in the context of
Hi @renekrie, thanks for your reply =)
I just noticed that the XGBoost example in the plugin's docs uses the get_dump method.
I have a hunch that this get_dump method may be dumping the numbers as higher-precision decimals instead of float32s (which, given the LTR plugin's trimming to float32 during parsing, may produce an essentially different model when loaded by the plugin).
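A minimal sketch of how one might check that hunch (illustrative only, not the author's original code; the toy data and file name are hypothetical, and the JSON paths follow the XGBoost model schema):

```python
# Compare how get_dump() and save_model() serialize the same split threshold.
import json
import numpy as np
import xgboost as xgb

rng = np.random.default_rng(0)
X = rng.random((100, 3)).astype(np.float32)
y = (X[:, 0] > 0.5).astype(np.float64)  # force an obvious split on f0
booster = xgb.train({"objective": "reg:squarederror", "max_depth": 2},
                    xgb.DMatrix(X, label=y), num_boost_round=1)

# get_dump() returns one JSON string per tree, thresholds appear as decimal text
dump_tree = json.loads(booster.get_dump(dump_format="json")[0])
print("get_dump threshold:  ", dump_tree["split_condition"])

# save_model() writes the full model; split_conditions holds the raw values
booster.save_model("model.json")
with open("model.json") as f:
    model = json.load(f)
trees = model["learner"]["gradient_booster"]["model"]["trees"]
print("save_model threshold:", trees[0]["split_conditions"][0])
```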
I'm wondering if it's possible for the LTR plugin to load the model from the output of this xgboost method: https://xgboost.readthedocs.io/en/stable/python/python_api.html#xgboost.Booster.save_model
Okay, I've confirmed my theory =) Here's the complete summary of the issue that we observe right now: the model training code (as well as the plugin's docs) uses the get_dump xgboost method to save the trained model. However, the XGBoost docs state that get_dump is primarily meant for visualization or interpretation (XGBoost won't even be able to load the model back from this dump); for dumping models for later inference, save_model should be used instead. I've checked the way the model's split thresholds (the split conditions) are serialized by each method:
As you can see, the number from the get_dump method has more significant decimal digits than the one from save_model, and the same holds for the rest of the model. This kind of difference may lead to an essentially different model being loaded by the LTR plugin (even though it parses the numbers as 32-bit floats), and this may well be the reason why we were seeing different scores. Those extra digits at the end simply shouldn't be there; the original model doesn't have them (which is logical, because XGBoost operates with float32). If we want the scores to be reproducible, the model we load into ES must not contain the extra digits that appear in get_dump's result. The problem now is that the output of the "proper" save_model method can't be read by the LTR plugin.
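To illustrate the precision point (a sketch with a made-up threshold, not a number from the actual model): a decimal with 9+ significant digits is generally not exactly representable as a float32, so parsing it as one silently changes the value.

```python
import numpy as np

text = "0.151874483"     # hypothetical 9-digit threshold, get_dump-style
f32 = np.float32(text)   # thresholds parsed as 32-bit floats, as in the plugin
f64 = float(text)        # the value the decimal text actually denotes
print(f32)               # nearest float32, not the printed decimal
print(float(f32) == f64) # expect False: the float32 differs from the dumped text
```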
We ran tests with our own model and the implementation I've posted above; apparently it fixes the score mismatches that we've been having. cc @wrigleyDan
Thank you for your work and insights @pltb. To be honest, I'm not aware of the differences in format between the get_dump and save_model outputs. Would it be possible to handle the save_model format with the existing parser?
Thank you for your response @renekrie!
Unfortunately it cannot, because the structure of the JSON is entirely different (it's flat; you can see the JSON schema here). I wish we were able to somehow reuse/adapt the original parser, but it seems that introducing a different parser (and exposing it via a separate content type in the API) is the easier way to address the issue.
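Roughly, the structural difference looks like this (a simplified sketch of the two shapes, not the exact schemas):

```python
# get_dump(): one nested JSON object per tree, nodes contain their children
get_dump_style = {
    "nodeid": 0, "split": "f0", "split_condition": 0.5, "children": [
        {"nodeid": 1, "leaf": 0.1},
        {"nodeid": 2, "leaf": -0.1},
    ],
}

# save_model(): each tree is a set of flat, parallel arrays indexed by node id
save_model_style = {
    "split_indices":    [0, 0, 0],         # feature index per node
    "split_conditions": [0.5, 0.1, -0.1],  # thresholds (leaf values for leaves)
    "left_children":    [1, -1, -1],       # -1 marks a leaf
    "right_children":   [2, -1, -1],
}
```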
@renekrie I've cleaned the code up a bit; now it should be clearer why the parser can't simply be reused.
Thank you @pltb! I will have a look and then we'll need to figure out how we deal with it.
@renekrie thank you! |
@renekrie let me know if you need my help with that 👍
We've noticed some unexpected XGBoost model behavior when comparing the scoring in the ES+LTR runtime to what the original inference (Python + pandas + XGBoost) gave us.
The likely cause of the discrepancies was this: since every feature the model receives is a function score, the model gets float32-typed features in the ES LTR runtime (i.e., at most about 7 significant decimal digits), while our model's splitting thresholds are numbers with 9 or more significant digits.
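A quick illustration of that precision limit (a sketch; the sample value is made up):

```python
import numpy as np

# float32 carries roughly 6-7 significant decimal digits
print(np.finfo(np.float32).precision)  # 6

feature = np.float32(0.123456789)      # a hypothetical function-score feature
print(repr(feature))                   # digits beyond ~7 are not preserved
```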
We think it's important to mention this explicitly in the plugin's documentation, because it's not obvious and may lead to unpredictable results.
For more context please see the following:
script_score float precision limit elastic/elasticsearch#49402