Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix irreproducible inference due to the imprecise floating point values in xgboost's get_dump #500

Merged
merged 10 commits into from
Nov 4, 2024

Conversation

pltb
Copy link
Contributor

@pltb pltb commented Jul 23, 2024

Tries to address the issues of using the output of the get_dump method in xgboost as the import format by parsing the output of save_model instead

More context here #497

TODO:

  • parse using ObjectParser
  • add more tests
  • review from the plugin's authors

@pltb pltb changed the title [DRAFT] fix xgboost import [DRAFT] Allow loading xgboost models from the json output of the save_model method Jul 23, 2024
@pltb pltb force-pushed the xgboost-fix-import branch from 00ccff3 to 95376f5 Compare July 30, 2024 16:07
@pltb pltb changed the title [DRAFT] Allow loading xgboost models from the json output of the save_model method Fix irreproducible inference due to the imprecise floating point values in xgboost's get_dump Jul 30, 2024
@pltb pltb marked this pull request as ready for review July 30, 2024 18:46
@wrigleyDan
Copy link
Contributor

wrigleyDan commented Oct 1, 2024

Summary of our chat at Haystack:

  • Rename the parser to model/xgboost+json+raw
  • Add the switch-case for the additional model parser
  • Preserve existing functionality
  • Add tests
  • Documentation (Daniel)
  • Add notebook to hello-ltr Github repo (Daniel)

@pltb
Copy link
Contributor Author

pltb commented Oct 1, 2024

@wrigleyDan a small correction: shall we rename it to model/xgboost+json+raw instead of v2?

@wrigleyDan
Copy link
Contributor

@wrigleyDan a small correction: shall we rename it to model/xgboost+json+raw instead of v2?

Thanks, copied the old name and forgot to change it. Well spotted, corrected it in the comment above.

@pltb
Copy link
Contributor Author

pltb commented Oct 5, 2024

Hi @wrigleyDan, I've pushed a few changes that should cover the code-related action points that we've outlined:

  • added 8 tests for some basic parsing cases (I'd love to see more extensive testing here, but I'm not sure if this should be a blocker for merging this version of the parser)
  • added some extra validation and refactored the code a little bit
  • renamed the parser like we've agreed
  • registered the parser in the plugin

Please let me know if I can provide any assistance with the docs and/or the notebook, I'd be glad to help :D

@wrigleyDan
Copy link
Contributor

wrigleyDan commented Oct 8, 2024

Thanks for making the changes so quickly @pltb!
There are a couple of tasks failing when executing the tests around checkStyle:

> Task :checkstyleTest
[ant:checkstyle] [ERROR] /path/to/elasticsearch-learning-to-rank/src/test/java/com/o19s/es/ltr/ranker/parser/XGBoostRawJsonParserTests.java:56: Line is longer than 140 characters (found 152). [LineLength]
[ant:checkstyle] [ERROR] /path/to/elasticsearch-learning-to-rank/src/test/java/com/o19s/es/ltr/ranker/parser/XGBoostRawJsonParserTests.java:115: Line is longer than 140 characters (found 152). [LineLength]
[ant:checkstyle] [ERROR] /path/to/elasticsearch-learning-to-rank/src/test/java/com/o19s/es/ltr/ranker/parser/XGBoostRawJsonParserTests.java:173: Line is longer than 140 characters (found 152). [LineLength]
[ant:checkstyle] [ERROR] /path/to/elasticsearch-learning-to-rank/src/test/java/com/o19s/es/ltr/ranker/parser/XGBoostRawJsonParserTests.java:227: Line is longer than 140 characters (found 152). [LineLength]
[ant:checkstyle] [ERROR] /path/to/elasticsearch-learning-to-rank/src/test/java/com/o19s/es/ltr/ranker/parser/XGBoostRawJsonParserTests.java:282: Line is longer than 140 characters (found 152). [LineLength]
[ant:checkstyle] [ERROR] /path/to/elasticsearch-learning-to-rank/src/test/java/com/o19s/es/ltr/ranker/parser/XGBoostRawJsonParserTests.java:336: Line is longer than 140 characters (found 152). [LineLength]
[ant:checkstyle] [ERROR] /path/to/elasticsearch-learning-to-rank/src/test/java/com/o19s/es/ltr/ranker/parser/XGBoostRawJsonParserTests.java:397: Line is longer than 140 characters (found 152). [LineLength]
[ant:checkstyle] [ERROR] /path/to/elasticsearch-learning-to-rank/src/test/java/com/o19s/es/ltr/ranker/parser/XGBoostRawJsonParserTests.java:458: Line is longer than 140 characters (found 152). [LineLength]
[ant:checkstyle] [ERROR] /path/to/elasticsearch-learning-to-rank/src/test/java/com/o19s/es/ltr/ranker/parser/XGBoostRawJsonParserTests.java:511: Line is longer than 140 characters (found 152). [LineLength]

> Task :checkstyleTest FAILED

> Task :checkstyleMain
[ant:checkstyle] [ERROR] /path/to/elasticsearch-learning-to-rank/src/main/java/com/o19s/es/ltr/ranker/parser/XGBoostRawJsonParser.java:8:34: Using the '.*' form of import should be avoided - org.elasticsearch.xcontent.*. [AvoidStarImport]
[ant:checkstyle] [ERROR] /path/to/elasticsearch-learning-to-rank/src/main/java/com/o19s/es/ltr/ranker/parser/XGBoostRawJsonParser.java:12:17: Using the '.*' form of import should be avoided - java.util.*. [AvoidStarImport]
[ant:checkstyle] [ERROR] /path/to/elasticsearch-learning-to-rank/src/main/java/com/o19s/es/ltr/ranker/parser/XGBoostRawJsonParser.java:42: Line is longer than 140 characters (found 156). [LineLength]
[ant:checkstyle] [ERROR] /path/to/elasticsearch-learning-to-rank/src/main/java/com/o19s/es/ltr/ranker/parser/XGBoostRawJsonParser.java:66: Line is longer than 140 characters (found 149). [LineLength]
[ant:checkstyle] [ERROR] /path/to/elasticsearch-learning-to-rank/src/main/java/com/o19s/es/ltr/ranker/parser/XGBoostRawJsonParser.java:120: Line is longer than 140 characters (found 159). [LineLength]
[ant:checkstyle] [ERROR] /path/to/elasticsearch-learning-to-rank/src/main/java/com/o19s/es/ltr/ranker/parser/XGBoostRawJsonParser.java:121: Line is longer than 140 characters (found 178). [LineLength]
[ant:checkstyle] [ERROR] /path/to/elasticsearch-learning-to-rank/src/main/java/com/o19s/es/ltr/ranker/parser/XGBoostRawJsonParser.java:170: Line is longer than 140 characters (found 155). [LineLength]
[ant:checkstyle] [ERROR] /path/to/elasticsearch-learning-to-rank/src/main/java/com/o19s/es/ltr/ranker/parser/XGBoostRawJsonParser.java:177:9: Redundant 'public' modifier. [RedundantModifier]
[ant:checkstyle] [ERROR] /path/to/elasticsearch-learning-to-rank/src/main/java/com/o19s/es/ltr/ranker/parser/XGBoostRawJsonParser.java:197: Line is longer than 140 characters (found 149). [LineLength]
[ant:checkstyle] [ERROR] /path/to/elasticsearch-learning-to-rank/src/main/java/com/o19s/es/ltr/ranker/parser/XGBoostRawJsonParser.java:217:9: Redundant 'public' modifier. [RedundantModifier]
[ant:checkstyle] [ERROR] /path/to/elasticsearch-learning-to-rank/src/main/java/com/o19s/es/ltr/ranker/parser/XGBoostRawJsonParser.java:248:9: Redundant 'public' modifier. [RedundantModifier]

> Task :checkstyleMain FAILED

Could you have a look and resolve these so we don't see any checkStyle violations when running the tests?

Thanks!

@pltb
Copy link
Contributor Author

pltb commented Oct 9, 2024

@wrigleyDan ah I totally missed it 😅
Fixed, thanks!

@wrigleyDan wrigleyDan merged commit 4e39eb8 into o19s:main Nov 4, 2024
1 check passed
@wrigleyDan
Copy link
Contributor

Thanks for the contribution and thanks for your patience @pltb
Merging this into main.
I will create two follow-up tasks:

  • in this repo one to add documentation for this raw XGBoost parser
  • in the hello-ltr repo to think about adding an example notebook

If you want to take a stab at them feel free to do so and/or let me know.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants