Fix XGBRegressor binary:logistic ONNX conversion producing wrong predictions#771
Fix XGBRegressor binary:logistic ONNX conversion producing wrong predictions#771JOSH1024 wants to merge 13 commits into
Conversation
|
hi contributers |
|
@take-cheeze could u please review this pr ? |
|
@JOSH1024 Sorry. Seems I don't have write access to this repository |
|
@take-cheeze thanks for letting me know |
|
@andife it would be wonderful if u could review this pr? |
There was a problem hiding this comment.
Pull request overview
Fixes incorrect ONNX predictions for xgboost.XGBRegressor models using objective="binary:logistic" by aligning the ONNX graph with XGBoost’s logit-space accumulation + sigmoid behavior.
Changes:
- Convert
base_scorefrom probability space to logit space forbinary:logisticwhen populatingTreeEnsembleRegressor.base_values. - Emit an intermediate
TreeEnsembleRegressoroutput and apply aSigmoidnode to produce probability outputs matching XGBoost. - Add a regression test reproducing issue #726 with
subsample < 1.0.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| tests/xgboost/test_xgboost_issues.py | Adds regression test coverage for binary:logistic subsample scenario (but currently contains an invalid import that breaks tests). |
| onnxmltools/convert/xgboost/operator_converters/XGBoost.py | Adjusts XGBRegressor conversion for binary:logistic to use logit base values and an explicit Sigmoid post-processing node. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
|
@andife i have done the things which copilot said |
|
could you fix dco |
|
yes done @andife |
|
@justinchuby i have updated is it ok now ? |
|
@andife Could you please review and merge this PR if you think the changes look correct? Thanks! |
|
@andife i am not understand why is ubuntu and windows checks are fail could u please help me ? |
Signed-off-by: joshua <joshua.abraham@multicorewareinc.com>
Signed-off-by: joshua <joshua.abraham@multicorewareinc.com>
Signed-off-by: joshua <joshua.abraham@multicorewareinc.com>
…ansform regression Signed-off-by: joshua <joshua.abraham@multicorewareinc.com>
|
@andife could u check now ? |
…te_existing kwarg Signed-off-by: joshua <joshua.abraham@multicorewareinc.com>
Signed-off-by: joshua <joshua.abraham@multicorewareinc.com>
|
@andife could u run the checks? |
…core Signed-off-by: joshua <joshua.abraham@multicorewareinc.com>
Signed-off-by: joshua <joshua.abraham@multicorewareinc.com>
|
@andife could u re run the checks? |
|
@andife is this ok becasue those ubuntu and windows check fails are there in all the merged pr also ?? so if everything else looks good could u please approve it |
|
@xadupre is this ok because those ubuntu and windows check fails are there in all the merged pr also ?? so if everything else looks good could u please approve it |
The old guard skipped remapping any true/false child ID equal to 0, intending to skip dummy placeholders on LEAF nodes. But it also silently skipped valid branch pointers whose child happened to be node 0 (e.g. root's left child after set_new_numbers). This caused broken tree traversal in ONNX, producing wrong predictions. Fix: skip only when the current node is itself a LEAF. Signed-off-by: joshua <joshua.abraham@multicorewareinc.com>
|
The 2 failing XGBoost tests (test_xgb0_empty_tree_classifier, test_xgb_classifier_13) on windows-latest py==3.10 xgboost<2 are pre-existing failures unrelated to this change. They are skipped on xgboost>=2 and fail due to an existing compatibility issue between xgboost<2 and onnxruntime==1.16.3 on Windows. |
|
@andife can u check now |
|
@andife ?? |
|
@xadupre Can you review this PR? |
|
@xadupre could u please look into this pr ?? |
|
@xadupre could u pls look into this ?? |
|
@xadupre can u look into this |
|
@onnx/sig-converters-approvers |
|
I'll ping Xavier tomorrow |
|
@justinchuby could u help me here also |
|
@justinchuby ?? |
bb1cba6 to
1b0078c
Compare
…g, base_score logit transform, and TreeEnsembleClassifier label workarounds Signed-off-by: joshua <joshua.abraham@multicorewareinc.com>
Summary
Fixes #726.
XGBRegressormodels usingobjective="binary:logistic"can produce incorrect ONNX predictions when tree outputs are non-zero (for example whensubsample < 1.0).The converter currently uses
base_scoredirectly asbase_valuesin theTreeEnsembleRegressor, which mixes probability-space values with tree outputs that are represented in logit space.This change:
base_scorefrom probability space to logit space forbinary:logistic.TreeEnsembleRegressoroutput.Sigmoidnode after the tree ensemble output.Reproduction
Using the reproducer from #726:
Before this change:
After this change:
Testing
Added regression test:
Executed:
Result:
Also ran:
Observed no additional failures relative to
main.Validation
to:
Related: #758