Skip to content

Fix/xgboost multitarget tree info#761

Open
christian-git-md wants to merge 3 commits into
onnx:mainfrom
christian-git-md:fix/xgboost-multitarget-tree-info
Open

Fix/xgboost multitarget tree info#761
christian-git-md wants to merge 3 commits into
onnx:mainfrom
christian-git-md:fix/xgboost-multitarget-tree-info

Conversation

@christian-git-md

Copy link
Copy Markdown

fixes issue #676

Also fixes conversion with reg:quantileerror with multiple quantile_alpha values, which was the problem I ran into.

@christian-git-md christian-git-md force-pushed the fix/xgboost-multitarget-tree-info branch from 9f3937b to bac0f11 Compare March 15, 2026 13:24
XGBoost assigns each tree to an output channel via the tree_info array
(e.g. [0, 1, 0, 1, ...] for 2 targets). The converter was ignoring
this and hardcoding weights_id_bias = 0 on every leaf, routing all
trees into output slot 0.

Changes:
- _fill_node_attributes: accept target_id (default 0), use as weights_id_bias on leaves
- fill_tree_attributes: accept optional tree_info list; derive target_id per tree
- XGBRegressorConverter.convert: read tree_info from booster raw JSON when n_targets > 1
- shape_calculators/Regressor.py: new calculator reading n_targets from get_xgb_params

Also fixes reg:quantileerror with multiple quantile_alpha values (same code path).

Signed-off-by: Christian Holland <christianholland91@googlemail.com>
@christian-git-md christian-git-md force-pushed the fix/xgboost-multitarget-tree-info branch from bac0f11 to 9094be7 Compare March 15, 2026 13:29
@sonarqubecloud

Copy link
Copy Markdown

@christian-git-md

Copy link
Copy Markdown
Author

@xadupre any chance you can have a look at this?

@andife andife requested a review from Copilot April 28, 2026 18:06
@sonarqubecloud

Copy link
Copy Markdown

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Fixes ONNXMLTools’ XGBoost regressor conversion for multi-target outputs (Issue #676) by correctly propagating the number of targets into shape inference and by mapping per-tree leaf contributions to the right target dimension during conversion.

Changes:

  • Add XGBoost regressor-specific shape calculator using XGBoost config-derived n_targets.
  • Update TreeEnsembleRegressor attribute generation to assign target_id per tree using XGBoost tree_info, and replicate base_values for multi-target.
  • Add regression tests validating multi-target output values and reg:quantileerror with multiple quantile_alpha values.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.

File Description
tests/xgboost/test_xgboost_issues.py Adds value-level regression tests for multi-target outputs and quantile regression multi-alpha outputs.
onnxmltools/convert/xgboost/shape_calculators/Regressor.py Replaces linear regressor shape calculator with XGBoost-aware n_targets shape inference.
onnxmltools/convert/xgboost/operator_converters/XGBoost.py Uses tree_info to map each tree’s leaves to the correct target dimension and adjusts base_values for multi-target outputs.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +5 to +6
import numpy as np
from numpy.testing import assert_allclose

Copilot AI Apr 28, 2026

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With the new top-level import numpy as np, the inner import numpy as np inside test_issue_676 is now redundant. Consider removing the inner import to avoid duplication and keep imports consistent within the test module.

Copilot uses AI. Check for mistakes.
Comment on lines +8 to +11
def calculate_xgboost_regressor_output_shapes(operator):
N = operator.inputs[0].type.shape[0]
n_targets = get_xgb_params(operator.raw_operator).get("n_targets", 1)
operator.outputs[0].type = FloatTensorType([N, n_targets])

Copilot AI Apr 28, 2026

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The new regressor shape calculator no longer validates the expected input/output counts (the previous calculate_linear_regressor_output_shapes did). This can allow silent shape inference issues if the operator is wired incorrectly. Consider calling check_input_and_output_numbers(operator, input_count_range=1, output_count_range=1) (and, if consistent with other calculators, check_input_and_output_types) before setting the output type.

Copilot uses AI. Check for mistakes.
Comment on lines +427 to +430
raw = json.loads(xgb_node.get_booster().save_raw(raw_format="json"))
tree_info = raw["learner"]["gradient_booster"]["model"]["tree_info"]
if best_ntree_limit and best_ntree_limit < len(tree_info):
tree_info = tree_info[:best_ntree_limit]

Copilot AI Apr 28, 2026

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tree_info is obtained by calling save_raw(raw_format="json") and then json.loads(...), which parses the entire model a second time (in addition to get_dump(..., dump_format="json")). For large multi-target models this can significantly increase conversion time/memory. Consider extracting only the tree_info array (e.g., via a lightweight parse/regex similar to onnxmltools/convert/xgboost/_parse.py:_get_attributes) or using a deterministic mapping (like round-robin treeid % n_targets) as a fallback when full JSON parsing isn’t necessary.

Suggested change
raw = json.loads(xgb_node.get_booster().save_raw(raw_format="json"))
tree_info = raw["learner"]["gradient_booster"]["model"]["tree_info"]
if best_ntree_limit and best_ntree_limit < len(tree_info):
tree_info = tree_info[:best_ntree_limit]
raw = xgb_node.get_booster().save_raw(raw_format="json")
if isinstance(raw, bytes):
raw = raw.decode("utf-8")
tree_info_key = '"tree_info"'
tree_info = None
key_pos = raw.find(tree_info_key)
if key_pos >= 0:
array_start = raw.find("[", key_pos)
if array_start >= 0:
depth = 0
array_end = -1
for i in range(array_start, len(raw)):
if raw[i] == "[":
depth += 1
elif raw[i] == "]":
depth -= 1
if depth == 0:
array_end = i + 1
break
if array_end > 0:
try:
tree_info = json.loads(raw[array_start:array_end])
except (TypeError, ValueError):
tree_info = None
if tree_info is None:
tree_info = [tree_id % n_targets for tree_id in range(len(js_trees))]
if best_ntree_limit and best_ntree_limit < len(tree_info):
tree_info = tree_info[:best_ntree_limit]
elif len(tree_info) != len(js_trees):
tree_info = [tree_id % n_targets for tree_id in range(len(js_trees))]

Copilot uses AI. Check for mistakes.
@andife

andife commented Jun 10, 2026

Copy link
Copy Markdown
Member

@xadupre ready to merge?
@christian-git-md Are you planing to adress the copilot findings?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants