Fix/xgboost multitarget tree info by christian-git-md · Pull Request #761 · onnx/onnxmltools

christian-git-md · 2026-03-15T13:21:16Z

fixes issue #676

Also fixes conversion with reg:quantileerror with multiple quantile_alpha values, which was the problem I ran into.

XGBoost assigns each tree to an output channel via the tree_info array (e.g. [0, 1, 0, 1, ...] for 2 targets). The converter was ignoring this and hardcoding weights_id_bias = 0 on every leaf, routing all trees into output slot 0. Changes: - _fill_node_attributes: accept target_id (default 0), use as weights_id_bias on leaves - fill_tree_attributes: accept optional tree_info list; derive target_id per tree - XGBRegressorConverter.convert: read tree_info from booster raw JSON when n_targets > 1 - shape_calculators/Regressor.py: new calculator reading n_targets from get_xgb_params Also fixes reg:quantileerror with multiple quantile_alpha values (same code path). Signed-off-by: Christian Holland <christianholland91@googlemail.com>

sonarqubecloud · 2026-03-15T13:30:28Z

Quality Gate passed

Issues
1 New issue
0 Accepted issues

Measures
0 Security Hotspots
0.0% Coverage on New Code
0.0% Duplication on New Code

See analysis details on SonarQube Cloud

christian-git-md · 2026-03-20T22:13:46Z

@xadupre any chance you can have a look at this?

sonarqubecloud · 2026-04-28T18:07:26Z

Quality Gate passed

Issues
1 New issue
0 Accepted issues

Measures
0 Security Hotspots
0.0% Coverage on New Code
0.0% Duplication on New Code

See analysis details on SonarQube Cloud

Copilot

Pull request overview

Fixes ONNXMLTools’ XGBoost regressor conversion for multi-target outputs (Issue #676) by correctly propagating the number of targets into shape inference and by mapping per-tree leaf contributions to the right target dimension during conversion.

Changes:

Add XGBoost regressor-specific shape calculator using XGBoost config-derived n_targets.
Update TreeEnsembleRegressor attribute generation to assign target_id per tree using XGBoost tree_info, and replicate base_values for multi-target.
Add regression tests validating multi-target output values and reg:quantileerror with multiple quantile_alpha values.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.

File	Description
tests/xgboost/test_xgboost_issues.py	Adds value-level regression tests for multi-target outputs and quantile regression multi-alpha outputs.
onnxmltools/convert/xgboost/shape_calculators/Regressor.py	Replaces linear regressor shape calculator with XGBoost-aware `n_targets` shape inference.
onnxmltools/convert/xgboost/operator_converters/XGBoost.py	Uses `tree_info` to map each tree’s leaves to the correct target dimension and adjusts `base_values` for multi-target outputs.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot · 2026-04-28T18:12:42Z

+import numpy as np
+from numpy.testing import assert_allclose


With the new top-level import numpy as np, the inner import numpy as np inside test_issue_676 is now redundant. Consider removing the inner import to avoid duplication and keep imports consistent within the test module.

Copilot · 2026-04-28T18:12:42Z

+def calculate_xgboost_regressor_output_shapes(operator):
+    N = operator.inputs[0].type.shape[0]
+    n_targets = get_xgb_params(operator.raw_operator).get("n_targets", 1)
+    operator.outputs[0].type = FloatTensorType([N, n_targets])


The new regressor shape calculator no longer validates the expected input/output counts (the previous calculate_linear_regressor_output_shapes did). This can allow silent shape inference issues if the operator is wired incorrectly. Consider calling check_input_and_output_numbers(operator, input_count_range=1, output_count_range=1) (and, if consistent with other calculators, check_input_and_output_types) before setting the output type.

Copilot · 2026-04-28T18:12:42Z

+            raw = json.loads(xgb_node.get_booster().save_raw(raw_format="json"))
+            tree_info = raw["learner"]["gradient_booster"]["model"]["tree_info"]
+            if best_ntree_limit and best_ntree_limit < len(tree_info):
+                tree_info = tree_info[:best_ntree_limit]


tree_info is obtained by calling save_raw(raw_format="json") and then json.loads(...), which parses the entire model a second time (in addition to get_dump(..., dump_format="json")). For large multi-target models this can significantly increase conversion time/memory. Consider extracting only the tree_info array (e.g., via a lightweight parse/regex similar to onnxmltools/convert/xgboost/_parse.py:_get_attributes) or using a deterministic mapping (like round-robin treeid % n_targets) as a fallback when full JSON parsing isn’t necessary.

Suggested change

raw = json.loads(xgb_node.get_booster().save_raw(raw_format="json"))

tree_info = raw["learner"]["gradient_booster"]["model"]["tree_info"]

if best_ntree_limit and best_ntree_limit < len(tree_info):

tree_info = tree_info[:best_ntree_limit]

raw = xgb_node.get_booster().save_raw(raw_format="json")

if isinstance(raw, bytes):

raw = raw.decode("utf-8")

tree_info_key = '"tree_info"'

tree_info = None

key_pos = raw.find(tree_info_key)

if key_pos >= 0:

array_start = raw.find("[", key_pos)

if array_start >= 0:

depth = 0

array_end = -1

for i in range(array_start, len(raw)):

if raw[i] == "[":

depth += 1

elif raw[i] == "]":

depth -= 1

if depth == 0:

array_end = i + 1

break

if array_end > 0:

try:

tree_info = json.loads(raw[array_start:array_end])

except (TypeError, ValueError):

tree_info = None

if tree_info is None:

tree_info = [tree_id % n_targets for tree_id in range(len(js_trees))]

if best_ntree_limit and best_ntree_limit < len(tree_info):

tree_info = tree_info[:best_ntree_limit]

elif len(tree_info) != len(js_trees):

tree_info = [tree_id % n_targets for tree_id in range(len(js_trees))]

andife · 2026-06-10T16:11:24Z

@xadupre ready to merge?
@christian-git-md Are you planing to adress the copilot findings?

christian-git-md force-pushed the fix/xgboost-multitarget-tree-info branch from 9f3937b to bac0f11 Compare March 15, 2026 13:24

christian-git-md force-pushed the fix/xgboost-multitarget-tree-info branch from bac0f11 to 9094be7 Compare March 15, 2026 13:29

andife requested a review from Copilot April 28, 2026 18:06

Merge branch 'main' into fix/xgboost-multitarget-tree-info

82f900a

Copilot started reviewing on behalf of andife April 28, 2026 18:06 View session

Copilot AI reviewed Apr 28, 2026

View reviewed changes

xadupre approved these changes May 19, 2026

View reviewed changes

Merge branch 'main' into fix/xgboost-multitarget-tree-info

b5c77cc

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix/xgboost multitarget tree info#761

Fix/xgboost multitarget tree info#761
christian-git-md wants to merge 3 commits into
onnx:mainfrom
christian-git-md:fix/xgboost-multitarget-tree-info

christian-git-md commented Mar 15, 2026

Uh oh!

sonarqubecloud Bot commented Mar 15, 2026

Uh oh!

christian-git-md commented Mar 20, 2026

Uh oh!

sonarqubecloud Bot commented Apr 28, 2026

Uh oh!

Copilot AI left a comment

Uh oh!

Copilot AI Apr 28, 2026

Uh oh!

Copilot AI Apr 28, 2026

Uh oh!

Copilot AI Apr 28, 2026

Uh oh!

andife commented Jun 10, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

-            raw = json.loads(xgb_node.get_booster().save_raw(raw_format="json"))
-            tree_info = raw["learner"]["gradient_booster"]["model"]["tree_info"]
-            if best_ntree_limit and best_ntree_limit < len(tree_info):
-                tree_info = tree_info[:best_ntree_limit]
+            raw = xgb_node.get_booster().save_raw(raw_format="json")
+            if isinstance(raw, bytes):
+                raw = raw.decode("utf-8")
+            tree_info_key = '"tree_info"'
+            tree_info = None
+            key_pos = raw.find(tree_info_key)
+            if key_pos >= 0:
+                array_start = raw.find("[", key_pos)
+                if array_start >= 0:
+                    depth = 0
+                    array_end = -1
+                    for i in range(array_start, len(raw)):
+                        if raw[i] == "[":
+                            depth += 1
+                        elif raw[i] == "]":
+                            depth -= 1
+                            if depth == 0:
+                                array_end = i + 1
+                                break
+                    if array_end > 0:
+                        try:
+                            tree_info = json.loads(raw[array_start:array_end])
+                        except (TypeError, ValueError):
+                            tree_info = None
+            if tree_info is None:
+                tree_info = [tree_id % n_targets for tree_id in range(len(js_trees))]
+            if best_ntree_limit and best_ntree_limit < len(tree_info):
+                tree_info = tree_info[:best_ntree_limit]
+            elif len(tree_info) != len(js_trees):
+                tree_info = [tree_id % n_targets for tree_id in range(len(js_trees))]

Conversation

christian-git-md commented Mar 15, 2026

Uh oh!

sonarqubecloud Bot commented Mar 15, 2026

Quality Gate passed

Uh oh!

christian-git-md commented Mar 20, 2026

Uh oh!

sonarqubecloud Bot commented Apr 28, 2026

Quality Gate passed

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Copilot AI Apr 28, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 28, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 28, 2026

Choose a reason for hiding this comment

Uh oh!

andife commented Jun 10, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants