fix(eval): trim legacy trajectory span history#1652

Closed
Chibionos wants to merge 1 commit into UiPath:main from Chibionos:replacement/pr-1630-clean

Conversation

@Chibionos
Contributor

Replacement for #1630 because the original branch cannot be updated cleanly from this account: direct push is blocked by repository rules, gh pr update-branch --rebase returns a GitHub GraphQL error, merge commits are disabled, and the branch now conflicts with main's uipath version bump.

Changes:
- use trace_to_str for legacy trajectory ReadableSpan history to avoid carrying long LLM/system prompt payloads into the evaluator prompt
- add a regression test for compact tool history
- pass explicit LegacyTrajectoryEvaluator type metadata in the test so mypy passes
- bump packages/uipath to 2.10.66 because 2.10.65 is now on main and already published

Local validation:
- GITHUB_EVENT_NAME=pull_request BASE_SHA=origin/main HEAD_SHA=HEAD python .github/scripts/check_version_uniqueness.py
- cd packages/uipath && uv run mypy --config-file pyproject.toml .
- cd packages/uipath && uv run ruff check .
- cd packages/uipath && uv run ruff format --check .
- cd packages/uipath && uv run pytest tests/evaluators/test_legacy_trajectory_evaluator.py
- cd packages/uipath && uv run pytest

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 8ebb876ccf

agent_run_history
)
agent_run_history = str(trajectory_trace.spans)
agent_run_history = trace_to_str(agent_run_history)

P2: Preserve non-tool history for direct-answer runs

For runs whose trace contains only LLM/CHAIN spans (for example an agent answers directly without invoking any tool), this now replaces {{AgentRunHistory}} with trace_to_str(...), but that helper only emits spans that have a tool.name attribute and otherwise returns an empty string. The legacy trajectory evaluator prompt does not include agent_output, so those direct-answer executions lose all observable run history compared with the previous TrajectoryEvaluationTrace conversion, making the evaluator unable to judge whether the no-tool path matched the expected behavior.
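
The concern above can be illustrated with a minimal sketch. It is not the repository's code: `FakeSpan`, `tool_only_history`, and `history_with_fallback` are hypothetical names, and `tool_only_history` only imitates the behavior the review attributes to `trace_to_str` (emit spans that carry a `tool.name` attribute, otherwise return an empty string). The fallback shown is one possible way to keep some history for direct-answer runs, not a confirmed fix.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class FakeSpan:
    """Hypothetical stand-in for a ReadableSpan: a name plus attributes."""
    name: str
    attributes: dict[str, Any] = field(default_factory=dict)


def tool_only_history(spans: list[FakeSpan]) -> str:
    """Imitates the described behavior: only spans with `tool.name` are emitted."""
    lines = [
        f"tool: {span.attributes['tool.name']}"
        for span in spans
        if "tool.name" in span.attributes
    ]
    return "\n".join(lines)


def history_with_fallback(spans: list[FakeSpan]) -> str:
    """Prefer the compact tool history; fall back to span names if it is empty."""
    compact = tool_only_history(spans)
    if compact:
        return compact
    # Assumption: span names alone are short enough to avoid prompt bloat.
    return "\n".join(f"span: {span.name}" for span in spans)


# A direct-answer run: only an LLM span, no tool calls.
direct_answer = [FakeSpan("llm_call", {"llm.model": "example-model"})]
# A run that invoked one tool.
tool_run = [FakeSpan("search", {"tool.name": "web_search"})]
```

With these inputs, `tool_only_history(direct_answer)` yields an empty string, which is the lost-history case the review describes, while `history_with_fallback(direct_answer)` still reports the LLM span by name. Whether span names are enough for the evaluator to judge a no-tool path is a design question this sketch does not settle.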

@Chibionos
Contributor Author

Superseded by same-repo replacement #1653 so CI has access to the repo token/secrets.

@Chibionos Chibionos closed this May 19, 2026