Add vendor assessment agent with structured output by aureliensibiril · Pull Request #982 · getprobo/probo

aureliensibiril · 2026-04-02T14:52:02Z

Summary

Add composable Toolset interface, TypedTool[In, Out] factory, and result helpers to pkg/agent for reusable agent tooling
Build multi-agent vendor vetting orchestrator with 16 specialized sub-agents (crawler, security, compliance, data processing, AI risk, incident response, etc.)
Add browser, security, and search toolsets with SSRF protection for agent use
Define structured output types for all sub-agents with JSON schema enforcement via agent.WithOutputType
Wire ResponseFormat into the Anthropic provider (OutputConfig.Format), enabling API-level structured output — previously silently dropped
Add JSON validation in agent-as-tool execution path, returning error tool results on invalid output
Update all sub-agent prompts with explicit JSON output format sections
Wire vendor assessment into the service layer with GraphQL mutation, MCP tool, and CLI command
Remove task priority and rank across schema, UI, MCP, and tests

Changes by area

Agent framework (`pkg/agent`)

Toolset API with WithToolsets(...)
TypedTool[In, Out] with ResultJSON/ResultError(f) helpers
WithOutputType(...) for structured JSON output enforcement
WithThinking(...) for extended thinking budget
Agent-as-tool now validates JSON output before returning to parent

LLM providers (`pkg/llm`)

Anthropic: wire ResponseFormat → OutputConfig.Format (JSON schema enforcement)
OpenAI: already supported (no change)
Both providers support file parts (PDF, CSV)

Vendor assessment (`pkg/agents/vetting`)

16 typed output structs with jsonschema tags in output_types.go
All sub-agents use WithOutputType for schema-enforced JSON responses
Orchestrator coordinates sub-agents with parallel tool calls and progress reporting
Prompts updated with JSON output format examples

Task priority removal

Remove rank field from Task schema and UI
Remove priority-based drag-and-drop reordering logic
Clean up unused imports and dead code

Test plan

make test MODULE=./pkg/agent/... — all tests pass
make test MODULE=./pkg/agents/vetting/... — output type schema tests pass
go vet / go build — clean on all affected packages
Verify vendor assessment end-to-end with run.sh

pkg/agent/tools/security/ssl.go

pkg/agent/agent.go

Introduce the initial vendor assessor plumbing in the service layer (pkg/probo/vendor_service.go, service.go), bootstrap (builder, probod, llm_config), root CLI wiring and the assessVendor GraphQL mutation resolver. Teach the LLM provider layer to carry file parts (PDF, CSV) so downstream vetting sub-agents can hand documents to the model instead of raw text. Both Anthropic and OpenAI providers learn the new part shape through pkg/llm/{part,message,chat}.go and their respective provider adapters. Signed-off-by: Aurélien Sibiril <81782+aureliensibiril@users.noreply.github.com>

Introduce a multi-agent system that evaluates third-party vendors against compliance, security, and privacy criteria.

Agent framework additions:
- ResultJSON, ResultError, ResultErrorf result helpers
- TypedTool[In, Out] with auto-marshaled output
- Toolset interface with CollectTools and MergeToolsets
- WithToolsets option for declarative tool assembly

Tool packages (pkg/agent/tools/):
- browser: navigate, extract, click, PDF, sitemap, robots
- security: SSL, headers, DMARC, SPF, DNSSEC, CSP, CORS, WHOIS, DNS records, HIBP breach check
- search: web search, government DB, Wayback, document diff
- internal/netcheck: SSRF protection for all tools

Orchestrator with 16 specialized sub-agents for crawling, security assessment, compliance, market presence, data processing, AI risk, and more.

Signed-off-by: Aurélien Sibiril <81782+aureliensibiril@users.noreply.github.com>

Define typed output structs for all 16 vetting sub-agents in a dedicated output_types.go file, with JSON schema tags for API-level enforcement. Replace the old CrawlResult struct with the richer CrawlerOutput type. Add tests verifying schema generation for all output types. Signed-off-by: Aurélien Sibiril <81782+aureliensibiril@users.noreply.github.com>

Add WithOutputType to all 16 sub-agents, enforcing JSON schema on every LLM response. Increase max turns and add thinking budgets to match the deeper analysis required by structured output. Update orchestrator to handle fallible sub-agent constructors and enrich tool descriptions with JSON field details. Signed-off-by: Aurélien Sibiril <81782+aureliensibiril@users.noreply.github.com>

Replace free-form output sections in all sub-agent prompts with explicit JSON schema examples matching the Go output types. Add structured JSON guidance to the orchestrator base prompt. Signed-off-by: Aurélien Sibiril <81782+aureliensibiril@users.noreply.github.com>

Wire ResponseFormat from the LLM request to Anthropic's OutputConfig.Format, enabling API-level structured output enforcement. The schema was previously generated but silently dropped by the Anthropic provider. Signed-off-by: Aurélien Sibiril <81782+aureliensibiril@users.noreply.github.com>

When a sub-agent has an output type, validate that its response is valid JSON before returning it to the parent agent. Invalid output is returned as an error tool result with a truncated preview, allowing the parent to retry or handle gracefully. Signed-off-by: Aurélien Sibiril <81782+aureliensibiril@users.noreply.github.com>
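A minimal sketch of the validation step described in this commit, assuming a stand-in `toolResult` type and a hypothetical `maxPreviewRunes` cap (neither name comes from the PR). It only checks well-formedness with `json.Valid`; schema conformance is left to the API-level enforcement.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// maxPreviewRunes is a hypothetical cap on how much invalid output is echoed
// back to the parent agent.
const maxPreviewRunes = 200

// toolResult is a stand-in for the framework's tool result type.
type toolResult struct {
	Content string
	IsError bool
}

// validateSubAgentOutput passes valid JSON through unchanged and turns
// anything else into an error result carrying a truncated preview.
func validateSubAgentOutput(output string) toolResult {
	if json.Valid([]byte(output)) {
		return toolResult{Content: output}
	}
	preview := []rune(output)
	suffix := ""
	if len(preview) > maxPreviewRunes {
		preview = preview[:maxPreviewRunes]
		suffix = " (truncated)"
	}
	return toolResult{
		Content: fmt.Sprintf("sub-agent returned invalid JSON: %q%s", string(preview), suffix),
		IsError: true,
	}
}

func main() {
	fmt.Println(validateSubAgentOutput(`{"rating":"low"}`).IsError)
	fmt.Println(validateSubAgentOutput("rating: low").Content)
}
```

Returning an error result rather than a Go error keeps the failure visible to the parent model, which can then decide to call the sub-agent again.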

Signed-off-by: Aurélien Sibiril <81782+aureliensibiril@users.noreply.github.com>

Add automatic streaming fallback in blockingCallLLM when Anthropic requires it for large max_tokens or thinking. Fix tool call index tracking in Anthropic stream adapter by using inToolUse flag instead of checking ContentBlock type on content_block_stop events. Propagate thinking signature through MessageDelta so StreamAccumulator can capture it for multi-turn conversations. Default tool_use input to empty object when JSON unmarshal fails. Signed-off-by: Aurélien Sibiril <81782+aureliensibiril@users.noreply.github.com>

When a sub-agent has an output type set and the model returns end_turn with only thinking content (no text), drop the empty assistant turn from history and retry. Anthropic rejects requests where the last message is a thinking-only assistant turn, so the empty message must be removed before continuing. Bounded by maxEmptyOutputRetries (2) to avoid burning through maxTurns when the model consistently fails to produce text. Logs include the retry counter, turn number, and output token count to aid debugging. Signed-off-by: Aurélien Sibiril <81782+aureliensibiril@users.noreply.github.com>
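The retry behaviour can be sketched with a scripted list of model replies standing in for real LLM calls. Only `maxEmptyOutputRetries` and its value of 2 come from the commit; the `message` type and `runUntilText` are illustrative, and the real loop also handles tool calls and logging.

```go
package main

import "fmt"

// maxEmptyOutputRetries matches the bound named in the commit message.
const maxEmptyOutputRetries = 2

// message is a simplified stand-in for a chat message.
type message struct {
	Role string
	Text string
}

// runUntilText replays scripted assistant replies. An empty reply models a
// thinking-only turn: it is dropped from history and retried, at most
// maxEmptyOutputRetries times.
func runUntilText(replies []string) ([]message, bool) {
	history := []message{{Role: "user", Text: "assess the vendor"}}
	emptyRetries := 0
	for _, reply := range replies {
		history = append(history, message{Role: "assistant", Text: reply})
		if reply != "" {
			return history, true
		}
		// Drop the empty assistant turn so the history never ends with it.
		history = history[:len(history)-1]
		if emptyRetries >= maxEmptyOutputRetries {
			return history, false
		}
		emptyRetries++
	}
	return history, false
}

func main() {
	_, ok := runUntilText([]string{"", "", `{"rating":"low"}`})
	fmt.Println("recovered after two empty turns:", ok)
	_, ok = runUntilText([]string{"", "", ""})
	fmt.Println("recovered after three empty turns:", ok)
}
```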

Replace string-matching of "streaming is required" in the agent loop with a typed llm.ErrStreamingRequired that the Anthropic provider returns from mapError on HTTP 400 with the matching message body. The agent loop checks via errors.As, so the behaviour is no longer coupled to the SDK's exact wording. Also reset the empty-output retry counter on tool-call turns so it tracks consecutive empty turns rather than total empty turns over the whole run. Signed-off-by: Aurélien Sibiril <81782+aureliensibiril@users.noreply.github.com>

Extract resolveAgentClient on Implm to remove the three copy-pasted blocks that resolve an agent's effective config, look up its provider, and build the LLM client. Each call site now uses one line per agent. Also use vetting.DefaultMaxTokens instead of an inline magic constant for the vendor-assessor max-tokens fallback. Signed-off-by: Aurélien Sibiril <81782+aureliensibiril@users.noreply.github.com>

Replace 16 copy-pasted constructor functions with a single generic newSubAgent[T any] builder driven by per-agent subAgentSpec values. Each spec captures the agent name, output type name, embedded prompt, max turns, optional thinking budget, and parallel-tool-calls flag. Document the orchestrator's max-turns and thinking-budget choices with named constants explaining why they are sized the way they are. Net delta: ~700 lines removed, behavior unchanged. Signed-off-by: Aurélien Sibiril <81782+aureliensibiril@users.noreply.github.com>
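A rough sketch of what a spec-driven generic builder can look like. The names `newSubAgent` and `subAgentSpec` come from the commit message; the field names, the `subAgent` type, the `Decode` method, and `securityOutput` are assumptions made for illustration and are not the PR's actual definitions.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// subAgentSpec captures the per-agent settings listed in the commit message.
// Field names are assumed.
type subAgentSpec struct {
	Name              string
	OutputTypeName    string
	Prompt            string
	MaxTurns          int
	ThinkingBudget    int // 0 means no extended thinking
	ParallelToolCalls bool
}

// subAgent is a hypothetical typed wrapper; T is the structured output type.
type subAgent[T any] struct {
	spec subAgentSpec
}

// newSubAgent validates the spec once, replacing one hand-written
// constructor per agent.
func newSubAgent[T any](spec subAgentSpec) (*subAgent[T], error) {
	if spec.Name == "" {
		return nil, fmt.Errorf("sub-agent spec: name is required")
	}
	if spec.MaxTurns <= 0 {
		return nil, fmt.Errorf("sub-agent %q: max turns must be positive", spec.Name)
	}
	return &subAgent[T]{spec: spec}, nil
}

// Decode parses the raw model output into the typed output T.
func (a *subAgent[T]) Decode(raw string) (T, error) {
	var out T
	if err := json.Unmarshal([]byte(raw), &out); err != nil {
		return out, fmt.Errorf("sub-agent %q: decode output: %w", a.spec.Name, err)
	}
	return out, nil
}

// securityOutput is an illustrative output type, not one of the PR's structs.
type securityOutput struct {
	Rating string `json:"rating"`
}

func main() {
	agent, err := newSubAgent[securityOutput](subAgentSpec{Name: "security", MaxTurns: 10})
	if err != nil {
		panic(err)
	}
	out, err := agent.Decode(`{"rating":"low"}`)
	fmt.Println(out.Rating, err)
}
```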

The Anthropic SDK refuses non-streaming requests client-side when the expected response time would exceed 10 minutes (large max_tokens or model-specific non-streaming token limits). It returns a plain fmt.Errorf, not an *anthropic.Error, so the errors.As check in mapError missed it and the streaming fallback in blockingCallLLM was never invoked. Match on the message string before the type assertion so both the SDK pre-flight error and any future server-side variant are wrapped as ErrStreamingRequired. Signed-off-by: Aurélien Sibiril <81782+aureliensibiril@users.noreply.github.com>

Signed-off-by: Aurélien Sibiril <81782+aureliensibiril@users.noreply.github.com>

The 19 *.txt prompt files were crowding the package directory alongside the Go source files. Move them into prompts/ and drop the redundant _prompt suffix from the filename. Update all //go:embed directives to reference the new paths. No code logic changes; verified build, vet, and tests pass. Signed-off-by: Aurélien Sibiril <81782+aureliensibiril@users.noreply.github.com>

The JSON example block, intro sentence, and Field reference list in each sub-agent prompt duplicated information already enforced by the structured output schema. Anthropic's structured outputs auto-inject a system prompt describing the schema, so the manual JSON example was paying for the same information twice.

Per Anthropic's prompt engineering guide:

> "The Structured Outputs feature is designed specifically to
> constrain Claude's responses to follow a given schema. Try
> simply asking the model to conform to your output structure
> first, as newer models can reliably match complex schemas
> when told to."

Replace the deleted blocks with a one-line "## Output" pointer explaining that the schema is enforced by the API and the agent should focus on the substance of the assessment. Preserve any "## Important" sections that came after the deleted block.

Net change: 646 lines removed, behaviour unchanged. Per-field guidance lives in the jsonschema struct tags in output_types.go, which Anthropic includes in the auto-injected schema prompt.

Signed-off-by: Aurélien Sibiril <81782+aureliensibiril@users.noreply.github.com>

The vetting extraction prompt used to carry per-field instructions in prose (enum values, URL purposes, risk rating scales). Those instructions now live directly on the struct via jsonschema tags, so the schema the API enforces is the single source of truth and the extraction prompt can shrink to a short stub. Signed-off-by: Aurélien Sibiril <81782+aureliensibiril@users.noreply.github.com>

The previous extraction prompt was 103 lines of per-field instructions duplicating what the VendorInfo struct now exposes through jsonschema tags. The prompt was the source of drift — any field added to VendorInfo without a corresponding prompt update was silently extracted as empty. Replacing it with a short stub that tells the model to trust the API schema closes that gap. Signed-off-by: Aurélien Sibiril <81782+aureliensibiril@users.noreply.github.com>

Wrap the orchestrator base prompt in canonical role/task/workflow tags and strip the duplicated per-tool description list. The Anthropic API already delivers tool descriptions to the model via the tools parameter; keeping them in the system prompt wasted tokens and drifted against the real tool definitions. Soften the "MUST shape your investigation" language in the default procedure and wrap the classification and investigation triggers in XML tags. The free-text report template stays as markdown because the orchestrator's final output is markdown. Signed-off-by: Aurélien Sibiril <81782+aureliensibiril@users.noreply.github.com>

Each sub-agent prompt is restructured into canonical role / task / assessment / edge_cases / output tags. Prescriptive numbered "Strategy" lists are trimmed to directional sentences so the model is not over-constrained on how to investigate. Six prompts (compliance, regulatory_compliance, analyzer, ai_risk, incident_response, code_security) gain worked few-shot examples for the rating decisions that were most ambiguous in the baseline output. Four prompts (subprocessor, security, financial_stability, regulatory_compliance) gain a self_check block listing the machine-checkable invariants the model should verify before emitting output. Signed-off-by: Aurélien Sibiril <81782+aureliensibiril@users.noreply.github.com>

Enforcing a JSON schema on every turn causes models with extended thinking to stuff planning prose into the first text field of the schema as a scratchpad and burn the entire max_tokens budget on thinking-inside-JSON before ever producing a valid object. Once the budget is exhausted the sub-agent returns malformed or empty JSON and the orchestrator has to work around the hole. When the agent has both tools and a structured output request, the loop now runs in exploration mode with no schema enforcement and no tool_choice override. Once the model signals finish_reason stop, the loop promotes the next iteration into a synthesis turn: the exploration message is kept in history (dropped if empty), a user nudge is appended, tool_choice is forced to none, and the schema is enforced. The model converts what it has gathered into JSON in one shot without any scratchpad fight. Agents without tools or without a structured output request are untouched. The empty-output retry path is preserved as a safety net for the synthesis turn itself. Signed-off-by: Aurélien Sibiril <81782+aureliensibiril@users.noreply.github.com>
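The two-phase behaviour reads as a small state machine, sketched below. All names (`loopState`, `turnConfig`, `configure`, `onStop`) and the nudge wording are hypothetical; the sketch covers only the per-turn decisions described in this commit, not history management or the retry path.

```go
package main

import "fmt"

// synthesisNudge uses assumed wording; the PR's actual text is not shown.
const synthesisNudge = "Now produce the final structured output."

// turnConfig is what the loop would send with the next request.
type turnConfig struct {
	EnforceSchema  bool
	ToolChoiceNone bool
	Nudge          string
}

// loopState tracks whether the agent has moved from exploration to synthesis.
type loopState struct {
	hasTools        bool
	wantsStructured bool
	synthesis       bool
}

// twoPhase reports whether the exploration/synthesis split applies at all.
func (s *loopState) twoPhase() bool { return s.hasTools && s.wantsStructured }

// configure returns the settings for the upcoming turn.
func (s *loopState) configure() turnConfig {
	if !s.twoPhase() {
		// Untouched path: the schema is enforced whenever it was requested.
		return turnConfig{EnforceSchema: s.wantsStructured}
	}
	if s.synthesis {
		return turnConfig{EnforceSchema: true, ToolChoiceNone: true, Nudge: synthesisNudge}
	}
	// Exploration: no schema enforcement and no tool_choice override.
	return turnConfig{}
}

// onStop handles a "stop" finish reason and reports whether the run is done.
func (s *loopState) onStop() bool {
	if s.twoPhase() && !s.synthesis {
		s.synthesis = true // promote the next iteration to the synthesis turn
		return false
	}
	return true
}

func main() {
	s := &loopState{hasTools: true, wantsStructured: true}
	fmt.Printf("exploration: %+v\n", s.configure())
	fmt.Println("done after first stop:", s.onStop())
	fmt.Printf("synthesis: %+v\n", s.configure())
	fmt.Println("done after second stop:", s.onStop())
}
```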

The core loop allocates two constants that describe framework-wide behaviour rather than loop-private invariants: the empty-output retry budget and the synthesis-turn user nudge. Move both to the package-level const block next to tracerName so they live with the other framework tunables. Extract the structured output resolution into resolveStructuredFormat to keep the loop body focused on the state machine. Signed-off-by: Aurélien Sibiril <81782+aureliensibiril@users.noreply.github.com>

The Category and VendorType jsonschema tags used to carry their allowed values as a ~350-character prose list because Go struct tags must be compile-time string literals and jsonschema-go only reads them as free-form descriptions. That was unreadable in the source and left the API free to accept any string from the model. Introduce vendorCategoryEnum and vendorTypeEnum slices as the single Go source of truth and decorate the generated schema at extractVendorInfo time: after NewOutputType[VendorInfo] builds the base schema, walk it and attach proper enum arrays on the category and vendor_type properties. The LLM now receives a strict enum constraint, the struct tags shrink to short human descriptions, and a white-box test pins the decoration to the canonical slices. Group DefaultMaxTokens and AssessmentTimeout into a single const block while we are in the neighbourhood. Signed-off-by: Aurélien Sibiril <81782+aureliensibiril@users.noreply.github.com>
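As an illustration of the decoration step, the sketch below walks a schema held as `map[string]any` and attaches an `enum` array to one property. The `attachEnum` helper and the enum values are made up for the example; the PR works on the schema produced by `NewOutputType[VendorInfo]`, whose concrete type is not shown here.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// vendorTypeEnum holds placeholder values; the real list lives in the PR.
var vendorTypeEnum = []string{"saas", "consulting", "hardware"}

// attachEnum adds a strict enum constraint to one property of an object schema.
func attachEnum(schema map[string]any, property string, values []string) error {
	props, ok := schema["properties"].(map[string]any)
	if !ok {
		return fmt.Errorf("schema has no properties object")
	}
	prop, ok := props[property].(map[string]any)
	if !ok {
		return fmt.Errorf("schema has no %q property", property)
	}
	enum := make([]any, len(values))
	for i, v := range values {
		enum[i] = v
	}
	prop["enum"] = enum
	return nil
}

func main() {
	raw := `{"type":"object","properties":{"vendor_type":{"type":"string","description":"Kind of vendor"}}}`
	var schema map[string]any
	if err := json.Unmarshal([]byte(raw), &schema); err != nil {
		panic(err)
	}
	if err := attachEnum(schema, "vendor_type", vendorTypeEnum); err != nil {
		panic(err)
	}
	out, _ := json.Marshal(schema)
	fmt.Println(string(out))
}
```

Keeping the allowed values in a slice means a test can compare the decorated schema against the same slice, which is the pinning the commit describes.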

The output_types test suite was a white-box package importing only exported symbols and asserting nothing beyond a nil error from NewOutputType. Switch to the black-box vetting_test package and assert the generated schema actually describes an object with a non-empty properties map, so a broken jsonschema tag that silently produces an empty schema now fails the test. Signed-off-by: Aurélien Sibiril <81782+aureliensibiril@users.noreply.github.com>

progressHooks and subProgressHooks had near-identical OnToolStart and OnToolEnd bodies; the only difference was that the sub-variant attached a ParentStep to the emitted event. Collapse both into a single progressHooks struct with an optional parentStep field (empty for the orchestrator-level case) and expose newProgressHooks / newSubProgressHooks as two thin constructors. Signed-off-by: Aurélien Sibiril <81782+aureliensibiril@users.noreply.github.com>

Agent.toolsets, WithToolsets, the resolveTools loop and the helper types Toolset, ToolsetFunc, ToolSlice and MergeToolsets have zero callers. Every place that wants tools from a stateful toolset already calls NewXxxToolset(state).Tools() and feeds the result into agent.WithTools, which appends directly to the single tools slice. Drop the dead indirection. CollectTools and the per-package Toolset wrapper structs (which actually carry state) stay. Also drop the BuildTools / BuildReadOnlyTools helpers in the browser and security tool packages: they only existed to feed the now-removed WithToolsets path and have no callers. Signed-off-by: Aurélien Sibiril <81782+aureliensibiril@users.noreply.github.com>

aureliensibiril · 2026-04-08T10:09:01Z

@cubic-dev-ai review this PR

cubic-dev-ai · 2026-04-08T10:09:12Z

@cubic-dev-ai review this PR

@aureliensibiril I have started the AI code review. It will take a few minutes to complete.

cubic-dev-ai

11 issues found across 101 files

Note: This PR contains a large number of files. cubic only reviews up to 75 files per PR, so some files may not have been reviewed. We prioritized the most important files first.

Prompt for AI agents (unresolved issues)


Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="pkg/agents/vetting/prompts/code_security.txt">

<violation number="1" location="pkg/agents/vetting/prompts/code_security.txt:73">
P2: Example outputs are not valid JSON, conflicting with the enforced JSON schema and likely causing invalid outputs when the model follows the examples.</violation>
</file>

<file name="pkg/agents/vetting/prompts/analyzer.txt">

<violation number="1" location="pkg/agents/vetting/prompts/analyzer.txt:66">
P2: Examples contradict the JSON output requirement; they show semicolon-delimited text instead of valid JSON, which can lead the model to emit invalid JSON and fail schema enforcement.</violation>
</file>

<file name="pkg/agents/vetting/prompts/compliance.txt">

<violation number="1" location="pkg/agents/vetting/prompts/compliance.txt:45">
P2: Examples contradict the JSON‑schema requirement and show non‑JSON output formats, which can lead the model to emit invalid JSON and fail schema validation.</violation>
</file>

<file name="pkg/agent/tools/browser/fetch_robots.go">

<violation number="1" location="pkg/agent/tools/browser/fetch_robots.go:92">
P2: Disallow parsing lowercases the entire line and uses the lowercased remainder as the path, which changes case-sensitive URL paths and can misreport disallowed entries.</violation>
</file>

<file name="pkg/agent/tools/search/diff_documents.go">

<violation number="1" location="pkg/agent/tools/search/diff_documents.go:66">
P2: Oversized documents are incorrectly reported as having no differences, and the tool suppresses the "too large" diagnostic output.</violation>
</file>

<file name="pkg/agent/tools/internal/netcheck/netcheck.go">

<violation number="1" location="pkg/agent/tools/internal/netcheck/netcheck.go:34">
P1: `IsPublicIP` does not block all multicast addresses; it only blocks link-local multicast, allowing other multicast ranges to be treated as public.</violation>
</file>

<file name="pkg/agent/tools/browser/click.go">

<violation number="1" location="pkg/agent/tools/browser/click.go:57">
P1: Click-triggered navigation is not revalidated, allowing bypass of initial domain/URL SSRF checks.</violation>
</file>

<file name="pkg/agents/vetting/prompts/ai_risk.txt">

<violation number="1" location="pkg/agents/vetting/prompts/ai_risk.txt:69">
P2: Examples under `<examples>` are not valid JSON despite the prompt requiring schema-enforced JSON output; the semicolon-separated format can bias the model toward invalid JSON and break strict validation.</violation>
</file>

<file name="pkg/agents/vetting/prompts/incident_response.txt">

<violation number="1" location="pkg/agents/vetting/prompts/incident_response.txt:59">
P2: Examples contradict the JSON output requirement; they use a semicolon-delimited key:value list rather than valid JSON, which can cause the model to emit invalid output for the enforced schema.</violation>
</file>

<file name="pkg/agent/tools/browser/extract_text.go">

<violation number="1" location="pkg/agent/tools/browser/extract_text.go:70">
P2: Text size is capped only after full-page extraction, so huge pages can still cause large transfer/allocation overhead before truncation.</violation>
</file>

<file name="pkg/agents/vetting/assessment.go">

<violation number="1" location="pkg/agents/vetting/assessment.go:196">
P2: Research browser is created without any allowed-domain restriction; browser tool permits any http/https URL when no allowedDomains are set, enabling SSRF-style access to internal endpoints.</violation>
</file>


pkg/agent/tools/internal/netcheck/netcheck.go

pkg/agent/tools/browser/click.go

pkg/agents/vetting/prompts/code_security.txt

pkg/agents/vetting/prompts/analyzer.txt

pkg/agents/vetting/prompts/compliance.txt

pkg/agent/tools/search/diff_documents.go

pkg/agents/vetting/prompts/ai_risk.txt

pkg/agents/vetting/prompts/incident_response.txt

pkg/agent/tools/browser/extract_text.go

pkg/agents/vetting/assessment.go

Three reinforcements on the browser navigation path, all surfaced by cubic code review on PR getprobo#982:

- netcheck.IsPublicIP now rejects the full multicast range (ip.IsMulticast) rather than only link-local multicast, so addresses in 224.0.0.0/4 and 239.0.0.0/8 can no longer slip through the SSRF guard.
- Browser.checkURL now runs netcheck.ValidatePublicURL on every URL, even when no allowed-domain list is set. The research browser in the vendor assessment is intentionally allowed to roam the public web, but it must still refuse URLs that resolve to loopback, private, or link-local IPs.
- ClickElementTool reads the post-click location and feeds it back through Browser.checkURL. A click that triggers navigation to a different host (JS-initiated redirect, malicious <a href>, vendor page hijack) used to extract text from whatever page the browser ended up on; that path could bypass the initial checkURL call and read internal endpoints. The post-click revalidation closes that gap.

Signed-off-by: Aurélien Sibiril <81782+aureliensibiril@users.noreply.github.com>

Three defects flagged by cubic code review on PR getprobo#982:

- fetch_robots_txt lowercased the entire Disallow line before reading the path value, corrupting case-sensitive paths (e.g. /Admin/ reported as /admin/). Match the sitemap handling and read the path off the original-case raw line.
- extract_page_text pulled the full document.body.innerText over the DevTools protocol before truncating on the Go side, so a huge page could burn bandwidth and memory well beyond maxTextLength. Slice the string in JS at 4x maxTextLength code units first (safe upper bound for UTF-16 code units per Go rune) before transferring, then finish the rune-exact truncation in Go.
- diff_documents silently dropped the "documents too large for detailed diff" message when either side exceeded the 5000-line LCS cap, returning HasDifferences=false and an empty UnifiedDiff. Add a tooLarge flag on the internal diffOutput and surface the message via ErrorDetail so the caller can distinguish "no differences" from "too large to compare".

Signed-off-by: Aurélien Sibiril <81782+aureliensibiril@users.noreply.github.com>
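The robots.txt fix comes down to matching the directive name case-insensitively while reading the value from the untouched line. A minimal sketch, with `parseDisallow` as a hypothetical helper rather than the PR's code:

```go
package main

import (
	"fmt"
	"strings"
)

// parseDisallow returns the path of a Disallow directive, preserving its case.
// Only the directive name is compared case-insensitively.
func parseDisallow(line string) (string, bool) {
	raw := strings.TrimSpace(line)
	const prefix = "disallow:"
	if len(raw) < len(prefix) || !strings.EqualFold(raw[:len(prefix)], prefix) {
		return "", false
	}
	path := strings.TrimSpace(raw[len(prefix):])
	return path, path != ""
}

func main() {
	for _, line := range []string{"Disallow: /Admin/", "DISALLOW:/Private", "Allow: /public", "Disallow:"} {
		path, ok := parseDisallow(line)
		fmt.Printf("%q -> %q %v\n", line, path, ok)
	}
}
```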

CodeQL flagged InsecureSkipVerify in check_ssl_certificate on PR getprobo#982. The tool is a cert INSPECTOR: we intentionally connect to servers whose certificates may be expired, self-signed, or otherwise invalid because reporting on that state is the entire purpose of the tool. The handshake's built-in verification is disabled, then the code manually runs x509.Verify on the returned chain and reports the result in the Valid field. No credentials or confidential data are ever sent over the connection. Document the intent inline and add a //nolint:gosec directive so the scanner stops flagging this path. Signed-off-by: Aurélien Sibiril <81782+aureliensibiril@users.noreply.github.com>

Few-shot <example> blocks in six vetting sub-agent prompts (analyzer, compliance, code_security, ai_risk, incident_response, regulatory_compliance) used a semicolon-delimited "key: value" format in their <output> tags. The actual model output for those agents is enforced as JSON via the OutputType schema, so the examples contradicted the enforced contract and could bias the model toward emitting invalid JSON during the synthesis turn. Convert every example <output> to real JSON matching the sub-agent's output schema. No semantic changes to the examples themselves. Signed-off-by: Aurélien Sibiril <81782+aureliensibiril@users.noreply.github.com>

gearnode · 2026-04-08T11:35:50Z

pkg/agent/run.go

+	// maxEmptyOutputRetries bounds the number of times the core loop
+	// will re-ask the model to produce a structured output after it
+	// returned a thinking-only empty response.
+	maxEmptyOutputRetries = 2


Expose this as a WithX option.

cubic-dev-ai

1 issue found across 13 files (changes from recent commits).

Prompt for AI agents (unresolved issues)


Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="pkg/agent/tools/browser/click.go">

<violation number="1" location="pkg/agent/tools/browser/click.go:74">
P1: Post-click URL validation is performed only after click-triggered navigation, so SSRF-blocked destinations may still be contacted before rejection.</violation>
</file>


cubic-dev-ai · 2026-04-08T11:35:57Z

pkg/agent/tools/browser/click.go

+			// <a href>), bypassing the initial checkURL. Reject the
+			// result if the new URL is outside the allowed scope or
+			// resolves to a non-public IP.
+			if postClickURL != "" && postClickURL != p.URL {


P1: Post-click URL validation is performed only after click-triggered navigation, so SSRF-blocked destinations may still be contacted before rejection.

Prompt for AI agents

Check if this issue is valid. If so, understand the root cause and fix it.

At pkg/agent/tools/browser/click.go, line 74:

<comment>Post-click URL validation is performed only after click-triggered navigation, so SSRF-blocked destinations may still be contacted before rejection.</comment>

<file context>
@@ -56,12 +59,24 @@ func ClickElementTool(b *Browser) (agent.Tool, error) {
+			// <a href>), bypassing the initial checkURL. Reject the
+			// result if the new URL is outside the allowed scope or
+			// resolves to a non-public IP.
+			if postClickURL != "" && postClickURL != p.URL {
+				if r := b.checkURL(postClickURL); r != nil {
+					return *r, nil
</file context>

pkg/agent/tools/security/ssl.go

+			dialer := &tls.Dialer{
+				NetDialer: &net.Dialer{Timeout: 10 * time.Second},
+				Config: &tls.Config{
+					InsecureSkipVerify: true, //nolint:gosec // cert inspector; verification happens manually below


gearnode · 2026-04-08T11:37:40Z

pkg/agent/run.go

+				if resp.Message.Text() == "" {
+					s.messages = s.messages[:len(s.messages)-1]
+				}
+				s.messages = append(s.messages, llm.Message{


gearnode · 2026-04-08T11:38:03Z

pkg/agent/run.go

+					Role:  llm.RoleUser,
+					Parts: []llm.Part{llm.TextPart{Text: synthesisNudge}},
+				})
+				s.logger.InfoCtx(


gearnode · 2026-04-08T11:39:56Z

pkg/agent/run.go


 		case llm.FinishReasonToolCalls:
 			s.toolUsedInRun = true
+			emptyOutputRetries = 0


When does this happen?

gearnode · 2026-04-08T11:46:19Z

pkg/agent/typed_tool.go

+	var parsed struct {
+		Required []string `json:"required"`
+	}
+	_ = json.Unmarshal(schema, &parsed)


handle error
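One way to address this comment is to return the unmarshal error instead of discarding it. The sketch below assumes a hypothetical `requiredFields` helper around the same anonymous struct; the surrounding code in typed_tool.go is not shown in this thread.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// requiredFields reads the "required" array from a JSON schema and reports
// a parse failure instead of silently returning an empty list.
func requiredFields(schema []byte) ([]string, error) {
	var parsed struct {
		Required []string `json:"required"`
	}
	if err := json.Unmarshal(schema, &parsed); err != nil {
		return nil, fmt.Errorf("cannot parse tool schema: %w", err)
	}
	return parsed.Required, nil
}

func main() {
	fields, err := requiredFields([]byte(`{"type":"object","required":["url"]}`))
	fmt.Println(fields, err)
	_, err = requiredFields([]byte(`{"required":`))
	fmt.Println(err != nil)
}
```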

The final vendor_info_extractor step used to share the orchestrator's 20-minute AssessmentTimeout context, so a slow orchestrator could leave the extractor with no budget to run. Observed on a Pylon assessment where the orchestrator consumed ~19 minutes of sub-agent work and the extractor then failed immediately with "context deadline exceeded" — losing the full markdown report that had just been produced. Detach the extractor from the assessment context and give it a dedicated 5-minute budget via context.WithoutCancel + a fresh WithTimeout. The extractor has no tools and emits a single structured JSON output, so five minutes is more than enough even when Anthropic forces the streaming path. Signed-off-by: Aurélien Sibiril <81782+aureliensibiril@users.noreply.github.com>

aureliensibiril force-pushed the aureliensibiril/vendor-assessment-agent branch 3 times, most recently from bde166f to 3c2639a Compare April 2, 2026 15:53

github-advanced-security bot found potential problems Apr 2, 2026

pkg/agent/tools/security/ssl.go (fixed)

aureliensibiril force-pushed the aureliensibiril/vendor-assessment-agent branch 2 times, most recently from cd1d947 to 9c38610 Compare April 6, 2026 19:12

aureliensibiril changed the title ~~Add composable agent and tool framework~~ Add vendor assessment agent with structured output Apr 6, 2026

aureliensibiril force-pushed the aureliensibiril/vendor-assessment-agent branch 2 times, most recently from ccba179 to d640d96 Compare April 7, 2026 06:03

gearnode reviewed Apr 7, 2026

pkg/agent/agent.go (outdated)

aureliensibiril force-pushed the aureliensibiril/vendor-assessment-agent branch 4 times, most recently from 29c89db to a061840 Compare April 8, 2026 09:06

aureliensibiril marked this pull request as ready for review April 8, 2026 10:00

aureliensibiril added 15 commits April 8, 2026 12:01

Fix WithTx callback signature in vendor assessment

52c8358

Signed-off-by: Aurélien Sibiril <81782+aureliensibiril@users.noreply.github.com>

Fix e2e config for agents key rename

f59dcd2

Signed-off-by: Aurélien Sibiril <81782+aureliensibiril@users.noreply.github.com>

aureliensibiril added 14 commits April 8, 2026 12:01

Fix gofmt: trailing blank line in vendor_service.go

c0314cc

Signed-off-by: Aurélien Sibiril <81782+aureliensibiril@users.noreply.github.com>

aureliensibiril force-pushed the aureliensibiril/vendor-assessment-agent branch from a061840 to c7ba96e Compare April 8, 2026 10:02

cubic-dev-ai bot reviewed Apr 8, 2026


aureliensibiril added 4 commits April 8, 2026 13:28

gearnode reviewed Apr 8, 2026


cubic-dev-ai bot reviewed Apr 8, 2026


github-advanced-security bot found potential problems Apr 8, 2026


gearnode reviewed Apr 8, 2026


gearnode reviewed Apr 8, 2026


3 participants
