feat(api): Wiki Compose P4 — Ingest への research subgraph 共有 (#952)#968
Conversation
- Register graph id ingest-planner with prepare_ingest → P1 research nodes → plan_ingest - Reuse research nodes, shouldRefine, and ZediChatModel (no duplicate tools) - Add POST /api/ingest/graph/run and /graph/resume with TSDoc vs /api/ingest/plan - Extract shouldRefine for shared conditional routing - Vitest: ingest graph wiring and research HITL resume path Co-authored-by: Akimasa Sugai <otomatty@users.noreply.github.com>
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (6)
📝 WalkthroughWalkthroughAdds an ingest-planner StateGraph with IngestPlannerState, nodes (prepare, research formatting, plan), extracts shouldRefine, registers the graph at app startup, exposes run/resume HTTP endpoints with checkpoint support, refactors ingest-plan parsing, and adds unit/integration tests for prompt formatting and interrupt/resume flows. ChangesIngest Planner Graph and Workflow
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Possibly related issues
Possibly related PRs
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Code Review
This pull request introduces the ingest-planner graph, which integrates a shared research loop into the article ingestion workflow. It includes new state definitions, nodes for preparation and planning, and API endpoints for running and resuming the graph. Feedback focuses on critical validation gaps for the excerpt field in both the API routes and graph nodes, which could lead to runtime crashes. Additionally, the reviewer pointed out that the research loop results are currently ignored during the final planning phase, and noted an inefficient serialization cycle in the planIngest node.
| if (!article?.title?.trim() || !article.url?.trim()) { | ||
| throw new Error("prepare_ingest: article { title, url, excerpt } is required"); | ||
| } |
There was a problem hiding this comment.
The validation check for article is missing a check for the excerpt field, even though the error message on the next line explicitly states it is required. This omission leads to a potential runtime crash at line 55 when attempting to call .slice() on an undefined or null excerpt.
| if (!article?.title?.trim() || !article.url?.trim()) { | |
| throw new Error("prepare_ingest: article { title, url, excerpt } is required"); | |
| } | |
| if (!article?.title?.trim() || !article.url?.trim() || !article.excerpt?.trim()) { | |
| throw new Error("prepare_ingest: article { title, url, excerpt } is required"); | |
| } |
| : candidates | ||
| .map( | ||
| (c, i) => | ||
| `[${i + 1}] id=${c.id}\n title: ${c.title}\n excerpt: ${c.excerpt.slice(0, 400)}`, |
There was a problem hiding this comment.
Accessing c.excerpt.slice() without a null/undefined check is unsafe here. Since the candidates input comes from an external API request, it is not guaranteed that every candidate will have a valid excerpt string. This could cause the graph execution to fail.
| `[${i + 1}] id=${c.id}\n title: ${c.title}\n excerpt: ${c.excerpt.slice(0, 400)}`, | |
| `[${i + 1}] id=${c.id}\n title: ${c.title}\n excerpt: ${(c.excerpt ?? "").slice(0, 400)}`, |
| if (!body.article || typeof body.article.title !== "string" || !body.article.url?.trim()) { | ||
| throw new HTTPException(400, { message: "article { title, url, excerpt } is required" }); | ||
| } |
There was a problem hiding this comment.
The API request validation for body.article should include a check for the excerpt field. Since the prepare_ingest node (and the error message here) expects it to be present, validating it at the entry point prevents downstream crashes when the graph attempts to process the input.
| if (!body.article || typeof body.article.title !== "string" || !body.article.url?.trim()) { | |
| throw new HTTPException(400, { message: "article { title, url, excerpt } is required" }); | |
| } | |
| if (!body.article || typeof body.article.title !== "string" || !body.article.url?.trim() || typeof body.article.excerpt !== "string") { | |
| throw new HTTPException(400, { message: "article { title, url, excerpt } is required" }); | |
| } |
| const messages = buildIngestPlannerPrompt({ | ||
| article: state.article, | ||
| candidates: state.candidates, | ||
| userSchema: state.userSchema ?? undefined, | ||
| }); |
There was a problem hiding this comment.
The plan_ingest node currently ignores the results of the research loop. It calls buildIngestPlannerPrompt with only the initial article and candidates, but does not incorporate state.approvedResearch or the compiled state.batches. This means the final ingest plan is generated without considering any of the information gathered during the research phase, which defeats the purpose of running the research loop in this graph. You should ensure the research findings are included in the prompt context.
| const raw = await structured.invoke(messages.map((m) => ({ role: m.role, content: m.content }))); | ||
|
|
||
| const validCandidateIds = new Set(state.candidates.map((c) => c.id)); | ||
| const ingestPlan = parseIngestPlanResponse(JSON.stringify(raw), { validCandidateIds }); |
There was a problem hiding this comment.
This line performs a redundant serialization and deserialization cycle. raw is already a structured object returned by withStructuredOutput. Calling JSON.stringify(raw) just to pass it to parseIngestPlanResponse (which likely calls JSON.parse internally) is inefficient. If parseIngestPlanResponse is only needed for the validCandidateIds validation, consider refactoring that logic to avoid the overhead. Additionally, there is no check to ensure raw is not null before processing.
Qodo reviews are paused for this user.Troubleshooting steps vary by plan Learn more → On a Teams plan? Using GitHub Enterprise Server, GitLab Self-Managed, or Bitbucket Data Center? |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 30bab43920
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| if (!body.article || typeof body.article.title !== "string" || !body.article.url?.trim()) { | ||
| throw new HTTPException(400, { message: "article { title, url, excerpt } is required" }); | ||
| } | ||
|
|
||
| const candidates = Array.isArray(body.candidates) ? body.candidates : []; |
There was a problem hiding this comment.
Validate graph-run input fields before invoking the graph
/graph/run only checks article.title and article.url, and only verifies candidates is an array, but prepare_ingest later calls article.excerpt.slice(...) and c.excerpt.slice(...) unconditionally. If a client sends article.excerpt (or candidate excerpt) as missing/non-string, the request fails with a runtime TypeError and returns a 500 instead of a client-correctable 400. Please validate the full input shape (including excerpt/id/title string fields) at the route boundary.
Useful? React with 👍 / 👎.
| if (result.status === "failed") { | ||
| throw new HTTPException(500, { message: result.error ?? "Graph resume failed" }); |
There was a problem hiding this comment.
Return client error for invalid resume payloads
When human_review_research rejects a malformed resume payload (e.g. missing/invalid approvedSourceIds), GraphRunner.resume reports status: "failed", but this handler always maps that to HTTP 500. That misclassifies user input errors as server faults and makes callers retry an unrecoverable request. This path should return a 4xx for validation failures so clients can fix the payload.
Useful? React with 👍 / 👎.
…to plan_ingest - Validate article.excerpt and candidate excerpt at /graph/run and prepare_ingest - Append approvedResearch + batch evaluation to plan_ingest prompt - parseIngestPlanValue for structured output without JSON round-trip - Map graph validation failures to HTTP 400 on run/resume Co-authored-by: Akimasa Sugai <otomatty@users.noreply.github.com>
There was a problem hiding this comment.
Actionable comments posted: 6
🧹 Nitpick comments (2)
server/api/src/routes/ingest.ts (1)
324-364: ⚡ Quick winAdd Japanese counterparts for the new ingest-graph TSDoc blocks.
Line 324-364 and Line 455-457 are English-only docs; this file otherwise follows bilingual API documentation style.
As per coding guidelines "Include both Japanese and English comments/documentation in code and documentation files to maintain project tone consistency".
Also applies to: 455-457
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@server/api/src/routes/ingest.ts` around lines 324 - 364, The TSDoc blocks for the ingest graph endpoints are English-only; add Japanese translations alongside the existing English text for the IngestGraphRunBody and IngestGraphResumeBody declarations and the POST /api/ingest/graph/run (and its resume paragraph) comment so the file follows the bilingual style. Locate the comment block documenting POST /api/ingest/graph/run (the paragraph describing integration, flow, and resume) and the interface comments for IngestGraphRunBody and IngestGraphResumeBody, and insert concise Japanese counterparts directly above or below each English sentence/section, keeping the original English text intact and matching tone/terminology (e.g., INGEST_PLANNER_GRAPH_ID, human_review_research, threadId, resume schema).server/api/src/agents/graphs/ingest/ingestPlannerGraph.ts (1)
13-13: ⚡ Quick winImport
shouldRefinedirectly from the predicate module.Pulling
shouldRefinefromresearchGraph.tscouples this graph to another graph module. Import from../../subgraphs/research/shouldRefine.jsto keep dependency direction explicit and narrower.♻️ Suggested change
-import { shouldRefine } from "../../subgraphs/research/researchGraph.js"; +import { shouldRefine } from "../../subgraphs/research/shouldRefine.js";🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@server/api/src/agents/graphs/ingest/ingestPlannerGraph.ts` at line 13, The import of shouldRefine in ingestPlannerGraph.ts currently comes from "../../subgraphs/research/researchGraph.js", which couples this module to the whole research graph; update the import to pull shouldRefine directly from "../../subgraphs/research/shouldRefine.js" (replace the existing import statement referencing researchGraph.js with one that imports shouldRefine from the predicate module) so the dependency is narrower and directionally correct.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@server/api/src/__tests__/agents/graphs/ingest/ingestPlannerGraph.test.ts`:
- Around line 1-3: The file-level header comment for the ingest planner graph
test is English-only; update the top-of-file comment (the block comment
containing "Ingest planner graph (`#952`) — research subgraph wiring + routing
tests.") to include a Japanese translation alongside the English text so the
header contains both JA and EN (e.g., first line in Japanese, second line in
English) to satisfy the repository guideline requiring bilingual comments.
In `@server/api/src/agents/graphs/ingest/ingestPlannerGraph.ts`:
- Around line 27-30: Add Japanese translations alongside the English comments
for the two exported constants INGEST_PLANNER_GRAPH_ID and
INGEST_PLANNER_GRAPH_VERSION (e.g., "Registered graph id. / 登録されたグラフのID。") and
likewise update the other bilingual-missing comment block later in this file
that documents the ingest planner graph metadata (the comments near the graph
registration/metadata section) so all code comments are in both English and
Japanese to satisfy the repository guideline.
In `@server/api/src/agents/graphs/ingest/nodes/prepareIngest.ts`:
- Around line 31-47: The candidate formatting uses c.excerpt.slice(...) without
guarding its type which can throw if a candidate has a non-string excerpt;
update the runtime guard in the mapping used to build candidateBlock (the map
over state.candidates in prepareIngest.ts) to ensure excerpt is a string before
slicing (e.g., use a conditional check like typeof c.excerpt === "string" ?
c.excerpt : "" or coerce safely) and apply .slice(0,400) only to that safe
string; this prevents runtime exceptions when building candidateBlock and keeps
the rest of prepareIngest (and clampMaxIterations/state handling) unchanged.
In `@server/api/src/routes/ingest.ts`:
- Around line 463-468: The JSON parsing currently sets `body:
IngestGraphResumeBody` but doesn't validate that `body.resume` exists before
calling `runner.resume`; add an explicit presence check for the `resume` payload
after `body = await c.req.json<IngestGraphResumeBody>()` (and before any use
such as the call to `runner.resume`) and throw an `HTTPException(400, { message:
"Missing required 'resume' payload" })` if it's absent or falsy; update any
other spot that calls `runner.resume` (e.g., around the code referenced at the
later 518-519 use) to perform the same validation so `runner.resume` never
receives an undefined `resume` value.
- Around line 383-385: The current assignment of threadId (const threadId =
typeof body.threadId === "string" && body.threadId.trim() ? body.threadId.trim()
: randomUUID();) lets clients supply raw IDs that get passed to LangGraph as
thread_id, enabling cross-user collisions; change this to compute a namespaced
thread id based on the current user (e.g., combine or hash the authenticated
user identifier and the client-supplied or generated id) and use that namespaced
value wherever threadId is used (including the other occurrence around the
470-474 block) so thread ids are scoped per-user and collisions/session fixation
across users are prevented. Ensure you reference the authenticated user
identifier used in your request context (e.g., req.user.id or the equivalent)
and update all code paths that set or pass thread_id to LangGraph to use the new
namespaced value.
In `@server/api/src/services/ingestPlanner.ts`:
- Around line 266-272: The JSDoc for the exported function that validates a
parsed ingest plan (parameters named parsed and options, and throws
IngestPlanParseError) is currently English-only; add corresponding Japanese
descriptions for the summary, `@param` parsed, `@param` options, and `@throws` lines
so the block is bilingual (JA/EN) while preserving the existing English text and
tags. Keep the same JSDoc tags and phrasing order, add concise Japanese
translations immediately above or below each English line, and ensure the
exported function's JSDoc remains syntactically valid for TypeScript tooling.
---
Nitpick comments:
In `@server/api/src/agents/graphs/ingest/ingestPlannerGraph.ts`:
- Line 13: The import of shouldRefine in ingestPlannerGraph.ts currently comes
from "../../subgraphs/research/researchGraph.js", which couples this module to
the whole research graph; update the import to pull shouldRefine directly from
"../../subgraphs/research/shouldRefine.js" (replace the existing import
statement referencing researchGraph.js with one that imports shouldRefine from
the predicate module) so the dependency is narrower and directionally correct.
In `@server/api/src/routes/ingest.ts`:
- Around line 324-364: The TSDoc blocks for the ingest graph endpoints are
English-only; add Japanese translations alongside the existing English text for
the IngestGraphRunBody and IngestGraphResumeBody declarations and the POST
/api/ingest/graph/run (and its resume paragraph) comment so the file follows the
bilingual style. Locate the comment block documenting POST /api/ingest/graph/run
(the paragraph describing integration, flow, and resume) and the interface
comments for IngestGraphRunBody and IngestGraphResumeBody, and insert concise
Japanese counterparts directly above or below each English sentence/section,
keeping the original English text intact and matching tone/terminology (e.g.,
INGEST_PLANNER_GRAPH_ID, human_review_research, threadId, resume schema).
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 9efe355c-14e0-4fe0-8e63-8f29cf3e9132
📒 Files selected for processing (19)
server/api/src/__tests__/agents/graphs/ingest/formatResearchForIngest.test.tsserver/api/src/__tests__/agents/graphs/ingest/ingestPlannerGraph.test.tsserver/api/src/__tests__/services/ingestPlanner.test.tsserver/api/src/agents/core/composeModelConfig.tsserver/api/src/agents/graphs/ingest/index.tsserver/api/src/agents/graphs/ingest/ingestPlannerGraph.tsserver/api/src/agents/graphs/ingest/nodes/formatResearchForIngest.tsserver/api/src/agents/graphs/ingest/nodes/index.tsserver/api/src/agents/graphs/ingest/nodes/planIngest.tsserver/api/src/agents/graphs/ingest/nodes/prepareIngest.tsserver/api/src/agents/graphs/ingest/state.tsserver/api/src/agents/graphs/ingest/types.tsserver/api/src/agents/index.tsserver/api/src/agents/subgraphs/research/nodes/index.tsserver/api/src/agents/subgraphs/research/researchGraph.tsserver/api/src/agents/subgraphs/research/shouldRefine.tsserver/api/src/app.tsserver/api/src/routes/ingest.tsserver/api/src/services/ingestPlanner.ts
…#968) - Prefix checkpoint thread_id with userId; return client threadId in JSON - Require resume payload on /graph/resume - Validate candidate id/title/excerpt types in prepare_ingest - Bilingual JSDoc per project guidelines Co-authored-by: Akimasa Sugai <otomatty@users.noreply.github.com>
Summary
Implements issue #952 (Wiki Compose P4): connect the P1 research loop to the article ingest workflow via a new LangGraph
ingest-planner, without duplicating tools or LLM plumbing.Changes
ingest-plannergraph (server/api/src/agents/graphs/ingest/):prepare_ingest→ shared P1 research nodes (plan_queries,web_search,wiki_search,fetch_articles,evaluate_sufficiency, …) →plan_ingest(ZediChatModel + existingingestPlannerparse logic)registerIngestPlannerGraph()at app bootstrap; exported fromagents/index.tsPOST /api/ingest/graph/runandPOST /api/ingest/graph/resumewith TSDoc describing coexistence withPOST /api/ingest/plan([P1] Ingest フロー: Web Clipper × AI Wiki の統合と merge プレビュー #595)shouldRefine: extracted toshouldRefine.tsfor shared conditional routing across graphsingestPlannerGraph.test.ts(research wiring + HITL resume →plan_ingest); existing ingest service/route tests unchangedAcceptance criteria
How to verify
Notes
POST /api/ingest/planremains the URL-first production path (callProvider);/graph/runexpects client-suppliedarticle+candidates(e.g. after/plan).DATABASE_URL); same research HITL payload aswiki-compose-research.Summary by CodeRabbit
New Features
Tests