fix(json): extract structured payloads from noisy model output#576
Niko96-dotcom wants to merge 1 commit into plastic-labs:main
Conversation
Walkthrough
A JSON extraction pipeline was added to preprocess noisy input.
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes
🚥 Pre-merge checks | ✅ 2 | ❌ 1
❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
Actionable comments posted: 5
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/utils/json_parser.py`:
- Around line 14-19: The current _strip_reasoning_wrappers uses _THINK_TAG_RE to
remove all `<think>...</think>` occurrences across the whole text which can
corrupt JSON payloads (e.g., inside string values); update
_strip_reasoning_wrappers to only remove leading/trailing reasoning wrappers
(anchor the pattern to the start and/or end) or, better, move wrapper removal to
after JSON candidate extraction so functions that extract JSON (the JSON
candidate extraction routine referenced in the file) operate on the original
text; specifically, modify _THINK_TAG_RE and _strip_reasoning_wrappers (and the
JSON extraction flow that calls it) so tags inside extracted JSON strings are
preserved while only outer/leading wrappers are stripped.
- Around line 22-32: _fenced_candidates currently loses the priority information
and only returns bodies, allowing _best_valid_json_candidate to pick a
higher-scored generic fence over a later tagged JSON fence; modify
_fenced_candidates to preserve and return the priority with each body (e.g.,
list of (priority, body)) and update _best_valid_json_candidate to prefer lower
priority (tagged JSON) first when selecting among valid JSON candidates—for
example, by sorting/choosing based on (priority, -score) or by grouping by
priority and only comparing scores within the same priority.
- Around line 35-76: The scanner _collect_balanced_json_candidates can get stuck
when a noisy unmatched "{" or "[" appears before a valid JSON because it never
resets start on a mismatched closure; update the loop so that when a closing
char is seen that doesn't match stack[-1] (or stack is empty) you discard the
current candidate by setting start = None and stack = [], then re-evaluate the
same character as a potential opener (if char in "{[" then set start = i and
push the matching closer). Also ensure escaped chars and in_string handling
remain unchanged and let unterminated candidates fall away at end-of-text.
- Around line 86-90: The current scoring gives schema-shaped payloads a higher
primary score (has_schema_signal -> 2), letting long schema markers beat real
payloads; change the primary score logic so schema-shaped candidates are
penalized before length tiebreaking (e.g., set primary = 0 when
has_schema_signal is true and 2 otherwise). Update the return in the block that
computes keys/schema_keys/has_schema_signal (variables payload, keys,
schema_keys, has_schema_signal, candidate) to return the inverted primary score
so schema-like objects lose priority before comparing len(keys) and
len(candidate).
In `@tests/utils/test_json_parser.py`:
- Around line 24-45: Add regression tests covering ambiguous candidate cases for
extract_json_payload: ensure fenced ```json blocks override earlier non-json
fences, literal `<think>…</think>` content is preserved inside valid/noisy JSON,
unclosed prefix objects are ignored and later valid payloads are chosen, and
that an "Actual" payload with keys like "explicit" wins over a schema-shaped
"Schema" candidate; update tests in tests/utils/test_json_parser.py by adding
cases that exercise these behaviors and reference extract_json_payload (and
indirectly src/utils/json_parser.py) so the parser’s precedence and preservation
rules are validated.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 1345eb18-6ba8-46c5-99b2-59873b1eea6e
📒 Files selected for processing (2)
- src/utils/json_parser.py
- tests/utils/test_json_parser.py
```python
_THINK_TAG_RE = re.compile(r"<think>.*?</think>", re.IGNORECASE | re.DOTALL)


def _strip_reasoning_wrappers(text: str) -> str:
    """Remove common reasoning/prose wrappers before JSON extraction."""
    return _THINK_TAG_RE.sub("", text.strip()).strip()
```
Avoid stripping `<think>` tags inside extracted JSON strings.
Because stripping runs before candidate extraction, noisy output like `Here: {"content":"literal <think>x</think>"}` will silently mutate the payload content. Limit stripping to leading reasoning wrappers, or extract JSON candidates before applying wrapper removal.
🐛 Proposed fix
```diff
-_THINK_TAG_RE = re.compile(r"<think>.*?</think>", re.IGNORECASE | re.DOTALL)
+_LEADING_THINK_TAG_RE = re.compile(
+    r"^\s*(?:<think>.*?</think>\s*)+",
+    re.IGNORECASE | re.DOTALL,
+)

 def _strip_reasoning_wrappers(text: str) -> str:
     """Remove common reasoning/prose wrappers before JSON extraction."""
-    return _THINK_TAG_RE.sub("", text.strip()).strip()
+    return _LEADING_THINK_TAG_RE.sub("", text.strip()).strip()
```

Also applies to: 121-127
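To make the difference concrete, here is a tiny standalone sketch (identifier names are mine, mirroring but not quoting the PR): the leading-anchored pattern strips only the reasoning preamble, while the current global substitution also deletes `<think>` text that lives inside a JSON string value.

```python
import re

# Hypothetical names; the anchored pattern follows the proposed fix above.
LEADING_THINK_RE = re.compile(
    r"^\s*(?:<think>.*?</think>\s*)+", re.IGNORECASE | re.DOTALL
)
GLOBAL_THINK_RE = re.compile(r"<think>.*?</think>", re.IGNORECASE | re.DOTALL)


def strip_leading_reasoning(text: str) -> str:
    """Remove reasoning wrappers only at the very start of the text."""
    return LEADING_THINK_RE.sub("", text.strip()).strip()


noisy = '<think>plan</think>{"content":"literal <think>x</think>"}'
anchored = strip_leading_reasoning(noisy)   # payload's in-string tag survives
corrupted = GLOBAL_THINK_RE.sub("", noisy)  # in-string tag silently deleted
```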
```python
def _fenced_candidates(text: str) -> list[str]:
    candidates: list[str] = []
    for match in _JSON_FENCE_RE.finditer(text):
        body = match.group("body").strip()
        if not body:
            continue
        lang = (match.group("lang") or "").strip().lower()
        priority = 0 if lang == "json" else 1
        candidates.append((priority, body))
    candidates.sort(key=lambda item: item[0])
    return [body for _, body in candidates]
```
Enforce JSON-fence priority before candidate scoring.
`_fenced_candidates()` only sorts tagged `json` fences first, but `_best_valid_json_candidate()` can still select a stronger-scored generic fence over a later tagged JSON fence. That violates the "prefer tagged JSON fences" behavior when both fenced blocks contain valid JSON.
🐛 Proposed fix
```diff
-def _fenced_candidates(text: str) -> list[str]:
+def _fenced_candidates(text: str, *, json_only: bool = False) -> list[str]:
     candidates: list[str] = []
     for match in _JSON_FENCE_RE.finditer(text):
         body = match.group("body").strip()
         if not body:
             continue
         lang = (match.group("lang") or "").strip().lower()
-        priority = 0 if lang == "json" else 1
-        candidates.append((priority, body))
-    candidates.sort(key=lambda item: item[0])
-    return [body for _, body in candidates]
+        if json_only and lang != "json":
+            continue
+        if not json_only and lang == "json":
+            continue
+        candidates.append(body)
+    return candidates
```

```diff
-    fenced_valid = _best_valid_json_candidate(_fenced_candidates(cleaned))
+    fenced_valid = _best_valid_json_candidate(
+        _fenced_candidates(cleaned, json_only=True)
+    )
+    if fenced_valid:
+        return fenced_valid
+
+    fenced_valid = _best_valid_json_candidate(_fenced_candidates(cleaned))
     if fenced_valid:
         return fenced_valid
```

Also applies to: 129-131
```python
def _collect_balanced_json_candidates(text: str) -> list[str]:
    """Return all balanced JSON object/array substrings, in encounter order."""
    candidates: list[str] = []
    start = None
    stack: list[str] = []
    in_string = False
    escape = False

    for i, char in enumerate(text):
        if start is None:
            if char not in "[{":
                continue
            start = i
            stack.append("}" if char == "{" else "]")
            continue

        if escape:
            escape = False
            continue

        if char == "\\":
            escape = True
            continue

        if char == '"':
            in_string = not in_string
            continue

        if in_string:
            continue

        if char == "{":
            stack.append("}")
        elif char == "[":
            stack.append("]")
        elif stack and char == stack[-1]:
            stack.pop()
            if not stack and start is not None:
                candidates.append(text[start : i + 1].strip())
                start = None

    return candidates
```
Resync after malformed leading candidates.
If noisy prose contains an unmatched `{` or `[` before the real payload, this scanner never resets `start`, so later valid JSON is missed entirely. Discard the current candidate on mismatched closure or scan from each opener independently.
🐛 Example regression to add
```python
def test_extract_json_payload_ignores_unclosed_prefix_object():
    raw = 'debug {not closed\nActual: {"explicit":[]}'
    assert json.loads(extract_json_payload(raw)) == {"explicit": []}
```
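As a cross-check of the "scan from each opener independently" alternative, the standard library's `json.JSONDecoder.raw_decode` can drive the scan: it parses one value starting at a given index and returns where it ends, so a noisy unmatched `{` simply fails to decode and the walk continues. The function name below is hypothetical, not the PR's.

```python
import json


def collect_json_candidates(text: str) -> list[str]:
    """Collect decodable JSON object/array substrings, trying every opener."""
    decoder = json.JSONDecoder()
    candidates: list[str] = []
    i = 0
    while i < len(text):
        if text[i] in "{[":
            try:
                _, end = decoder.raw_decode(text, i)
            except json.JSONDecodeError:
                # Unmatched or malformed opener: retry from the next character.
                i += 1
                continue
            candidates.append(text[i:end].strip())
            i = end  # skip past the decoded value (no nested re-scan)
        else:
            i += 1
    return candidates
```

Unlike the reviewed scanner, a decode failure here only costs a retry at the next character, so an unclosed prefix object can no longer mask a valid payload that follows it.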
```python
if isinstance(payload, dict):
    keys = set(payload)
    schema_keys = {"explicit", "deductive", "inductive", "observations", "facts"}
    has_schema_signal = int(bool(keys & schema_keys))
    return (2 if has_schema_signal else 0, len(keys), len(candidate))
```
Penalize schema-shaped candidates before ranking by size.
A schema such as `{"explicit":{"type":"array","items":{"type":"object"}}}` scores as a payload because it has `explicit`, and its longer length can beat the actual `{"explicit":[]}` payload. Add a schema-marker penalty before using length as a tiebreaker.
🐛 Proposed direction
```diff
 if isinstance(payload, dict):
     keys = set(payload)
-    schema_keys = {"explicit", "deductive", "inductive", "observations", "facts"}
-    has_schema_signal = int(bool(keys & schema_keys))
-    return (2 if has_schema_signal else 0, len(keys), len(candidate))
+    payload_keys = {"explicit", "deductive", "inductive", "observations", "facts"}
+    schema_marker_keys = {
+        "$schema",
+        "type",
+        "properties",
+        "required",
+        "items",
+        "additionalProperties",
+    }
+    has_payload_signal = int(bool(keys & payload_keys))
+    looks_like_schema = int(
+        bool(keys & schema_marker_keys)
+        or any(isinstance(value, dict) and "type" in value for value in payload.values())
+    )
+    return (2 if has_payload_signal else 0, -looks_like_schema, len(keys), len(candidate))
```
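Tuple scores compare lexicographically in Python, so a `-looks_like_schema` term demotes schema-shaped dicts before key count or length is ever consulted. Here is a self-contained sketch along those lines (the constants and the non-dict fallback are my assumptions, not the PR's code):

```python
import json

# Assumed key sets, mirroring the proposed direction above.
PAYLOAD_KEYS = {"explicit", "deductive", "inductive", "observations", "facts"}
SCHEMA_MARKER_KEYS = {
    "$schema", "type", "properties", "required", "items", "additionalProperties",
}


def score(candidate: str) -> tuple[int, int, int, int]:
    """Rank a valid-JSON candidate; higher tuples win under max()."""
    payload = json.loads(candidate)
    if not isinstance(payload, dict):
        # Assumed fallback for arrays/scalars: above schema-shaped dicts,
        # below dicts that carry payload keys.
        return (1, 0, 0, len(candidate))
    keys = set(payload)
    has_payload_signal = int(bool(keys & PAYLOAD_KEYS))
    looks_like_schema = int(
        bool(keys & SCHEMA_MARKER_KEYS)
        or any(isinstance(v, dict) and "type" in v for v in payload.values())
    )
    return (2 if has_payload_signal else 0, -looks_like_schema, len(keys), len(candidate))


schema = '{"explicit":{"type":"array","items":{"type":"object"}}}'
actual = '{"explicit":[]}'
```

The longer schema-shaped candidate now loses on the second tuple element, before length can break the tie.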
```python
def test_extract_json_payload_prefers_json_fence_over_earlier_non_json_fence():
    raw = '```text\nnot json\n```\n```json\n{"explicit":[{"content":"I live in Berlin"}]}\n```'
    extracted = extract_json_payload(raw)
    assert json.loads(extracted) == {"explicit": [{"content": "I live in Berlin"}]}


def test_extract_json_payload_prefers_payload_over_earlier_schema_object():
    raw = 'Schema: {"type":"object"}\nActual: {"explicit":[{"content":"I use Cubase"}]}'
    extracted = extract_json_payload(raw)
    assert json.loads(extracted) == {"explicit": [{"content": "I use Cubase"}]}


def test_extract_json_payload_preserves_literal_think_text_inside_valid_json():
    raw = '{"content":"literal <think> tag","explicit":[]}'
    extracted = extract_json_payload(raw)
    assert json.loads(extracted) == {"content": "literal <think> tag", "explicit": []}


def test_extract_json_payload_prefers_later_array_payload_over_earlier_schema_dict():
    raw = 'Schema: {"type":"object"}\nActual: [{"content":"I use Cubase"}]'
    extracted = extract_json_payload(raw)
    assert json.loads(extracted) == [{"content": "I use Cubase"}]
```
Add regressions for the remaining ambiguous-candidate cases.
Current tests cover the intended path, but they would not catch the fence-priority, literal `<think>...</think>` corruption, unclosed-prefix, or schema-shaped `explicit` candidate failures flagged in src/utils/json_parser.py.
🧪 Suggested tests
```diff
+def test_extract_json_payload_prefers_json_fence_over_valid_generic_fence():
+    raw = (
+        '```\n{"explicit":[{"content":"generic"}]}\n```\n'
+        '```json\n{"type":"object"}\n```'
+    )
+    assert json.loads(extract_json_payload(raw)) == {"type": "object"}
+
+
+def test_extract_json_payload_preserves_literal_think_text_inside_noisy_json():
+    raw = 'Here: {"content":"literal <think>tag</think>","explicit":[]}'
+    extracted = extract_json_payload(raw)
+    assert json.loads(extracted) == {
+        "content": "literal <think>tag</think>",
+        "explicit": [],
+    }
+
+
+def test_extract_json_payload_ignores_unclosed_prefix_object():
+    raw = 'debug {not closed\nActual: {"explicit":[]}'
+    assert json.loads(extract_json_payload(raw)) == {"explicit": []}
+
+
+def test_extract_json_payload_prefers_payload_over_schema_with_payload_keys():
+    raw = (
+        'Schema: {"explicit":{"type":"array","items":{"type":"object"}}}\n'
+        'Actual: {"explicit":[]}'
+    )
+    assert json.loads(extract_json_payload(raw)) == {"explicit": []}
```
Summary
- Preserve literal `<think>` text inside valid JSON

Test Plan
```shell
set -a && source /Users/nikolaymohr/honcho/.env && set +a && uv run pytest tests/utils/test_json_parser.py -q
```

Why
OpenAI-compatible providers keep wrapping structured responses in junk. The old path was too trusting and regularly grabbed the wrong payload.
Summary by CodeRabbit
New Features
Tests