
fix(json): extract structured payloads from noisy model output#576

Open
Niko96-dotcom wants to merge 1 commit into plastic-labs:main from Niko96-dotcom:fix/json-parser-hardening

Conversation


@Niko96-dotcom Niko96-dotcom commented Apr 18, 2026

Summary

  • strip reasoning wrappers before repair without corrupting already-valid JSON
  • prefer tagged JSON fences over generic fenced blocks
  • collect and rank balanced JSON candidates instead of trusting the first blob that looks vaguely object-shaped
  • add regression tests for fenced JSON preference, schema-vs-payload selection, and literal <think> text inside valid JSON

Test Plan

  • set -a && source /Users/nikolaymohr/honcho/.env && set +a && uv run pytest tests/utils/test_json_parser.py -q

Why

OpenAI-compatible providers keep wrapping structured responses in junk. The old path was too trusting and regularly grabbed the wrong payload.

Summary by CodeRabbit

  • New Features

    • Improved JSON payload extraction with enhanced handling of noisy input, fenced code blocks, and embedded JSON structures for increased parsing robustness.
  • Tests

    • Added comprehensive test coverage for JSON extraction and validation scenarios, including edge cases and various formatting styles.


coderabbitai Bot commented Apr 18, 2026

Walkthrough

A JSON extraction pipeline was added to preprocess noisy input by removing <think> tags, extracting from code fences, and identifying valid JSON payloads before validation and repair. The validation function now uses this pipeline to normalize input before repair operations.

Changes

Cohort / File(s) Summary
JSON Extraction Pipeline
src/utils/json_parser.py
Added `extract_json_payload()`, which trims input, removes `<think>...</think>` wrappers, prioritizes fenced `json`-tagged code blocks, and extracts balanced JSON objects/arrays with validity ranking. Updated `validate_and_repair_json()` to preprocess input through this pipeline before repair logic.
Test Coverage
tests/utils/test_json_parser.py
New test module validating extraction behavior: removal of think tags, fenced block extraction, preference for actual payloads over schema-like objects, preservation of literal <think> in string values, and array payload selection.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

🐰 Hoppy whiskers twitch with JSON delight,
Noisy <think> tags removed from sight!
Fences and braces, balanced and bright,
Payloads extracted with algorithmic might,
Validation flows smooth—oh what a sight!

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 40.00%, below the required threshold of 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit's high-level summary is enabled. |
| Title check | ✅ Passed | The title 'fix(json): extract structured payloads from noisy model output' directly and clearly describes the main change: improving JSON payload extraction from model output containing extraneous text. |




@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 5

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/utils/json_parser.py`:
- Around line 14-19: The current _strip_reasoning_wrappers uses _THINK_TAG_RE to
remove all <think>...</think> occurrences across the whole text which can
corrupt JSON payloads (e.g., inside string values); update
_strip_reasoning_wrappers to only remove leading/trailing reasoning wrappers
(anchor the pattern to the start and/or end) or, better, move wrapper removal to
after JSON candidate extraction so functions that extract JSON (the JSON
candidate extraction routine referenced in the file) operate on the original
text; specifically, modify _THINK_TAG_RE and _strip_reasoning_wrappers (and the
JSON extraction flow that calls it) so tags inside extracted JSON strings are
preserved while only outer/leading wrappers are stripped.
- Around line 22-32: _fenced_candidates currently loses the priority information
and only returns bodies, allowing _best_valid_json_candidate to pick a
higher-scored generic fence over a later tagged JSON fence; modify
_fenced_candidates to preserve and return the priority with each body (e.g.,
list of (priority, body)) and update _best_valid_json_candidate to prefer lower
priority (tagged JSON) first when selecting among valid JSON candidates—for
example, by sorting/choosing based on (priority, -score) or by grouping by
priority and only comparing scores within the same priority.
- Around line 35-76: The scanner _collect_balanced_json_candidates can get stuck
when a noisy unmatched "{" or "[" appears before a valid JSON because it never
resets start on a mismatched closure; update the loop so that when a closing
char is seen that doesn't match stack[-1] (or stack is empty) you discard the
current candidate by setting start = None and stack = [], then re-evaluate the
same character as a potential opener (if char in "{[" then set start = i and
push the matching closer). Also ensure escaped chars and in_string handling
remain unchanged and let unterminated candidates fall away at end-of-text.
- Around line 86-90: The current scoring gives schema-shaped payloads a higher
primary score (has_schema_signal -> 2), letting long schema markers beat real
payloads; change the primary score logic so schema-shaped candidates are
penalized before length tiebreaking (e.g., set primary = 0 when
has_schema_signal is true and 2 otherwise). Update the return in the block that
computes keys/schema_keys/has_schema_signal (variables payload, keys,
schema_keys, has_schema_signal, candidate) to return the inverted primary score
so schema-like objects lose priority before comparing len(keys) and
len(candidate).

In `@tests/utils/test_json_parser.py`:
- Around line 24-45: Add regression tests covering ambiguous candidate cases for
extract_json_payload: ensure fenced ```json blocks override earlier non-json
fences, literal <think>…</think> content is preserved inside valid/noisy JSON,
unclosed prefix objects are ignored and later valid payloads are chosen, and
that an "Actual" payload with keys like "explicit" wins over a schema-shaped
"Schema" candidate; update tests in tests/utils/test_json_parser.py by adding
cases that exercise these behaviors and reference extract_json_payload (and
indirectly src/utils/json_parser.py) so the parser’s precedence and preservation
rules are validated.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 1345eb18-6ba8-46c5-99b2-59873b1eea6e

📥 Commits

Reviewing files that changed from the base of the PR and between 9676526 and c564885.

📒 Files selected for processing (2)
  • src/utils/json_parser.py
  • tests/utils/test_json_parser.py

Comment thread src/utils/json_parser.py
Comment on lines +14 to +19
```python
_THINK_TAG_RE = re.compile(r"<think>.*?</think>", re.IGNORECASE | re.DOTALL)


def _strip_reasoning_wrappers(text: str) -> str:
    """Remove common reasoning/prose wrappers before JSON extraction."""
    return _THINK_TAG_RE.sub("", text.strip()).strip()
```

⚠️ Potential issue | 🟠 Major

Avoid stripping <think> tags inside extracted JSON strings.

Because stripping runs before candidate extraction, noisy output like Here: {"content":"literal <think>x</think>"} will silently mutate the payload content. Limit stripping to leading reasoning wrappers, or extract JSON candidates before applying wrapper removal.

🐛 Proposed fix
```diff
-_THINK_TAG_RE = re.compile(r"<think>.*?</think>", re.IGNORECASE | re.DOTALL)
+_LEADING_THINK_TAG_RE = re.compile(
+    r"^\s*(?:<think>.*?</think>\s*)+",
+    re.IGNORECASE | re.DOTALL,
+)
 def _strip_reasoning_wrappers(text: str) -> str:
     """Remove common reasoning/prose wrappers before JSON extraction."""
-    return _THINK_TAG_RE.sub("", text.strip()).strip()
+    return _LEADING_THINK_TAG_RE.sub("", text.strip()).strip()
```

Also applies to: 121-127
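As a standalone sanity check of the anchored pattern (a sketch; the real constant lives in src/utils/json_parser.py), only leading wrappers are stripped:

```python
import json
import re

# Sketch of the proposed anchored pattern: only leading <think> wrappers
# are removed, so tags inside JSON string values survive.
LEADING_THINK_TAG_RE = re.compile(
    r"^\s*(?:<think>.*?</think>\s*)+",
    re.IGNORECASE | re.DOTALL,
)

noisy = '<think>plan the answer</think>{"content": "literal <think>x</think>"}'
cleaned = LEADING_THINK_TAG_RE.sub("", noisy)

# The payload's literal <think> text is untouched.
assert json.loads(cleaned) == {"content": "literal <think>x</think>"}
```

The lazy `.*?` stops at the first `</think>`, so the leading wrapper alone matches and the tag embedded in the string value is never consumed.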


Comment thread src/utils/json_parser.py
Comment on lines +22 to +32
```python
def _fenced_candidates(text: str) -> list[str]:
    candidates: list[str] = []
    for match in _JSON_FENCE_RE.finditer(text):
        body = match.group("body").strip()
        if not body:
            continue
        lang = (match.group("lang") or "").strip().lower()
        priority = 0 if lang == "json" else 1
        candidates.append((priority, body))
    candidates.sort(key=lambda item: item[0])
    return [body for _, body in candidates]
```

⚠️ Potential issue | 🟠 Major

Enforce JSON-fence priority before candidate scoring.

_fenced_candidates() only sorts tagged json fences first, but _best_valid_json_candidate() can still select a stronger-scored generic fence over a later tagged JSON fence. That violates the “prefer tagged JSON fences” behavior when both fenced blocks contain valid JSON.

🐛 Proposed fix
```diff
-def _fenced_candidates(text: str) -> list[str]:
+def _fenced_candidates(text: str, *, json_only: bool = False) -> list[str]:
     candidates: list[str] = []
     for match in _JSON_FENCE_RE.finditer(text):
         body = match.group("body").strip()
         if not body:
             continue
         lang = (match.group("lang") or "").strip().lower()
-        priority = 0 if lang == "json" else 1
-        candidates.append((priority, body))
-    candidates.sort(key=lambda item: item[0])
-    return [body for _, body in candidates]
+        if json_only and lang != "json":
+            continue
+        if not json_only and lang == "json":
+            continue
+        candidates.append(body)
+    return candidates
```

```diff
-    fenced_valid = _best_valid_json_candidate(_fenced_candidates(cleaned))
+    fenced_valid = _best_valid_json_candidate(
+        _fenced_candidates(cleaned, json_only=True)
+    )
+    if fenced_valid:
+        return fenced_valid
+
+    fenced_valid = _best_valid_json_candidate(_fenced_candidates(cleaned))
     if fenced_valid:
         return fenced_valid
```

Also applies to: 129-131


Comment thread src/utils/json_parser.py
Comment on lines +35 to +76
```python
def _collect_balanced_json_candidates(text: str) -> list[str]:
    """Return all balanced JSON object/array substrings, in encounter order."""
    candidates: list[str] = []
    start = None
    stack: list[str] = []
    in_string = False
    escape = False

    for i, char in enumerate(text):
        if start is None:
            if char not in "[{":
                continue
            start = i
            stack.append("}" if char == "{" else "]")
            continue

        if escape:
            escape = False
            continue

        if char == "\\":
            escape = True
            continue

        if char == '"':
            in_string = not in_string
            continue

        if in_string:
            continue

        if char == "{":
            stack.append("}")
        elif char == "[":
            stack.append("]")
        elif stack and char == stack[-1]:
            stack.pop()
            if not stack and start is not None:
                candidates.append(text[start : i + 1].strip())
                start = None

    return candidates
```

⚠️ Potential issue | 🟠 Major

Resync after malformed leading candidates.

If noisy prose contains an unmatched { or [ before the real payload, this scanner never resets start, so later valid JSON is missed entirely. Discard the current candidate on mismatched closure or scan from each opener independently.

🐛 Example regression to add
```python
def test_extract_json_payload_ignores_unclosed_prefix_object():
    raw = 'debug {not closed\nActual: {"explicit":[]}'
    assert json.loads(extract_json_payload(raw)) == {"explicit": []}
```

Comment thread src/utils/json_parser.py
Comment on lines +86 to +90
```python
    if isinstance(payload, dict):
        keys = set(payload)
        schema_keys = {"explicit", "deductive", "inductive", "observations", "facts"}
        has_schema_signal = int(bool(keys & schema_keys))
        return (2 if has_schema_signal else 0, len(keys), len(candidate))
```

⚠️ Potential issue | 🟠 Major

Penalize schema-shaped candidates before ranking by size.

A schema such as {"explicit":{"type":"array","items":{"type":"object"}}} scores as a payload because it has explicit, and its longer length can beat the actual {"explicit":[]} payload. Add a schema-marker penalty before using length as a tiebreaker.

🐛 Proposed direction
```diff
     if isinstance(payload, dict):
         keys = set(payload)
-        schema_keys = {"explicit", "deductive", "inductive", "observations", "facts"}
-        has_schema_signal = int(bool(keys & schema_keys))
-        return (2 if has_schema_signal else 0, len(keys), len(candidate))
+        payload_keys = {"explicit", "deductive", "inductive", "observations", "facts"}
+        schema_marker_keys = {
+            "$schema",
+            "type",
+            "properties",
+            "required",
+            "items",
+            "additionalProperties",
+        }
+        has_payload_signal = int(bool(keys & payload_keys))
+        looks_like_schema = int(
+            bool(keys & schema_marker_keys)
+            or any(isinstance(value, dict) and "type" in value for value in payload.values())
+        )
+        return (2 if has_payload_signal else 0, -looks_like_schema, len(keys), len(candidate))
```
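This ranking direction can be exercised standalone. The `score` helper below is hypothetical and uses assumed key sets; it only shows the schema penalty deciding between two candidates that share the `explicit` key.

```python
import json

# Sketch of the proposed ranking: payload-key signal first, then a schema
# penalty, and only then size tiebreakers. Hypothetical, not the PR's code.
PAYLOAD_KEYS = {"explicit", "deductive", "inductive", "observations", "facts"}
SCHEMA_MARKERS = {"$schema", "type", "properties", "required", "items", "additionalProperties"}


def score(candidate: str) -> tuple[int, int, int, int]:
    payload = json.loads(candidate)
    if not isinstance(payload, dict):
        return (1, 0, 0, len(candidate))  # arrays: middle priority (assumption)
    keys = set(payload)
    has_payload_signal = int(bool(keys & PAYLOAD_KEYS))
    looks_like_schema = int(
        bool(keys & SCHEMA_MARKERS)
        or any(isinstance(v, dict) and "type" in v for v in payload.values())
    )
    return (2 * has_payload_signal, -looks_like_schema, len(keys), len(candidate))


schema = '{"explicit":{"type":"array","items":{"type":"object"}}}'
payload = '{"explicit":[]}'

# The longer schema-shaped candidate loses despite sharing the "explicit" key.
assert max([schema, payload], key=score) == payload
```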

Comment on lines +24 to +45
````python
def test_extract_json_payload_prefers_json_fence_over_earlier_non_json_fence():
    raw = '```text\nnot json\n```\n```json\n{"explicit":[{"content":"I live in Berlin"}]}\n```'
    extracted = extract_json_payload(raw)
    assert json.loads(extracted) == {"explicit": [{"content": "I live in Berlin"}]}


def test_extract_json_payload_prefers_payload_over_earlier_schema_object():
    raw = 'Schema: {"type":"object"}\nActual: {"explicit":[{"content":"I use Cubase"}]}'
    extracted = extract_json_payload(raw)
    assert json.loads(extracted) == {"explicit": [{"content": "I use Cubase"}]}


def test_extract_json_payload_preserves_literal_think_text_inside_valid_json():
    raw = '{"content":"literal <think> tag","explicit":[]}'
    extracted = extract_json_payload(raw)
    assert json.loads(extracted) == {"content": "literal <think> tag", "explicit": []}


def test_extract_json_payload_prefers_later_array_payload_over_earlier_schema_dict():
    raw = 'Schema: {"type":"object"}\nActual: [{"content":"I use Cubase"}]'
    extracted = extract_json_payload(raw)
    assert json.loads(extracted) == [{"content": "I use Cubase"}]
````

⚠️ Potential issue | 🟡 Minor

Add regressions for the remaining ambiguous-candidate cases.

Current tests cover the intended path, but they would not catch the fence-priority, literal <think>...</think> corruption, unclosed-prefix, or schema-shaped explicit candidate failures flagged in src/utils/json_parser.py.

🧪 Suggested tests
````diff
+def test_extract_json_payload_prefers_json_fence_over_valid_generic_fence():
+    raw = (
+        '```\n{"explicit":[{"content":"generic"}]}\n```\n'
+        '```json\n{"type":"object"}\n```'
+    )
+    assert json.loads(extract_json_payload(raw)) == {"type": "object"}
+
+
+def test_extract_json_payload_preserves_literal_think_text_inside_noisy_json():
+    raw = 'Here: {"content":"literal <think>tag</think>","explicit":[]}'
+    extracted = extract_json_payload(raw)
+    assert json.loads(extracted) == {
+        "content": "literal <think>tag</think>",
+        "explicit": [],
+    }
+
+
+def test_extract_json_payload_ignores_unclosed_prefix_object():
+    raw = 'debug {not closed\nActual: {"explicit":[]}'
+    assert json.loads(extract_json_payload(raw)) == {"explicit": []}
+
+
+def test_extract_json_payload_prefers_payload_over_schema_with_payload_keys():
+    raw = (
+        'Schema: {"explicit":{"type":"array","items":{"type":"object"}}}\n'
+        'Actual: {"explicit":[]}'
+    )
+    assert json.loads(extract_json_payload(raw)) == {"explicit": []}
````
