fix(deriver): normalize PromptRepresentation.explicit string shapes #538
AmilGael wants to merge 1 commit into plastic-labs:main from
Conversation
Some OpenAI-compatible structured-output backends occasionally emit
`explicit` as a bare string or a list of strings instead of the declared
list of `{"content": str}` objects. Today this triggers a Pydantic
validation failure in the deriver path, and the entire batch falls
through to the empty-fallback representation, dropping otherwise usable
observations.
Extend the existing before-validator on `PromptRepresentation.explicit`
to normalize:
- `"User works remotely"` -> `[{"content": "User works remotely"}]`
- `["A", "B"]` -> `[{"content": "A"}, {"content": "B"}]`
- `["A", {"content": "B"}]` -> mixed strings and dicts coexist
- blank / whitespace-only strings are dropped
- `None` -> `[]` (preserved)
Anything else is passed through unchanged so Pydantic still raises its
normal validation error for genuinely malformed input.
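As a rough illustration of the normalization described above, here is a minimal, self-contained sketch. The model and validator names approximate the PR; the real classes live in `src/utils/representation.py`, so treat this as a stand-in, not the actual implementation:

```python
from typing import Any

from pydantic import BaseModel, field_validator


# Stand-in observation model; the PR's real classes live in
# src/utils/representation.py.
class ExplicitObservationBase(BaseModel):
    content: str


class PromptRepresentation(BaseModel):
    explicit: list[ExplicitObservationBase] = []

    @field_validator("explicit", mode="before")
    @classmethod
    def normalize_explicit(cls, v: Any) -> Any:
        if v is None:
            return []  # preserve the existing None -> [] behavior
        if isinstance(v, str):
            # bare string -> single-item list; blank strings dropped
            stripped = v.strip()
            return [{"content": stripped}] if stripped else []
        if isinstance(v, list):
            normalized: list[Any] = []
            for item in v:
                if isinstance(item, str):
                    stripped = item.strip()
                    if stripped:
                        normalized.append({"content": stripped})
                    continue
                # dicts etc. pass through to normal Pydantic validation
                normalized.append(item)
            return normalized
        # anything else: let Pydantic raise its usual error
        return v
```

With this sketch, `PromptRepresentation.model_validate({"explicit": ["A", {"content": "B"}]})` yields two observations, while `{"explicit": 123}` still raises a `ValidationError`.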
Adds seven unit tests covering each variant plus a passthrough case for
already well-formed dict input.
Closes plastic-labs#524
By Gamaliel — coworking with Claude Code
Actionable comments posted: 1
🧹 Nitpick comments (2)
tests/deriver/test_representation_crud.py (1)
111-169: Add one negative test to lock malformed-input behavior.

These tests cover accepted shapes well, but there isn’t a regression test confirming unsupported shapes still raise validation errors (e.g., `{"explicit": 123}`).

✅ Suggested test addition

```diff
 import datetime
+import pytest
+from pydantic import ValidationError
+
 from src.utils.representation import (
     DeductiveObservation,
     ExplicitObservation,
     ExplicitObservationBase,
@@
 def test_prompt_representation_proper_dicts_explicit_pass_through():
     """Existing behavior: a well-formed dict list still validates unchanged."""
     pr = PromptRepresentation.model_validate(
         {"explicit": [{"content": "A"}, {"content": "B"}]}
     )
     assert [e.content for e in pr.explicit] == ["A", "B"]
+
+
+def test_prompt_representation_invalid_explicit_shape_raises():
+    """Unsupported top-level shapes should still fail normal validation."""
+    with pytest.raises(ValidationError):
+        PromptRepresentation.model_validate({"explicit": 123})
```

src/utils/representation.py (1)
124-142: Consider converting this validator docstring to Google style sections.

The content is clear, but it does not follow the repository’s preferred Google docstring structure. As per coding guidelines, "Use Google style docstrings".
📒 Files selected for processing (2)
- src/utils/representation.py
- tests/deriver/test_representation_crud.py
```python
if item is None:
    continue
```
Don’t silently drop None entries inside explicit lists.
At Line 151/Line 152, None items are skipped, which broadens tolerance and can hide malformed payloads that should fail normal schema validation.
🔧 Proposed fix

```diff
 if isinstance(v, list):
     normalized: list[Any] = []
     for item in cast(list[Any], v):
-        if item is None:
-            continue
         if isinstance(item, str):
             stripped = item.strip()
             if stripped:
                 normalized.append({"content": stripped})
             continue
         # dicts, ExplicitObservationBase instances, etc. pass through
         # to normal Pydantic validation.
         normalized.append(item)
     return normalized
```
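The effect of the reviewer's suggestion can be shown with a small stand-in (not the project's actual model): once the skip is removed, a `None` list item reaches ordinary Pydantic item validation and is rejected.

```python
from pydantic import BaseModel, TypeAdapter, ValidationError


# Stand-in for the PR's observation model.
class Obs(BaseModel):
    content: str


# With the `if item is None: continue` branch removed, a None entry is no
# longer filtered out, and ordinary Pydantic validation rejects it.
adapter = TypeAdapter(list[Obs])
try:
    adapter.validate_python([{"content": "A"}, None])
except ValidationError as exc:
    print(f"rejected with {exc.error_count()} error(s)")
```

This achieves the fail-fast behavior without hand-rolling an error message: malformed payloads surface through Pydantic's normal `ValidationError`.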
@ajspig, I forgot to tag you, apologies - Gamaliel from Pursuit
Summary
Closes #524.
Some OpenAI-compatible structured-output backends occasionally emit
`PromptRepresentation.explicit` as a bare string or a list of strings instead of the declared `list[{"content": str}]`. Today that causes Pydantic validation to fail and the entire deriver batch falls through to the empty-fallback representation, silently dropping otherwise usable observations.

This PR extends the existing before-validator on
`PromptRepresentation.explicit` (in `src/utils/representation.py`) to coerce the loose shapes called out in the issue:

- `"User works remotely"` -> `[{"content": "User works remotely"}]`
- `["A", "B"]` -> `[{"content": "A"}, {"content": "B"}]`
- `["A", {"content": "B"}]` -> mixed strings and dicts coexist
- `" "` / `""` in a list -> dropped
- `None` -> `[]` (preserved)

Anything that is neither
`None`, `str`, nor `list` still passes through to normal Pydantic validation, so genuinely malformed input continues to raise.

Why this, why now
Issue #524 is tightly scoped by its author ("not about vector dimensions, embedding provider config, queue flushing, token accounting, or broad JSON repair policy changes"), and the failure mode is concrete and user-visible: self-hosted Honcho users on OpenAI-compatible providers lose a whole batch of observations whenever the model drifts into a string-valued emission. The fix is a narrow Pydantic normalization at the validator boundary — it doesn't widen the declared schema and doesn't touch any other code path.
What changed
- `src/utils/representation.py` — renamed `convert_none_to_empty_list` to `normalize_explicit` and extended it with the string / list-of-string / mixed-list / blank-string rules described above. `None -> []` behavior is preserved. Adds `cast` to the `typing` import to satisfy `basedpyright`.
- `tests/deriver/test_representation_crud.py` — adds seven unit tests covering each new variant plus a pass-through test for already well-formed dict input, and a test confirming the existing `None -> []` behavior still holds.

Out of scope (intentionally)
Per the issue, this PR does not touch: vector dimensions, embedding provider configuration, queue flushing / backlog claiming, token usage accounting, or broad JSON repair policy changes. All of those deserve their own discussions.
Test plan
- `uv run pytest tests/deriver/test_representation_crud.py` — 10 passed (3 pre-existing + 7 new)
- `uv run pytest tests/deriver/test_representation_crud.py tests/integration/test_representation.py` — 19 passed
- `uv run ruff check src/utils/representation.py tests/deriver/test_representation_crud.py` — clean
- `uv run ruff format --check src/utils/representation.py tests/deriver/test_representation_crud.py` — already formatted
- `uv run basedpyright src/utils/representation.py tests/deriver/test_representation_crud.py` — 0 errors, 0 warnings

By Gamaliel — coworking with Claude Code