fix(cloudflare): correct browser-rendering getCrawl response schema #331

Open
k3dom wants to merge 1 commit into alchemy-run:main from k3dom:fix/browser-rendering-getcrawl-schema

@k3dom k3dom commented Jun 10, 2026

Disclaimer

This PR was generated entirely with Claude Code. However, I have been running these exact schema fixes as a pnpm patch against @distilled.cloud/cloudflare for crawls, and I can confirm they match the live API behavior.

Problem

The upstream cloudflare-typescript SDK types for browser-rendering's getCrawl operation don't match what the API actually returns, so decoding real responses fails:

  1. records[].metadata is typed as required, but Cloudflare omits it for records that have not completed yet (queued/skipped/cancelled/...). Polling an in-progress crawl job therefore fails to decode with a ParseError until every record has finished.
  2. cursor is typed as string, but the API returns a number (the next record index for pagination). It is absent on the last page.
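
To illustrate why the numeric cursor matters for callers, here is a minimal pagination sketch. It assumes the cursor is passed back as the next record index and that a missing cursor means the last page, as described above. `CrawlPage`, `makeFakeGetCrawl`, and `collectAllRecords` are hypothetical names, the page fetcher is an in-memory stand-in rather than the generated client, and it is synchronous only to keep the example short.

```typescript
// Hypothetical page shape: only the fields relevant to pagination.
interface CrawlRecordSummary {
  url: string;
  status: string;
}

interface CrawlPage {
  records: CrawlRecordSummary[];
  cursor?: number; // next record index; absent on the last page
}

type GetPage = (cursor?: number) => CrawlPage;

// Hypothetical stand-in for a getCrawl call: serves slices of an in-memory list.
function makeFakeGetCrawl(all: CrawlRecordSummary[], pageSize: number): GetPage {
  return (cursor?: number): CrawlPage => {
    const start = cursor ?? 0;
    const end = start + pageSize;
    const page: CrawlPage = { records: all.slice(start, end) };
    // Only emit a cursor when more records remain.
    if (end < all.length) page.cursor = end;
    return page;
  };
}

// Follows the numeric cursor until the API stops returning one.
function collectAllRecords(getPage: GetPage): CrawlRecordSummary[] {
  const out: CrawlRecordSummary[] = [];
  let cursor: number | undefined = undefined;
  do {
    const page: CrawlPage = getPage(cursor);
    out.push(...page.records);
    cursor = page.cursor;
  } while (cursor !== undefined);
  return out;
}

const fakeRecords: CrawlRecordSummary[] = [1, 2, 3, 4, 5].map((n) => ({
  url: `https://example.com/page-${n}`,
  status: "completed",
}));
const collected = collectAllRecords(makeFakeGetCrawl(fakeRecords, 2));
```

With a string-typed cursor, a loop like this would either fail to decode the page or need a manual conversion before the next request.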

Example of a real response that the current schema rejects:

{
  "result": {
    "id": "...",
    "browserSecondsUsed": 12,
    "finished": 3,
    "records": [
      { "status": "queued", "url": "https://example.com/page" }
    ],
    "skipped": 0,
    "status": "running",
    "total": 10,
    "cursor": 50
  }
}
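
As a rough picture of the shape the corrected schema should accept, the sketch below hand-writes the result type and a structural check in plain TypeScript. It is not the generated Effect schema: the field names come from the example response above, while the `metadata` value type (`Record<string, unknown>`) and the helper names `isObject`, `isCrawlRecord`, and `isCrawlResult` are assumptions made for illustration.

```typescript
interface CrawlRecord {
  status: string;
  url: string;
  // Assumed shape; omitted for records that have not completed yet.
  metadata?: Record<string, unknown>;
}

interface CrawlResult {
  id: string;
  browserSecondsUsed: number;
  finished: number;
  records: CrawlRecord[];
  skipped: number;
  status: string;
  total: number;
  // Next record index; absent on the last page, and treated as nullable here.
  cursor?: number | null;
}

function isObject(v: unknown): v is Record<string, unknown> {
  return typeof v === "object" && v !== null && !Array.isArray(v);
}

function isCrawlRecord(v: unknown): v is CrawlRecord {
  return (
    isObject(v) &&
    typeof v.status === "string" &&
    typeof v.url === "string" &&
    (v.metadata === undefined || isObject(v.metadata))
  );
}

function isCrawlResult(v: unknown): v is CrawlResult {
  if (!isObject(v)) return false;
  const numericKeys = ["browserSecondsUsed", "finished", "skipped", "total"];
  return (
    typeof v.id === "string" &&
    typeof v.status === "string" &&
    numericKeys.every((k) => typeof v[k] === "number") &&
    Array.isArray(v.records) &&
    v.records.every(isCrawlRecord) &&
    (v.cursor === undefined || v.cursor === null || typeof v.cursor === "number")
  );
}

// The example response from above, unwrapped from its "result" envelope.
const sampleResult: unknown = {
  id: "...",
  browserSecondsUsed: 12,
  finished: 3,
  records: [{ status: "queued", url: "https://example.com/page" }],
  skipped: 0,
  status: "running",
  total: 10,
  cursor: 50,
};

const acceptsSample = isCrawlResult(sampleResult);
// A string cursor, as the unpatched types expect, is rejected by this check.
const acceptsStringCursor = isCrawlResult({ ...(sampleResult as object), cursor: "50" });
```

The check accepts the queued record without `metadata` and the numeric `cursor`, which are the two points where the unpatched types disagree with the live response.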

Fix

Adds patches/browser-rendering/getCrawl.json using the existing patch mechanism:

{
  "response": {
    "properties": {
      "records[].metadata": { "optional": true },
      "cursor": { "type": "number", "nullable": true }
    }
  }
}

and regenerates the service (bun run generate). Only the getCrawl schema changes; the emitted specs/cloudflare/browser-rendering.openapi.yml is updated to match. Unrelated drift the generator produced in other emitted specs (containers/r2/secrets-store/workers info titles) was intentionally left out of this PR.

bun run check (tsgo + oxlint + oxfmt) passes.

Non-issues

Two other discrepancies between the live API and the SDK types were verified to not need patching:

  • Extra metadata keys (og:*, lastModified, ...) — the API returns more metadata fields than the SDK declares, but effect Schema's default decode ignores unknown keys, so they pass through without error.
  • Undocumented job statuses (running, cancelled_by_user) — the top-level job status is a plain Schema.String (not a literal union), so these decode fine as-is.

The upstream Cloudflare TypeScript SDK types for the getCrawl operation
do not match what the API actually returns:

- `records[].metadata` is typed as required, but Cloudflare omits it
  for records that have not completed (queued/skipped/cancelled/...),
  causing decode failures while polling an in-progress crawl.
- `cursor` is typed as a string, but the API returns a number (the
  next record index). It is absent on the last page.

Adds a patch file for the operation and regenerates the service.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
@k3dom k3dom marked this pull request as ready for review June 10, 2026 13:29