Agent skill for Claude Code and Cursor that extracts structured fields from document images (invoices, receipts, business cards, IDs, forms) via the space ocr REST API. Each extracted value carries the bounding box of the spot on the page it was read from, and results are stored in a queryable workspace behind the API instead of being returned to the conversation as raw JSON.
| You ask | What the skill runs |
|---|---|
| "Extract the vendor and total from this invoice." | `ocr <image> --auto` |
| "Process this folder of 30 receipts into a sheet." | `create sheet` → `upload` (server-side OCR, async) |
| "Which vendor billed the most last month?" | `query <sheet> --where 'invoice_date>=2026-05-01' --sort total:desc` over the stored rows |
| "Show me where on the page that number came from." | `view <sheet>` returns each value's bounding box; deep-link `https://space-ocr.com/pages/myspace/<path>` |
| "Fix row 4, column 'total': it should be 12800." | `edit <sheet> --row 4 --column total --value 12800` |
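Each table row is an ordinary command line against the bundled client, so it can also be driven from a script. A minimal sketch that builds the argv for the `query` and `edit` rows, using only the subcommands and flags shown in the table; `CLIENT`, `query_argv`, `edit_argv` and the sheet name `invoices-may` are hypothetical placeholders, and the install path is an assumption you would adjust:

```python
import shlex
import sys
from pathlib import Path
from typing import List

# Assumption: point this at <install-root>/space-ocr/scripts/space_ocr.py on your machine.
CLIENT = Path("space-ocr/scripts/space_ocr.py")


def query_argv(sheet: str, where: str, sort: str) -> List[str]:
    """Build (but do not run) the argv for a `query` over stored rows."""
    return [sys.executable, str(CLIENT), "query", sheet, "--where", where, "--sort", sort]


def edit_argv(sheet: str, row: int, column: str, value: str) -> List[str]:
    """Build (but do not run) the argv for an `edit` of a single cell."""
    return [sys.executable, str(CLIENT), "edit", sheet,
            "--row", str(row), "--column", column, "--value", value]


if __name__ == "__main__":
    # "invoices-may" is a hypothetical sheet name used only for illustration.
    cmd = query_argv("invoices-may", "invoice_date>=2026-05-01", "total:desc")
    print(shlex.join(cmd))  # shell-quoted form, handy for logging
    # To actually execute it: subprocess.run(cmd, check=True)
```

Building a list instead of a shell string keeps the `>=` in the filter from being interpreted by a shell.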
Asking an LLM to OCR each image directly tends to break down at scale:
- Slow on batches. Every image becomes its own round-trip through the chat. Thirty receipts means thirty tool calls and a lot of waiting.
- Hallucinations. The model emits text that isn't on the page — totals it computed, dates it normalized, vendors it guessed from context.
- Context bloat and drift. Raw OCR JSON for every document piles up in the conversation, pushes earlier turns out of the window, and the agent gradually loses track of what it already processed. By the time someone asks "which vendor billed the most?", half the rows are no longer in context.
The skill works around each of those:
- OCR runs server-side and async via `/upload`: one call submits the whole batch, and the agent polls jobs instead of routing every image through the conversation.
- Every value carries a bounding box re-anchored to real Vision-API symbols on the page, not an LLM guess. Anything the model invents has no anchor and surfaces as unanchored; `--auto` further refuses non-document images instead of inventing fields.
- Extractions live in sheets behind the API, not in the conversation. The agent reads back only what the current question needs via `query`/`view` with server-side filters, so the chat doesn't accumulate raw OCR and the agent doesn't drift as the batch grows.
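Polling a batch instead of waiting on one image per turn looks roughly like the loop below. This is a minimal sketch under stated assumptions: `wait_for_jobs` is a hypothetical helper, `fetch_status` stands in for whatever job-status call the client actually exposes (not shown here), and the `"done"`/`"failed"` state names and timings are illustrative, not the API's documented values (see `references/api.md` for those).

```python
import time
from typing import Callable, Dict, Iterable


def wait_for_jobs(
    job_ids: Iterable[str],
    fetch_status: Callable[[str], str],
    interval: float = 2.0,
    timeout: float = 300.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> Dict[str, str]:
    """Poll every pending job until each reaches a terminal state or the timeout expires."""
    terminal = {"done", "failed"}  # hypothetical terminal states
    pending = set(job_ids)
    results: Dict[str, str] = {}
    deadline = clock() + timeout
    while pending:
        # One status check per still-pending job per round.
        for job_id in sorted(pending):
            status = fetch_status(job_id)
            if status in terminal:
                results[job_id] = status
        pending.difference_update(results)  # drop jobs that just finished
        if not pending:
            break
        if clock() >= deadline:
            raise TimeoutError(f"{len(pending)} job(s) still running: {sorted(pending)}")
        sleep(interval)
    return results
```

`sleep` and `clock` are injectable so the loop can be exercised without real waiting.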
Python client is stdlib-only — no pip install, no MCP server, no SDK.
Claude Code (loads on demand):

```
/plugin marketplace add oisidonut/claude-space-ocr-skill
/plugin install space-ocr@space-ocr
```
Cursor (installs to `~/.cursor/skills/space-ocr/`):

```bash
# macOS / Linux
curl -fsSL https://raw.githubusercontent.com/oisidonut/claude-space-ocr-skill/main/scripts/install.sh | bash -s -- cursor
```

```powershell
# Windows (interactive)
iwr -useb https://raw.githubusercontent.com/oisidonut/claude-space-ocr-skill/main/scripts/install.ps1 | iex

# Windows (unattended)
& ([scriptblock]::Create((iwr -useb https://raw.githubusercontent.com/oisidonut/claude-space-ocr-skill/main/scripts/install.ps1).Content)) -Target cursor
```

For a project-scoped install (committed alongside the codebase), copy `skills/space-ocr/` into `<your-project>/.cursor/skills/`.
- Create an API key at https://space-ocr.com → Developer → API Keys (top menu).
- Either drop it in a `.env` next to the script (see `.env.example`) or export `SPACE_OCR_API_KEY=spocr_…` in your shell.
- Smoke test: `python3 <install-root>/space-ocr/scripts/space_ocr.py balance` (Windows: use `py` instead of `python3`).
Python 3.8+ on the host — the client runs locally, not inside the plugin sandbox, and uses
only the standard library. Override the endpoint with SPACE_OCR_API_BASE if you self-host.
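Since the key can come from either the shell or a `.env` file, here is a minimal stdlib-only sketch of how such settings could be resolved. It assumes a plain `KEY=VALUE` `.env` format and lets an exported variable win over the file; `read_dotenv` and `resolve_settings` are hypothetical names, and the bundled client's actual parsing and precedence may differ. `api_base` stays `None` when `SPACE_OCR_API_BASE` is unset, meaning "use the client's default endpoint", which this README does not spell out.

```python
import os
from pathlib import Path
from typing import Dict, Mapping, Optional


def read_dotenv(path: Path) -> Dict[str, str]:
    """Parse simple KEY=VALUE lines; blank lines and # comments are skipped."""
    values: Dict[str, str] = {}
    if not path.is_file():
        return values
    for raw in path.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def resolve_settings(script_dir: Path,
                     environ: Mapping[str, str] = os.environ) -> Dict[str, Optional[str]]:
    """Merge the environment with a .env next to the script (environment wins in this sketch)."""
    file_values = read_dotenv(script_dir / ".env")

    def pick(name: str) -> Optional[str]:
        # An empty exported value falls back to the file.
        return environ.get(name) or file_values.get(name)

    return {
        "api_key": pick("SPACE_OCR_API_KEY"),
        "api_base": pick("SPACE_OCR_API_BASE"),
    }
```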
SKILL.md encodes four rules the agent follows on demand:
- Store, don't dump: default to `create sheet` → `upload` so extractions stay citable.
- Check before you scan: run `balance` first; reuse existing rows instead of re-OCRing.
- Answer from stored rows: `query` with server-side filters; reads don't cost a scan.
- Cite the location; flag what's uncertain: every value carries its bounding box in `field_bboxes`.
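The last rule suggests a simple check before answering: a non-empty value with no box in `field_bboxes` has nothing to cite and should be flagged rather than stated as fact. A minimal sketch, assuming a stored row exposes its values as a name-to-value mapping and `field_bboxes` as a name-to-box mapping; that shape, the helper name `unanchored_fields`, and the sample row are hypothetical (the real response format is in `references/api.md`).

```python
from typing import Any, List, Mapping


def unanchored_fields(fields: Mapping[str, Any], field_bboxes: Mapping[str, Any]) -> List[str]:
    """Return names of non-empty values that have no bounding box to cite."""
    flagged: List[str] = []
    for name, value in fields.items():
        if value in (None, ""):
            continue  # nothing was extracted, so there is nothing to cite
        if not field_bboxes.get(name):
            flagged.append(name)  # missing or empty box: treat as unverified
    return flagged


# Hypothetical row, for illustration only.
row_fields = {"vendor": "Acme", "total": 12800, "due_date": "2026-06-01"}
row_bboxes = {"vendor": [10, 20, 120, 40], "total": [300, 500, 380, 520]}
print(unanchored_fields(row_fields, row_bboxes))  # ['due_date']
```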
```
.claude-plugin/            # marketplace catalog + plugin manifest
scripts/                   # install.sh / install.ps1
skills/space-ocr/
  SKILL.md                 # behaviour rules + command table (loaded into agent context)
  scripts/space_ocr.py     # API client (stdlib only)
  references/api.md        # full endpoint spec (loaded on demand)
  assets/                  # example field schemas (invoice / receipt / business_card)
.env.example
```