A macOS command-line tool that reads text from images and PDFs, and creates searchable PDFs.
Runs entirely on your Mac with Apple's Vision framework; nothing is uploaded.
> [!TIP]
> Useful for AI agents too: instead of spending vision tokens reading documents, an agent can run mac-ocr locally for free. A skill is bundled so agents know how to use it.
- Read text from an image: `mac-ocr photo.png`
- Read text from many images: `mac-ocr *.png`
- Stream text from a PDF, page by page: `mac-ocr scan.pdf --format jsonl`
- Turn an image into a searchable PDF: `mac-ocr searchable-pdf photo.png` → `photo.ocr.pdf`
- Add a selectable text layer to a scanned PDF: `mac-ocr searchable-pdf scan.pdf` → `scan.ocr.pdf`
```sh
npm install -g mac-ocr
```

Or run it without installing:

```sh
npx mac-ocr receipt.jpg
```

Requirements: macOS 10.15+. The npm package ships a prebuilt universal binary, so no Xcode or Swift toolchain is needed.
OCR is the default action — you don't need a subcommand:
```sh
mac-ocr receipt.jpg                  # text → stdout
mac-ocr page1.png page2.png          # multiple images
mac-ocr scan.pdf                     # multi-page PDF
cat screenshot.png | mac-ocr         # stdin
mac-ocr https://example.com/a.png    # URL (simple GET)
```

Default output is plain text. Use JSON when you need bounding boxes, confidence, or page metadata:

```sh
mac-ocr receipt.jpg --format json
mac-ocr document.pdf --format jsonl  # one JSON object per page, streamed
```

PDF pages stream as they're recognized, so with a large document you see the first page's text right away.
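Consuming the `jsonl` stream from another program comes down to buffering text, splitting on newlines, and parsing each complete line. The following is a minimal sketch, assuming each line is one self-contained JSON object. `createJsonlParser` is a hypothetical helper, not part of mac-ocr, and the authoritative output schema is the one in docs/CLI.md.

```javascript
// Hypothetical helper (not part of mac-ocr): incrementally parse JSONL text.
function createJsonlParser(onObject) {
  let buffer = ''
  return {
    // Feed a chunk of text; every complete line seen so far is parsed and emitted.
    push(chunk) {
      buffer += chunk
      const lines = buffer.split('\n')
      buffer = lines.pop() // keep the trailing partial line for the next chunk
      for (const line of lines) {
        if (line.trim() !== '') onObject(JSON.parse(line))
      }
    },
    // Call once the stream ends to flush a final line that has no trailing newline.
    end() {
      if (buffer.trim() !== '') onObject(JSON.parse(buffer))
      buffer = ''
    },
  }
}
```

Fed with chunks from a child process's stdout (for example one started with `child_process.spawn('mac-ocr', ['scan.pdf', '--format', 'jsonl'])`), a parser like this would hand over each page object as soon as its line arrives.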
```sh
mac-ocr ~/Screenshots/*.png -o '[dir]/[name].txt'  # a .txt next to each image
mac-ocr scan.pdf -o notes.md                       # recognized text to a chosen .txt/.md file
mac-ocr receipts/*.pdf -o out/                     # one file per input in out/
grep -rli "invoice" ~/Screenshots                  # then search with normal tools
```

`-o` takes a file, a directory (`out/`), or a filename template built from the placeholders `[name]`, `[ext]`, `[dir]`, and `[page]`. Quote templates, since `[…]` is a glob pattern in zsh. Whatever the extension, the content is the plain recognized text.
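To make the template idea concrete, here is a rough sketch of how placeholders could be expanded for one input path. It is an illustrative re-implementation, not the tool's code: `expandTemplate` is a hypothetical name, and the choices that `[ext]` excludes the dot and that `[page]` is empty when no page is given are assumptions.

```javascript
// Hypothetical re-implementation of the -o template idea, for illustration only.
function expandTemplate(template, inputPath, page) {
  const slash = inputPath.lastIndexOf('/')
  const dir = slash === -1 ? '.' : inputPath.slice(0, slash) || '/'
  const base = inputPath.slice(slash + 1)
  const dot = base.lastIndexOf('.')
  // A leading dot (".hidden") is treated as part of the name, not an extension.
  const name = dot > 0 ? base.slice(0, dot) : base
  const ext = dot > 0 ? base.slice(dot + 1) : '' // assumption: no leading dot
  const values = { dir, name, ext, page: page === undefined ? '' : String(page) }
  // Replace only the known placeholders; any other bracketed text is left alone.
  return template.replace(/\[(dir|name|ext|page)\]/g, (_, key) => values[key])
}
```

Under these assumptions, the template `'[dir]/[name].txt'` applied to `shots/a.png` yields `shots/a.txt`, which matches the "a .txt next to each image" behaviour described above.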
`searchable-pdf` takes a PDF or an image and writes a PDF that looks identical to the source but whose text is selectable and searchable. By default it writes `[name].ocr.pdf` next to each input: one searchable PDF per input (inputs are never merged).

```sh
mac-ocr searchable-pdf scan.pdf     # writes scan.ocr.pdf
mac-ocr searchable-pdf photo.jpg    # image → one-page photo.ocr.pdf
mac-ocr searchable-pdf *.pdf        # writes <name>.ocr.pdf for each
```

Use `-o` to control the destination: a directory, a `[name]` template, a fixed file, or `-` for stdout:

```sh
mac-ocr searchable-pdf scan.pdf -o out/              # out/scan.ocr.pdf
mac-ocr searchable-pdf scan.pdf -o '[name]-ocr.pdf'  # scan-ocr.pdf
mac-ocr searchable-pdf scan.pdf -o searchable.pdf    # fixed path
mac-ocr searchable-pdf scan.pdf -o - > out.pdf       # stdout (redirect to a different file than the input)
```

A fixed path or `-` (stdout) takes a single input; for multiple inputs use a directory or a `[name]` template.
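The single-input rule can be pictured as a small validation step that runs before any work starts. The sketch below is hypothetical: `checkDestination` is not a mac-ocr function, and detecting a directory by its trailing slash is an assumption (the real tool may also inspect the file system).

```javascript
// Hypothetical sketch of the destination rules described above.
function checkDestination(output, inputCount) {
  let kind
  if (output === undefined) kind = 'default' // [name].ocr.pdf next to each input
  else if (output === '-') kind = 'stdout'
  else if (output.endsWith('/')) kind = 'directory' // assumption: trailing slash marks a directory
  else if (output.includes('[name]')) kind = 'template'
  else kind = 'file'

  // A fixed file or stdout can only hold one PDF, so reject multiple inputs.
  const singleOnly = kind === 'stdout' || kind === 'file'
  if (singleOnly && inputCount > 1) {
    throw new Error(`${kind} output takes a single input; use a directory or a [name] template`)
  }
  return kind
}
```

Failing early like this means a batch run is rejected up front rather than having several outputs compete for one destination.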
Pages that already have selectable text are skipped — only scanned pages get OCR. A PDF that needs no OCR at all passes through unchanged. To OCR every page regardless, pass --ocr-all-pages. The finer points (what survives a rewrite, how "already has text" is decided) are in docs/CLI.md.
In an interactive terminal you get a live [page/total] progress counter. Piped or redirected runs are silent on success, so scripts stay clean.
Both OCR and searchable-pdf accept the recognition options:
| Flag | Effect |
|---|---|
| `--fast` | Faster, lower-accuracy recognition |
| `--password <password>` | Password for an encrypted PDF (or set `MAC_OCR_PDF_PASSWORD`) |
| `-l, --language <code>` | Recognition language (BCP-47, repeatable), e.g. `-l en-US -l ja-JP` |
| `-c, --confidence <0–1>` | Drop observations below this confidence |
| `-w, --custom-words <word>` | Add custom vocabulary (repeatable) |
| `--custom-words-file <path>` | Custom vocabulary file, one word per line |
| `--no-language-correction` | Disable language correction |
| `--min-text-height <0–1>` | Ignore text shorter than this fraction of image height |
| `--pdf-dpi <auto\|72–600>` | PDF rasterization DPI (default `auto`) |
| `--roi <x,y,w,h>` | Region of interest: restrict recognition to a normalized region (top-left origin) |
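Because `--roi` expects normalized values with a top-left origin, a rectangle measured in pixels has to be divided by the image dimensions first. A small sketch of that arithmetic follows. `toRoiFlag` is a hypothetical helper, and rounding to four decimals and clamping to the 0–1 range are choices made here, not documented behaviour.

```javascript
// Hypothetical helper: convert a pixel rectangle (top-left origin) into an --roi value.
function toRoiFlag(rect, imageWidth, imageHeight) {
  const clamp01 = (v) => Math.min(1, Math.max(0, v))
  // Round to 4 decimals to keep the flag short; Number() drops trailing zeros.
  const round = (v) => Number(clamp01(v).toFixed(4))
  return [
    rect.x / imageWidth,
    rect.y / imageHeight,
    rect.w / imageWidth,
    rect.h / imageHeight,
  ].map(round).join(',')
}
```

For a 1000×500 image, the pixel rectangle `{ x: 100, y: 50, w: 400, h: 200 }` becomes `0.1,0.1,0.4,0.4`, which could then be passed as `--roi 0.1,0.1,0.4,0.4`.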
Options specific to OCR:

| Flag | Effect |
|---|---|
| `-f, --format <text\|json\|jsonl>` | Output format (default `text`) |
| `-o, --output <path>` | Output path, directory, or template (`[name]`, `[ext]`, `[dir]`, `[page]`). Default: stdout. Any extension works, e.g. `.txt` or `.md`. |
| `--max-candidates <1–10>` | Alternative text candidates per observation |
Options specific to `searchable-pdf`:

| Flag | Effect |
|---|---|
| `-o, --output <dest>` | Output path, `[name]` template, directory, or `-` for stdout. Default: `[name].ocr.pdf` next to each input. |
| `--ocr-all-pages` | OCR every page, including pages that already have selectable text (skipped by default) |
List the recognition languages available on your macOS version with mac-ocr languages (add --fast for the fast recognizer's set).
See docs/CLI.md for the full reference — every command and flag, plus the JSON output schema.
The same package exposes a typed, promise-based API that wraps the binary. Inputs are image or PDF bytes — read files or fetch URLs in your own code and pass the bytes:
```sh
npm install mac-ocr
```

```js
import fs from 'node:fs/promises'
import { ocr, createSearchablePdf, supportedLanguages } from 'mac-ocr'

// Recognize text in an image or single-page PDF
const result = await ocr(await fs.readFile('receipt.jpg'))
console.log(result.text)
for (const { text, confidence, boundingBox } of result.observations) { /* … */ }

// Multi-page PDF: stream pages as they finish…
for await (const page of ocr.pages(await fs.readFile('book.pdf'))) {
  console.log(page.page, '/', page.pageCount, page.text)
}

// …or collect the whole thing into an array
const pages = await Array.fromAsync(ocr.pages(await fs.readFile('book.pdf')))

// Build a searchable PDF (returns the PDF bytes)
const pdf = await createSearchablePdf(await fs.readFile('scan.pdf'), { fast: true })
await fs.writeFile('scan.ocr.pdf', pdf)

// Recognition languages supported on this macOS version (for ocr and createSearchablePdf)
const languages = await supportedLanguages()
```

Options mirror the CLI flags (like `{ fast: true }` above), plus an `AbortSignal` for cancellation. Failures throw a `MacOcrError` with a `kind` you can branch on. See docs/NODE.md for every option, the result types, and error handling.
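Since each observation carries its own `confidence`, results can also be filtered after the fact, for instance to compare several thresholds without re-running recognition. A minimal sketch, using only the `text` and `confidence` fields shown above. `confidentText` is a hypothetical helper, and whether the CLI's `-c` flag treats the threshold as inclusive is not stated, so the `>=` here is an assumption.

```javascript
// Hypothetical helper: keep observations at or above a confidence threshold
// and join their text, one observation per line.
function confidentText(observations, minConfidence) {
  return observations
    .filter((o) => o.confidence >= minConfidence)
    .map((o) => o.text)
    .join('\n')
}
```

With a result from `ocr(...)`, a call such as `confidentText(result.observations, 0.5)` would return only the lines the recognizer was reasonably sure about.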
mac-ocr is a native Swift binary built on Apple's Vision framework (VNRecognizeTextRequest). Recognition happens entirely on-device — nothing is uploaded. The searchable-PDF layer is invisible text drawn with Core Graphics + Core Text, placed word by word where Vision found each word.
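Placing text "where Vision found each word" involves mapping a normalized box onto page coordinates. The tool does this in Swift and its code is not shown here, so the following is only an illustration of the geometry, assuming a box normalized with a top-left origin (the convention `--roi` uses) and a PDF page whose origin is at the bottom-left with y pointing up. `boxToPdfRect` and the `{ x, y, w, h }` shape are hypothetical.

```javascript
// Illustrative geometry only: map a normalized, top-left-origin box to PDF points.
function boxToPdfRect(box, pageWidth, pageHeight) {
  return {
    x: box.x * pageWidth,
    // Flip vertically: the box's y measures down from the top edge, while
    // PDF y measures up from the bottom edge to the box's lower edge.
    y: (1 - box.y - box.h) * pageHeight,
    width: box.w * pageWidth,
    height: box.h * pageHeight,
  }
}
```

On an 800×1000 point page, the box `{ x: 0.25, y: 0.25, w: 0.5, h: 0.125 }` maps to a rectangle at `(200, 625)` that is 400 points wide and 125 points tall.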
The package bundles an agent skill covering the CLI and Node API — set up skills-npm in your project and coding agents discover it automatically.
