Skip to content

privatenumber/mac-ocr

Repository files navigation

mac-ocr logo

mac-ocr

A macOS command-line tool that reads text from images and PDFs, and creates searchable PDFs.
Runs entirely on your Mac with Apple's Vision framework; nothing is uploaded.

Tip

Useful for AI agents too: instead of spending vision tokens reading documents, an agent can run mac-ocr locally for free. A skill is bundled so agents know how to use it.

Features

  • Read text from an image: mac-ocr photo.png
  • Read text from many images: mac-ocr *.png
  • Stream text from a PDF, page by page: mac-ocr scan.pdf --format jsonl
  • Turn an image into a searchable PDF: mac-ocr searchable-pdf photo.pngphoto.ocr.pdf
  • Add a selectable text layer to a scanned PDF: mac-ocr searchable-pdf scan.pdfscan.ocr.pdf

Install

npm install -g mac-ocr

Or run it without installing:

npx mac-ocr receipt.jpg

Requirements: macOS 10.15+. The npm package ships a prebuilt universal binary, so no Xcode or Swift toolchain is needed.

Recognize text

OCR is the default action — you don't need a subcommand:

mac-ocr receipt.jpg                 # text → stdout
mac-ocr page1.png page2.png         # multiple images
mac-ocr scan.pdf                    # multi-page PDF
cat screenshot.png | mac-ocr        # stdin
mac-ocr https://example.com/a.png   # URL (simple GET)

Default output is plain text. Use JSON when you need bounding boxes, confidence, or page metadata:

mac-ocr receipt.jpg --format json
mac-ocr document.pdf --format jsonl   # one JSON object per page, streamed

PDF pages stream as they're recognized, so with a large document you see the first page's text right away.

Save text to files

mac-ocr ~/Screenshots/*.png -o '[dir]/[name].txt'   # a .txt next to each image
mac-ocr scan.pdf -o notes.md                        # recognized text to a chosen .txt/.md file
mac-ocr receipts/*.pdf -o out/                      # one file per input in out/
grep -rli "invoice" ~/Screenshots                    # then search with normal tools

-o takes a file, a directory (out/), or a filename template (all placeholders). Quote templates, since […] is a glob pattern in zsh. Whatever the extension, the content is the plain recognized text.

Create a searchable PDF

searchable-pdf takes a PDF or an image and writes a PDF that looks identical to the source but whose text is selectable and searchable. By default it writes [name].ocr.pdf next to each input — one searchable PDF per input (inputs are never merged):

mac-ocr searchable-pdf scan.pdf            # writes scan.ocr.pdf
mac-ocr searchable-pdf photo.jpg            # image → one-page photo.ocr.pdf
mac-ocr searchable-pdf *.pdf                # writes <name>.ocr.pdf for each

Use -o to control the destination — a directory, a [name] template, a fixed file, or - for stdout:

mac-ocr searchable-pdf scan.pdf -o out/              # out/scan.ocr.pdf
mac-ocr searchable-pdf scan.pdf -o '[name]-ocr.pdf'  # scan-ocr.pdf
mac-ocr searchable-pdf scan.pdf -o searchable.pdf    # fixed path
mac-ocr searchable-pdf scan.pdf -o - > scan.pdf      # stdout

A fixed path or - (stdout) takes a single input; for multiple inputs use a directory or a [name] template.

Pages that already have selectable text are skipped — only scanned pages get OCR. A PDF that needs no OCR at all passes through unchanged. To OCR every page regardless, pass --ocr-all-pages. The finer points (what survives a rewrite, how "already has text" is decided) are in docs/CLI.md.

In an interactive terminal you get a live [page/total] progress counter. Piped or redirected runs are silent on success, so scripts stay clean.

Options

Both OCR and searchable-pdf accept the recognition options:

Flag Effect
--fast Faster, lower-accuracy recognition (details)
--password <password> Password for an encrypted PDF (or set MAC_OCR_PDF_PASSWORD)
-l, --language <code> Recognition language (BCP-47, repeatable). e.g. -l en-US -l ja-JP
-c, --confidence <0–1> Drop observations below this confidence
-w, --custom-words <word> Add custom vocabulary (repeatable)
--custom-words-file <path> Custom vocabulary file, one word per line
--no-language-correction Disable language correction
--min-text-height <0–1> Ignore text shorter than this fraction of image height
--pdf-dpi <auto|72–600> PDF rasterization DPI (default auto)
--roi <x,y,w,h> Region of interest: restrict recognition to a normalized region (top-left origin)

mac-ocr <file>

Flag Effect
-f, --format <text|json|jsonl> Output format (default text)
-o, --output <path> Output path, directory, or template ([name], [ext], [dir], [page]). Default: stdout. Any extension — e.g. .txt or .md.
--max-candidates <1–10> Alternative text candidates per observation

mac-ocr searchable-pdf <file>

Flag Effect
-o, --output <dest> Output path, [name] template, directory, or - for stdout. Default: [name].ocr.pdf next to each input.
--ocr-all-pages OCR every page, including pages that already have selectable text (skipped by default)

List the recognition languages available on your macOS version with mac-ocr languages (add --fast for the fast recognizer's set).

See docs/CLI.md for the full reference — every command and flag, plus the JSON output schema.

Node.js API

The same package exposes a typed, promise-based API that wraps the binary. Inputs are image or PDF bytes — read files or fetch URLs in your own code and pass the bytes:

npm install mac-ocr
import fs from 'node:fs/promises'
import { ocr, createSearchablePdf, supportedLanguages } from 'mac-ocr'

// Recognize text in an image or single-page PDF
const result = await ocr(await fs.readFile('receipt.jpg'))
console.log(result.text)
for (const { text, confidence, boundingBox } of result.observations) { /* … */ }

// Multi-page PDF: stream pages as they finish…
for await (const page of ocr.pages(await fs.readFile('book.pdf'))) {
    console.log(page.page, '/', page.pageCount, page.text)
}
// …or collect the whole thing into an array
const pages = await Array.fromAsync(ocr.pages(await fs.readFile('book.pdf')))

// Build a searchable PDF (returns the PDF bytes)
const pdf = await createSearchablePdf(await fs.readFile('scan.pdf'), { fast: true })
await fs.writeFile('scan.ocr.pdf', pdf)

// Recognition languages supported on this macOS version (for ocr and createSearchablePdf)
const languages = await supportedLanguages()

Options mirror the CLI flags (like { fast: true } above), plus an AbortSignal for cancellation. Failures throw a MacOcrError with a kind you can branch on. See docs/NODE.md for every option, the result types, and error handling.

How it works

mac-ocr is a native Swift binary built on Apple's Vision framework (VNRecognizeTextRequest). Recognition happens entirely on-device — nothing is uploaded. The searchable-PDF layer is invisible text drawn with Core Graphics + Core Text, placed word by word where Vision found each word.

Agent Skills

The package bundles an agent skill covering the CLI and Node API — set up skills-npm in your project and coding agents discover it automatically.