R0mb0/Rapid_OCR

📄 Rapid-OCR 🔍

A lightning-fast, privacy-first web app for offline text extraction. Paste (Ctrl+V) or drop any image to instantly generate plain text and a searchable PDF entirely within your browser using Tesseract.js. No server uploads required. Fast, secure, and ready to use! 🚀


🚀 Features

  • Instant Ctrl+V Support: No need to click through clunky upload menus. Just copy an image to your clipboard and paste it directly into the app to start the extraction.
  • Searchable PDF Generation: You get more than raw text. The app overlays an invisible text layer on the original image, generating a fully searchable and selectable PDF on the fly.
  • Multi-Language Support: Recognizes several European languages in the same pass (English, Italian, French, Spanish, German), including their accented and special characters, with no manual language selection.
  • 100% Client-Side & Private: Built entirely with HTML, CSS, and Vanilla JavaScript. Your images and documents are processed locally in your browser. Nothing is ever uploaded to a cloud server.
  • Offline Capable: Run the app entirely without an internet connection using local assets and language packs.
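
The Ctrl+V flow described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the helper name `firstImageFromClipboard` is hypothetical, and it relies only on the standard browser `paste` event and its `clipboardData.items` list.

```javascript
// Hypothetical helper: return the first image File found in a paste event's
// clipboard items, or null when the clipboard holds no image.
function firstImageFromClipboard(items) {
  for (const item of Array.from(items || [])) {
    // Each clipboard item exposes `kind` ("file" or "string") and a MIME `type`.
    if (item.kind === "file" && item.type.startsWith("image/")) {
      return item.getAsFile();
    }
  }
  return null;
}

// Browser-only wiring; skipped when no DOM is available (for example under Node).
if (typeof document !== "undefined") {
  document.addEventListener("paste", (event) => {
    const items = event.clipboardData ? event.clipboardData.items : null;
    const image = firstImageFromClipboard(items);
    if (image) {
      event.preventDefault();
      // At this point the image could be handed to the OCR step.
      console.log("Pasted image ready for OCR:", image.type, image.size);
    }
  });
}
```

Keeping the lookup in a pure function makes it easy to exercise with plain objects, without a real clipboard.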

🛠️ How it works

  1. Initialization: Upon loading or pasting an image, the app initializes a local Tesseract.js Web Worker, loading the necessary language data packs into the browser's memory.
  2. Processing: The image data is passed to Tesseract's WebAssembly (WASM) core, which scans the pixels, recognizes characters, and calculates the layout.
  3. Concurrent Output: Tesseract simultaneously returns both the extracted raw text and the ArrayBuffer data for the PDF document.
  4. Blob Conversion: The app converts the PDF data into a Blob, creating a downloadable file directly from browser memory, and displays the extracted text on-screen at the same time.
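
Step 4 can be illustrated with a short sketch. It assumes the PDF data arrives as an ArrayBuffer, a typed array, or a plain array of byte values. The helper names `pdfBytesToBlob` and `downloadBlob` are hypothetical and are not taken from the project's source, and the Tesseract.js call that produces the bytes is deliberately left out.

```javascript
// Hypothetical helper: wrap raw PDF bytes in a Blob the browser can download.
// Accepts an ArrayBuffer, a typed array, or a plain array of byte values.
function pdfBytesToBlob(bytes) {
  let view;
  if (bytes instanceof ArrayBuffer) {
    view = new Uint8Array(bytes);
  } else if (ArrayBuffer.isView(bytes)) {
    view = bytes;
  } else {
    view = Uint8Array.from(bytes);
  }
  return new Blob([view], { type: "application/pdf" });
}

// Browser-only: offer the Blob as a file download through a temporary link.
function downloadBlob(blob, fileName) {
  const url = URL.createObjectURL(blob);
  const link = document.createElement("a");
  link.href = url;
  link.download = fileName;
  link.click();
  URL.revokeObjectURL(url); // release the object URL after triggering the download
}
```

In a browser, calling `downloadBlob(pdfBytesToBlob(pdfData), "result.pdf")` would then save the file without any network request, where `pdfData` stands for whatever byte data the OCR step returned.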

πŸ† What makes it special?

  • Zero-Setup Friction: Designed to be an immediate utility tool. You don't need to log in, select languages from dropdowns, or wait in server queues. Paste and go.
  • Modern UI/UX: Clean, responsive design powered by Tailwind CSS that keeps you informed of the OCR progress every step of the way without visual clutter.

💡 Why use this project?

  • For Sensitive Documents: The perfect solution when you need to extract text from private documents, bank statements, or confidential notes, and you don't trust free online OCR services that might store and harvest your files.

⚡ Getting Started

Online

Simply open the Live Demo link in any modern browser, paste an image, and grab your text!

Testing Locally & Offline (Python Web Server)

If you have downloaded the repository to use it completely offline, you cannot simply double-click the index.html file. The app processes images in Web Workers, and most modern browsers refuse to start workers or load the language data from a page opened via the file:// protocol because of same-origin (CORS) security restrictions.
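
As an illustration of that restriction, a page could detect that it was opened from disk and warn the user before any worker is started. This is only a sketch under that assumption: the function name `isServedOverHttp` is hypothetical and the project may not include such a check.

```javascript
// Hypothetical guard: true when the page URL uses http: or https:,
// false for file: and any other scheme.
function isServedOverHttp(pageUrl) {
  const protocol = new URL(pageUrl).protocol;
  return protocol === "http:" || protocol === "https:";
}

// Browser-only: warn when the page was opened directly from disk.
if (typeof window !== "undefined" && !isServedOverHttp(window.location.href)) {
  console.warn("Open this app through a local web server (http://localhost), not file://");
}
```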

To run it locally, you need to spin up a quick local web server. The easiest way is using Python.

1. Install Python (if you don't have it)

  • Windows: Open PowerShell as Administrator and run:
    choco install python
    (Requires Chocolatey)
  • macOS: Open Terminal and run:
    brew install python
    (Requires Homebrew)
  • Linux (Debian/Ubuntu): Open Terminal and run:
    sudo apt update && sudo apt install python3

2. Launch the Local Server

  1. Open your terminal or command prompt.
  2. Navigate to the folder where you extracted this project (e.g., cd path/to/RapidOCR).
  3. Run the following command:
    • On Windows: python -m http.server 8000
    • On Mac/Linux: python3 -m http.server 8000
  4. Open your browser and type http://localhost:8000 in the address bar. The app is now running securely and fully offline!

⚠️ Troubleshooting: Language Loading

If OCR seems stuck at "Loading..." during your very first run of the live version, check that your internet connection is active so the browser can download and cache the Tesseract language data files. Once they are cached, subsequent runs start much faster. If you are running the offline local-server version, make sure your tessdata folder contains all the downloaded .traineddata.gz files.
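
To make the offline check concrete, the sketch below lists the file names one would expect in the tessdata folder. It assumes the app uses Tesseract's standard three-letter codes for the five languages listed under Features and the `<code>.traineddata.gz` naming pattern; neither the codes nor the helper names are confirmed by this README.

```javascript
// Assumed language codes for English, Italian, French, Spanish, and German.
const LANGUAGES = ["eng", "ita", "fra", "spa", "deu"];

// Hypothetical helper: file names expected in the tessdata folder.
function expectedTessdataFiles(languages) {
  return languages.map((code) => `${code}.traineddata.gz`);
}

// Hypothetical helper: expected files that are absent from a folder listing.
function missingTessdataFiles(languages, presentFiles) {
  const present = new Set(presentFiles);
  return expectedTessdataFiles(languages).filter((name) => !present.has(name));
}
```

Comparing the result of `missingTessdataFiles(LANGUAGES, listing)` against an actual folder listing shows at a glance which language pack still needs to be downloaded.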


Crafted with AI
