What this shows: AI document OCR and table extraction platform for turning PDFs, scans, and images into structured data.
My role / team role: Defined the document-intelligence workflow, extraction UX, field configuration model, technical documentation, and demo-facing case study.
Public proof: Screenshots show document upload and extraction field configuration, with README coverage of OCR, tables, and field matching.
Tech and implementation areas:
- Python/FastAPI
- OCR/vision pipeline
- React/Vite UI
- Structured JSON output
- Spreadsheet-ready exports
Relevant roles this project supports:
- Document AI Developer
- OCR Engineer
- Full-Stack AI Engineer
- Data Extraction Automation Developer
This is a public case-study repository. The production source code is private because it may contain proprietary business logic, client workflows, credentials, deployment details, or reusable internal implementation patterns. The public repo is intentionally focused on the product, screenshots, workflow, architecture, and evaluation material.
For technical review, we can provide a live demo walkthrough, private repository access under NDA, a code screen-share, architecture review, or redacted implementation samples.
Turn any PDF, scan, or image into clean, structured data — text and tables — in seconds.
Doctern is an AI-powered document intelligence tool that extracts text, structured tables, and key fields from PDFs and images with high accuracy. Built for teams that are tired of manual data entry.
A look at the Doctern document intelligence interface.
| Document Upload & Extraction | Field Configuration |
|---|---|
| ![Document Upload & Extraction]() | ![Field Configuration]() |
- Document Upload & Extraction — drag and drop a PDF, scan, or image; Doctern runs OCR, table detection, and field matching automatically.
- Field Configuration — define the named fields (invoice number, totals, dates) Doctern should pull from every document.
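As an illustration of what a field configuration could look like in code, here is a minimal Python sketch. It is not Doctern's implementation (the engine is private): `FieldSpec`, `match_fields`, the alias lists, and the 0.8 similarity threshold are all hypothetical, and the `(label, value)` pairs are assumed to come from an OCR step that is not shown. Fuzzy label matching with the standard library's `difflib` is a stand-in technique chosen for brevity.

```python
from dataclasses import dataclass, field
from difflib import SequenceMatcher


@dataclass
class FieldSpec:
    """Hypothetical field definition: a name plus label variants to accept."""
    name: str
    aliases: list[str] = field(default_factory=list)


def _similarity(a: str, b: str) -> float:
    # Case-insensitive similarity in [0, 1] between two label strings.
    return SequenceMatcher(None, a.lower().strip(), b.lower().strip()).ratio()


def match_fields(specs, pairs, threshold=0.8):
    """Map each configured field to the value whose label matches best.

    `pairs` is a list of (label, value) tuples, assumed to be produced by an
    upstream OCR / key-value detection step. Fields whose best label score
    falls below `threshold` are returned as None instead of a guess.
    """
    result = {}
    for spec in specs:
        candidates = [spec.name, *spec.aliases]
        best_score, best_value = 0.0, None
        for label, value in pairs:
            score = max(_similarity(label, c) for c in candidates)
            if score > best_score:
                best_score, best_value = score, value
        result[spec.name] = best_value if best_score >= threshold else None
    return result
```

With a spec for "Invoice Number" that lists "Invoice No" as an alias, an OCR label such as "Invoice No." would still resolve to that field, while a field with no sufficiently similar label comes back as `None`.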
Doctern is an AI document OCR and table extraction platform. You upload a PDF or image, and Doctern returns the text, the tables (with rows and columns preserved), and the specific fields you care about — ready to copy into Excel, Google Sheets, or your own systems.
It is designed for invoice processing, form digitization, financial document extraction, and automated data entry — anywhere a human would otherwise retype information from a document.
- Accurate OCR — extracts text from scanned documents, photos, and PDFs, including low-quality scans.
- Structured table extraction — detects table boundaries and preserves row-and-column relationships, not just loose text.
- Intelligent field matching — finds the specific fields you need (invoice number, totals, dates, names) even when layouts vary.
- Borderless table support — reconstructs tables that have no visible grid lines.
- Copy-paste ready output — generates clean HTML/structured tables that drop straight into spreadsheets.
- Fast, modern interface — upload, preview, and export in a clean web app.
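To make the borderless-table idea above more concrete, the following Python sketch rebuilds a grid purely from text positions. It is an assumption-laden illustration, not the product's algorithm: the `Span` type, the `reconstruct_table` function, and both tolerance values are hypothetical, the OCR step is assumed to have already produced one text span per cell with pixel coordinates, and columns are assumed to be left-aligned.

```python
from dataclasses import dataclass


@dataclass
class Span:
    """Hypothetical OCR output: one piece of cell text and its position."""
    text: str
    x: float  # left edge of the span's bounding box
    y: float  # vertical centre of the span's bounding box


def _cluster(values, tolerance):
    """Group 1-D values whose neighbours are within `tolerance`; return cluster means."""
    centers, current = [], []
    for v in sorted(values):
        if current and v - current[-1] > tolerance:
            centers.append(sum(current) / len(current))
            current = []
        current.append(v)
    if current:
        centers.append(sum(current) / len(current))
    return centers


def _nearest(centers, value):
    # Index of the cluster centre closest to `value`.
    return min(range(len(centers)), key=lambda i: abs(centers[i] - value))


def reconstruct_table(spans, row_tol=5.0, col_tol=20.0):
    """Place each span into a row/column grid inferred from its coordinates."""
    rows = _cluster([s.y for s in spans], row_tol)
    cols = _cluster([s.x for s in spans], col_tol)
    grid = [["" for _ in cols] for _ in rows]
    for s in sorted(spans, key=lambda s: (s.y, s.x)):
        r, c = _nearest(rows, s.y), _nearest(cols, s.x)
        # Spans that land in the same cell are joined with a space.
        grid[r][c] = (grid[r][c] + " " + s.text).strip()
    return grid
```

Cells with no span stay as empty strings, so a missing value does not shift the rest of the row. Real documents would also need handling for right-aligned numbers, merged cells, and skew, none of which this sketch attempts.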
| Audience | Why Doctern helps |
|---|---|
| Accounting & finance teams | Stop retyping invoices, receipts, and statements. |
| Operations teams | Digitize forms, contracts, and paperwork at scale. |
| Data teams | Get clean, structured input instead of messy PDFs. |
| Small businesses | Automate data entry without hiring for it. |
| Developers & integrators | A reliable document-extraction layer for your product. |
- Upload a PDF, scan, or image.
- Doctern processes the document — OCR, table detection, and field matching run automatically.
- Review the extracted text, tables, and fields in a clean preview.
- Export structured data ready for spreadsheets or downstream systems.
No manual templating. No retyping.
Doctern is built on a modern, production-grade stack:
- Backend: Python, FastAPI
- OCR & vision: deep-learning OCR, computer-vision table detection
- Frontend: React, Vite
- Output: structured JSON and spreadsheet-ready tables
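For readers wondering what "structured JSON and spreadsheet-ready tables" might mean in practice, here is a small Python sketch using only the standard library. The result shape (`text`, `tables`, `fields`) and the helper names `build_result`, `to_json`, `table_to_tsv`, and `table_to_html` are assumptions made for illustration; the actual output schema is not documented in this repository.

```python
import csv
import html
import io
import json


def build_result(text, tables, fields):
    """Assemble a hypothetical extraction result as plain, JSON-friendly data."""
    return {
        "text": text,
        "tables": [{"rows": rows} for rows in tables],
        "fields": fields,
    }


def to_json(result):
    # Serialise the result; ensure_ascii=False keeps non-ASCII text readable.
    return json.dumps(result, indent=2, ensure_ascii=False)


def table_to_tsv(rows):
    """Tab-separated text, which spreadsheets split into cells on paste."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerows(rows)
    return buffer.getvalue()


def table_to_html(rows):
    """A bare HTML table; cell text is escaped so markup in the data is inert."""
    body = "".join(
        "<tr>" + "".join(f"<td>{html.escape(str(cell))}</td>" for cell in row) + "</tr>"
        for row in rows
    )
    return f"<table>{body}</table>"
```

Keeping tables as lists of rows means the same data can feed the JSON output, the TSV clipboard text, and the HTML preview without any conversion step in between.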
This repository is a public showcase. It documents the product — it does not contain the proprietary extraction engine or source code.
What file types does Doctern support? PDFs, scanned documents, and common image formats (PNG, JPG).
Can it handle tables without borders? Yes. Doctern reconstructs borderless tables by analyzing the spatial layout of the text.
Is Doctern accurate on low-quality scans? Yes. Doctern uses deep-learning OCR that performs well on noisy, skewed, and low-resolution scans.
Can I extract only specific fields? Yes. Doctern's field-matching finds named fields (like "Invoice Total" or "Date") even when document layouts differ.
Is this open source? No. This repository is a marketing showcase. The product is proprietary — see the license below.
How do I get access or a demo? See the Contact section below.
- ⚡ Speed — seconds per document instead of minutes of manual entry.
- 🎯 Accuracy — preserves table structure, not just raw text.
- 🧩 Flexible — adapts to varying layouts without per-template setup.
- 🔒 Private — your documents, your data.
Interested in using Doctern, integrating it, or seeing a live demo? Get in touch with the development team.
| Developer | Email | Phone |
|---|---|---|
| Muhammad Maaz | mazwaseem098@gmail.com | +92 323 7609712 |
| Muhammad Tanveer | mtanveertahir66@gmail.com | +92 320 6688665 |
- Company: Advenno
- GitHub: @maaz-gobi
We work with businesses to automate document processing — get in touch to discuss your use case.
This is a proprietary product. This repository contains documentation and marketing materials only. See LICENSE for terms.
Keywords: document OCR, AI table extraction, PDF data extraction, invoice OCR, automated data entry, document intelligence, scanned document extraction, PDF to Excel, form digitization, structured data extraction, FastAPI OCR, document parsing AI.

