由夸克扫描王提供的通用 OCR 技能,支持从图片、截图、照片或扫描文档中提取、识别和结构化文本。
本技能是 QoderWork Agent Skill,可在 QoderWork 桌面端安装并通过自然语言调用。
📦 ClawHub 主页:yescan-ocr-universal
- 通用文档识别 — 印刷体、手写体混排内容识别
- 表格提取 — 复杂表格、合并单元格、跨页表格还原为结构化数据
- 数学公式识别 — LaTeX 格式输出,支持行内/行间公式
- 证件识别 — 身份证、社保卡、驾照、行驶证、港澳台通行证、学位证等
- 票据识别 — 增值税发票、火车票、英文发票(商业发票)等
- 医疗报告 — 化验单、体检报告、医疗报告单
- 商品图识别 — 商品信息、产品包装、标签
- 题目识别 — 考题、习题、手写作文
共 17 个场景,Agent 按下表顺序匹配,命中第一个即停止。
| # | 场景名称 | Scene 标识 | 关键词 |
|---|---|---|---|
| 1 | 手写文档识别 | handwritten-ocr |
手写、笔迹、潦草字迹、手写作文 |
| 2 | 表格识别 | table-ocr |
表格、Excel、报表、单据 |
| 3 | 身份证识别 | idcard-ocr |
身份证、居民身份证 |
| 4 | 社保卡识别 | social-security-card-ocr |
社保卡、医保卡 |
| 5 | 港澳通行证识别 | travel-permit-ocr |
港澳通行证、港澳台通行证 |
| 6 | 学位证识别 | degree-certificate-ocr |
学位证、学历证书 |
| 7 | 增值税发票识别 | vat-invoice-ocr |
增值税发票、发票 |
| 8 | 火车票识别 | train-ticket-ocr |
火车票、车票 |
| 9 | 公式识别 | formula-ocr |
数学公式、LaTeX、化学方程式 |
| 10 | 题目识别 | question-ocr |
题目、考题、习题 |
| 11 | 驾驶证识别 | driver-license-ocr |
驾驶证、驾照 |
| 12 | 行驶证识别 | vehicle-license-ocr |
行驶证、车辆 |
| 13 | 英文发票识别 | commercial-invoice-ocr |
英文发票、商业发票 |
| 14 | 医疗报告单识别 | medical-report-ocr |
化验单、体检报告、医疗报告 |
| 15 | 营业执照识别 | business-license-ocr |
营业执照、工商执照 |
| 16 | 商品图片识别 | product-image-ocr |
商品、产品、包装、标签 |
| 17 | 通用文字提取 | general-ocr |
提取文字、识别文字(兜底) |
本技能需要夸克扫描王开放平台的 API Key(AI Agent 接入类型)。
前往 scan.quark.cn 开发者后台:
- 注册 / 登录(手机号)
- 进入「业务中心」→「密钥管理」
- 创建 Key,选择 AI Agent 接入(Key ID 前缀为
AI_) - 复制 Key 值
将 Key 写入 ~/.yescan_env,技能每次执行会自动读取,无需重启会话:
Linux / macOS
echo 'SCAN_WEBSERVICE_KEY=<your_api_key_here>' > ~/.yescan_envWindows(PowerShell)
'SCAN_WEBSERVICE_KEY=<your_api_key_here>' | Out-File -FilePath $HOME\.yescan_env -Encoding utf8也可以使用 QoderWork 内置的
yescan-key-setup技能自动完成注册和配置。
安装本技能后,在 QoderWork 中直接用自然语言调用即可:
帮我识别这张图片里的文字
把这张发票识别成结构化的 JSON
识别这张手写的数学公式
这张火车票的出发时间和车次是什么
Agent 会根据意图自动匹配对应的 scene 并调用底层脚本。
python3 scripts/scan.py --scene "${SCENE}" --url "https://example.com/image.jpg"
python3 scripts/scan.py --scene "${SCENE}" --path "/local/path/to/image.png"
python3 scripts/scan.py --scene "${SCENE}" --base64 "<base64 编码的图片数据>"支持的输入方式:URL、本地文件路径、Base64 编码。
脚本执行成功后,标准输出返回 JSON 对象,包含识别文本、位置坐标、置信度等结构化字段。Agent 会自动解析并呈现给用户。
- 单张图片大小不超过 5 MB
- 支持的图片格式:
jpg / jpeg / png / gif / bmp / webp / tiff / wbmp - 每次调用处理单张图片,不支持批量
- 图片数据会发送至
scan-business.quark.cn进行处理,不会永久保存 - 临时文件保存至
/tmp,任务结束后自动清理
yescan-ocr-qoder/
├── SKILL.md # 技能定义(QoderWork Agent 读取)
├── scripts/
│ ├── scan.py # 主执行脚本(Python 3.9+)
│ └── common/ # 公共工具模块
├── references/
│ ├── scenarios-detail.md # 场景详细说明
│ ├── error-codes.md # 错误码对照表
│ └── resources.md # 官方链接和资源
├── evals/ # 评测用例
└── LICENSE # MIT License
- 夸克扫描王开放平台 — 密钥申请、API 文档
- ClawHub 技能页 — 安装、下载量、评分
- QoderWork — Agent 运行环境
- yescan-ai GitHub Org — 更多技能
A universal OCR skill powered by Quark Scan King. Extracts, recognizes, and structures text from images, screenshots, photos, and scanned documents.
This is a QoderWork Agent Skill — install it in the QoderWork desktop app and invoke it with natural language.
📦 ClawHub page: mozhihuidage/yescan-ocr-universal
- General Document Recognition — Printed and handwritten text, including mixed layouts
- Table Extraction — Complex tables with merged cells and cross-page tables, output as structured data
- Math Formula Recognition — LaTeX output, supporting inline and display formulas
- ID Card Recognition — National ID cards, social security cards, driver's licenses, vehicle registration, travel permits, degree certificates, and more
- Receipt & Invoice OCR — VAT invoices, train tickets, commercial invoices, etc.
- Medical Reports — Lab reports, physical examination reports, medical records
- Product Image OCR — Product information, packaging, labels
- Question Recognition — Exam questions, exercises, handwritten essays
17 scenes in total. The agent matches in the order listed below and stops at the first hit.
| # | Scene Name | Scene ID | Keywords |
|---|---|---|---|
| 1 | Handwritten OCR | handwritten-ocr |
Handwriting, cursive, essays |
| 2 | Table OCR | table-ocr |
Tables, Excel, reports, receipts |
| 3 | ID Card OCR | idcard-ocr |
National ID card |
| 4 | Social Security Card OCR | social-security-card-ocr |
Social security, medical insurance |
| 5 | Travel Permit OCR | travel-permit-ocr |
HK/Macau/TW travel permit |
| 6 | Degree Certificate OCR | degree-certificate-ocr |
Degree, diploma, academic certificate |
| 7 | VAT Invoice OCR | vat-invoice-ocr |
VAT invoice, tax invoice |
| 8 | Train Ticket OCR | train-ticket-ocr |
Train ticket, railway ticket |
| 9 | Formula OCR | formula-ocr |
Math formula, LaTeX, chemical equation |
| 10 | Question OCR | question-ocr |
Exam question, exercise, problem set |
| 11 | Driver's License OCR | driver-license-ocr |
Driver's license |
| 12 | Vehicle License OCR | vehicle-license-ocr |
Vehicle registration, license plate |
| 13 | Commercial Invoice OCR | commercial-invoice-ocr |
Commercial invoice, English invoice |
| 14 | Medical Report OCR | medical-report-ocr |
Lab report, health checkup, medical report |
| 15 | Business License OCR | business-license-ocr |
Business license, registration certificate |
| 16 | Product Image OCR | product-image-ocr |
Product, packaging, label |
| 17 | General Text Extraction | general-ocr |
Extract text, recognize text (fallback) |
This skill requires an API Key (AI Agent type) from the Quark Scan King Open Platform.
Visit the scan.quark.cn developer console:
- Register / log in (phone number)
- Go to Business Center → Key Management
- Create a key, select AI Agent (Key ID prefix:
AI_) - Copy the key value
Write the key to ~/.yescan_env. The skill reads it automatically on each run — no session restart required.
Linux / macOS
echo 'SCAN_WEBSERVICE_KEY=<your_api_key_here>' > ~/.yescan_envWindows (PowerShell)
'SCAN_WEBSERVICE_KEY=<your_api_key_here>' | Out-File -FilePath $HOME\.yescan_env -Encoding utf8Alternatively, use the built-in
yescan-key-setupskill in QoderWork for automated registration and configuration.
After installing the skill, invoke it with natural language in QoderWork:
Extract the text from this image
Recognize this invoice as structured JSON
Identify the handwritten math formula in this photo
What is the departure time and train number on this ticket?
The agent will automatically match the appropriate scene and invoke the underlying script.
python3 scripts/scan.py --scene "${SCENE}" --url "https://example.com/image.jpg"
python3 scripts/scan.py --scene "${SCENE}" --path "/local/path/to/image.png"
python3 scripts/scan.py --scene "${SCENE}" --base64 "<base64-encoded image data>"Supported input methods: URL, local file path, Base64 encoding.
On success, the script outputs a JSON object to stdout containing recognized text, bounding box coordinates, confidence scores, and other structured fields. The agent automatically parses and presents the result to the user.
- Maximum image size: 5 MB
- Supported formats:
jpg / jpeg / png / gif / bmp / webp / tiff / wbmp - Single image per call — batch processing is not supported
- Image data is sent to
scan-business.quark.cnfor processing and is not permanently stored - Temporary files are saved to
/tmpand cleaned up after the task completes
yescan-ocr-qoder/
├── SKILL.md # Skill definition (read by QoderWork Agent)
├── scripts/
│ ├── scan.py # Main execution script (Python 3.9+)
│ └── common/ # Shared utility modules
├── references/
│ ├── scenarios-detail.md # Detailed scene descriptions
│ ├── error-codes.md # Error code reference
│ └── resources.md # Official links and resources
├── evals/ # Evaluation test cases
└── LICENSE # MIT License
- Quark Scan King Open Platform — API key & documentation
- ClawHub Skill Page — Install, downloads & ratings
- QoderWork — Agent runtime environment
- yescan-ai GitHub Org — More skills