From a3f783e31139416442ddf32bf83e7f1c4f5661f9 Mon Sep 17 00:00:00 2001
From: jimmyzhuu
Date: Mon, 30 Mar 2026 22:22:44 +0800
Subject: [PATCH] Clarify plugin-first OCR extension docs

---
 README.md                                   |  8 +++++++-
 packages/markitdown-ocr/README.md           | 14 ++++++++------
 packages/markitdown-sample-plugin/README.md |  2 ++
 3 files changed, 17 insertions(+), 7 deletions(-)

diff --git a/README.md b/README.md
index 6da3ee1d9..11ea993d1 100644
--- a/README.md
+++ b/README.md
@@ -132,6 +132,12 @@ markitdown --use-plugins path-to-file.pdf
 
 To find available plugins, search GitHub for the hashtag `#markitdown-plugin`. To develop a plugin, see `packages/markitdown-sample-plugin`.
 
+Plugins are the recommended extension path for optional or backend-specific functionality, especially when an extension:
+
+- adds non-default dependencies
+- depends on external services or model runtimes
+- changes converter behavior only for opt-in users
+
 #### markitdown-ocr Plugin
 
 The `markitdown-ocr` plugin adds OCR support to PDF, DOCX, PPTX, and XLSX converters, extracting text from embedded images using LLM Vision — the same `llm_client` / `llm_model` pattern that MarkItDown already uses for image descriptions. No new ML libraries or binary dependencies required.
@@ -143,7 +149,7 @@ pip install markitdown-ocr
 pip install openai # or any OpenAI-compatible client
 ```
 
-**Usage:**
+**Usage (Python API):**
 
 Pass the same `llm_client` and `llm_model` you would use for image descriptions:
 
diff --git a/packages/markitdown-ocr/README.md b/packages/markitdown-ocr/README.md
index d0883db4a..c66f85d2e 100644
--- a/packages/markitdown-ocr/README.md
+++ b/packages/markitdown-ocr/README.md
@@ -26,12 +26,6 @@ pip install openai
 
 ## Usage
 
-### Command Line
-
-```bash
-markitdown document.pdf --use-plugins --llm-client openai --llm-model gpt-4o
-```
-
 ### Python API
 
 Pass `llm_client` and `llm_model` to `MarkItDown()` exactly as you would for image descriptions:
@@ -52,6 +46,12 @@ print(result.text_content)
 
 If no `llm_client` is provided the plugin still loads, but OCR is silently skipped — falling back to the standard built-in converter.
 
+### Command Line
+
+MarkItDown's built-in CLI can enable installed plugins with `--use-plugins`, but it does not currently construct Python client objects such as `OpenAI()` for you.
+
+For that reason, this plugin is primarily configured through the Python API shown above, where you can pass `llm_client` and `llm_model` directly to `MarkItDown(...)`.
+
 ### Custom Prompt
 
 Override the default extraction prompt for specialized documents:
@@ -100,6 +100,8 @@ When a file is converted:
 4. The returned text is inserted inline, preserving document structure
 5. If the LLM call fails, conversion continues without that image's text
 
+This plugin is one example of the broader plugin-first extension model in MarkItDown: backend-specific OCR or document-processing logic can live in separately installed packages without changing the default core behavior.
+
 ## Supported File Formats
 
 ### PDF
diff --git a/packages/markitdown-sample-plugin/README.md b/packages/markitdown-sample-plugin/README.md
index adf1d9e7c..d6da193d3 100644
--- a/packages/markitdown-sample-plugin/README.md
+++ b/packages/markitdown-sample-plugin/README.md
@@ -71,6 +71,8 @@ sample_plugin = "markitdown_sample_plugin"
 
 Here, the value of `sample_plugin` can be any key, but should ideally be the name of the plugin. The value is the fully qualified name of the package implementing the plugin.
 
+If your plugin needs optional configuration, you can also read additional keyword arguments passed through `MarkItDown(enable_plugins=True, **kwargs)` from `register_converters(markitdown, **kwargs)`.
+
 ## Installation