[codex] Add optional underline preservation for DOCX and HTML #1672

Draft
Hartnakig wants to merge 1 commit into microsoft:main from Hartnakig:codex/preserve-underlines-option

Conversation

@Hartnakig

Summary

  • add a preserve_underlines option to MarkItDown
  • preserve DOCX underlines by injecting Mammoth's u => u style map when the option is enabled
  • preserve <u>...</u> tags in the HTML-to-Markdown path when underline preservation is enabled
  • add focused tests and README usage documentation
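
For reviewers, the DOCX bullet above can be sketched roughly as follows. This is not the code in this PR: `build_style_map` and `UNDERLINE_RULE` are hypothetical names, and skipping the injection when the caller already maps `u` is an assumption of this sketch. Only the `u => u` rule itself comes from the summary.

```python
from typing import Optional

# The style-map rule named in the summary: keep underlined runs as <u>.
UNDERLINE_RULE = "u => u"


def build_style_map(
    user_style_map: Optional[str] = None,
    preserve_underlines: bool = False,
) -> Optional[str]:
    """Hypothetical helper: return the style map to hand to Mammoth."""
    if not preserve_underlines:
        # Opt-in only: with the option off, the input passes through untouched.
        return user_style_map

    if not user_style_map or not user_style_map.strip():
        return UNDERLINE_RULE

    # Assumption of this sketch: if the caller already maps `u`, leave it alone.
    for line in user_style_map.splitlines():
        matcher = line.split("=>", 1)[0].strip()
        if matcher == "u":
            return user_style_map

    # Otherwise append the underline rule on its own line.
    return user_style_map.rstrip("\n") + "\n" + UNDERLINE_RULE
```

The resulting string would then be passed as Mammoth's `style_map` argument (for example to `mammoth.convert_to_html`). With the option disabled, the helper returns its input unchanged, which matches the goal of leaving default output as it is.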

Why

This addresses #35 by making underline preservation available as an opt-in behavior without changing the default output.

Validation

  • E:\C_WORK\A_AGTEC\markitdown-main\.venv\Scripts\python -m pytest E:\C_WORK\A_AGTEC\markitdown-main\packages\markitdown\tests\test_module_misc.py -k "preserve_underlines or docx_comments or docx_equations" -q

@Hartnakig
Author

@microsoft-github-policy-service agree
