fix: handle missing m:t text in OMML math runs (fixes #1512) #1668

Open
octo-patch wants to merge 3 commits into microsoft:main from octo-patch:fix/issue-1512-docx-math-missing

@octo-patch

Fixes #1512

Problem

When converting a DOCX file with mathematical equations, if any m:r element inside an oMath block has no m:t child (for example, a formatting-only run containing only m:rPr), ElementTree.findtext() returns None. The do_r method in omml.py then does:

for s in elm.findtext("./{ns}t".format(...)):  # iterates over None → TypeError

This TypeError propagates up through _pre_process_math → pre_process_docx, where a file-level try/except catches it and falls back to the raw, unmodified DOCX XML. Since mammoth cannot render OMML markup, all equations in the document silently disappear from the converted markdown output.
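The failure described above can be reproduced with the standard library alone. The snippet below is a minimal sketch, not the project's code: it builds a formatting-only m:r run by hand (the m:sty child is just an illustrative property) and shows that findtext() returns None, which cannot be iterated.

```python
import xml.etree.ElementTree as ET

# OMML namespace used by the m: prefix in DOCX math markup.
M_NS = "http://schemas.openxmlformats.org/officeDocument/2006/math"

# A formatting-only run: it carries m:rPr but has no m:t child.
run = ET.fromstring(
    f'<m:r xmlns:m="{M_NS}"><m:rPr><m:sty m:val="p"/></m:rPr></m:r>'
)

# findtext() returns its default (None) when no matching element exists.
text = run.findtext(f"./{{{M_NS}}}t")

raised = False
try:
    for ch in text:  # iterating over None
        pass
except TypeError:
    raised = True

print(text, raised)
```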

Solution

  1. omml.py → do_r: use (elm.findtext(...) or "") so an absent m:t element yields an empty string instead of crashing.

  2. pre_process.py → _pre_process_math: wrap each individual equation replacement in its own try/except with a warning log. This means a single unconvertible equation no longer causes every other equation in the document to be lost.
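The two changes above can be sketched together as follows. This is an illustration under stated assumptions, not the actual diff: run_text, convert_equation, and convert_all are hypothetical names, and convert_equation is a toy stand-in that only joins run text rather than performing real OMML-to-LaTeX conversion.

```python
import logging
import xml.etree.ElementTree as ET

M_NS = "http://schemas.openxmlformats.org/officeDocument/2006/math"
logger = logging.getLogger(__name__)


def run_text(run):
    """Return the text of an m:r run, or "" when it has no m:t child."""
    # The `or ""` guard turns a missing m:t (None) into an empty string.
    return run.findtext(f"./{{{M_NS}}}t") or ""


def convert_equation(omath):
    """Hypothetical stand-in converter: joins run text and wraps it in $...$."""
    runs = omath.findall(f"./{{{M_NS}}}r")
    if not runs:
        raise ValueError("no runs in equation")
    return "$" + "".join(run_text(r) for r in runs) + "$"


def convert_all(equations):
    """Convert each equation on its own; keep the original element on failure."""
    results = []
    for omath in equations:
        try:
            results.append(convert_equation(omath))
        except Exception as exc:
            # One bad equation is logged and skipped, not fatal for the rest.
            logger.warning("Skipping equation that failed to convert: %s", exc)
            results.append(omath)
    return results


good = ET.fromstring(
    f'<m:oMath xmlns:m="{M_NS}">'
    "<m:r><m:rPr/></m:r>"
    "<m:r><m:t>x</m:t></m:r>"
    "</m:oMath>"
)
bad = ET.fromstring(f'<m:oMath xmlns:m="{M_NS}"/>')

converted = convert_all([good, bad])
print(converted[0])
```

In this sketch the failing equation is left as its original element, so a caller can still see which equations were not converted, while the successful ones are unaffected.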

Testing

  • Existing test_docx_equations continues to pass.
  • New regression test test_docx_equations_omit_empty_run directly exercises an m:r with only m:rPr (no m:t) and asserts that the surrounding equation is still converted to $x$.
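For readers who want to see the shape of such a regression case, here is a rough, self-contained sketch. It is not the test added in this PR: OMML_FIXTURE, to_inline_latex, and the test function name are hypothetical, and to_inline_latex is a toy stand-in for the real converter.

```python
import xml.etree.ElementTree as ET

M_NS = "http://schemas.openxmlformats.org/officeDocument/2006/math"

# An equation whose first run is formatting-only (m:rPr, no m:t),
# followed by a normal run containing the text "x".
OMML_FIXTURE = (
    f'<m:oMath xmlns:m="{M_NS}">'
    '<m:r><m:rPr><m:sty m:val="p"/></m:rPr></m:r>'
    "<m:r><m:t>x</m:t></m:r>"
    "</m:oMath>"
)


def to_inline_latex(omath_xml):
    """Hypothetical stand-in: concatenate run text and wrap it in $...$."""
    root = ET.fromstring(omath_xml)
    parts = [
        r.findtext(f"./{{{M_NS}}}t") or ""  # empty run contributes nothing
        for r in root.iter(f"{{{M_NS}}}r")
    ]
    return "$" + "".join(parts) + "$"


def test_equation_survives_empty_run():
    assert to_inline_latex(OMML_FIXTURE) == "$x$"


test_equation_survives_empty_run()
```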

Development

Successfully merging this pull request may close these issues.

Doesn't convert the docx to md properly with the mathematical equations inside