BUG: Fix merge_page crash on pages with markup annotations #3785
DictionaryObject.clone() called self.__class__() with no arguments, which crashes for annotation subclasses (Polygon, Line, Square, Circle) whose constructors require parameters. Fix by catching the TypeError and falling back to a plain DictionaryObject. Fixes py-pdf#3467
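The failure mode described above can be reproduced without pypdf. The following is a minimal sketch, assuming hypothetical stand-in classes (`MiniDict`, `MiniPolygon`) that only imitate the pattern; they are not pypdf's actual `DictionaryObject` or `Polygon` implementations.

```python
class MiniDict(dict):
    """Hypothetical stand-in for a dictionary-like PDF object."""

    def clone_naive(self):
        # Mirrors the failing pattern: instantiate the subclass with no arguments.
        obj = self.__class__()
        obj.update(self)
        return obj

    def clone_with_fallback(self):
        # Mirrors the described fix: fall back to the plain base class on TypeError.
        try:
            obj = self.__class__()
        except TypeError:
            obj = MiniDict()
        obj.update(self)
        return obj


class MiniPolygon(MiniDict):
    """Hypothetical annotation-like subclass whose constructor requires arguments."""

    def __init__(self, vertices):
        super().__init__()
        self["/Subtype"] = "/Polygon"
        self["/Vertices"] = list(vertices)


polygon = MiniPolygon([(0, 0), (10, 0), (10, 10)])

# The naive clone fails because MiniPolygon() is missing `vertices`.
try:
    polygon.clone_naive()
    naive_failed = False
except TypeError:
    naive_failed = True

# The fallback clone keeps the data but not the subclass type.
copy = polygon.clone_with_fallback()
```

In this sketch the fallback copy carries the same keys as the original but is a plain `MiniDict`. A bare `except TypeError` would also swallow a `TypeError` raised inside a constructor for unrelated reasons.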
Codecov Report
❌ Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## main #3785 +/- ##
==========================================
- Coverage 97.64% 97.64% -0.01%
==========================================
Files 55 55
Lines 10291 10312 +21
Branches 1890 1891 +1
==========================================
+ Hits 10049 10069 +20
- Misses 137 138 +1
Partials 105 105
Adds a regression test for issue py-pdf#3467 that merges a page containing Polygon and Line annotation instances. Without the clone() fallback this raises a TypeError because annotation subclass constructors require arguments. Signed-off-by: Abdulazez A. <abzaeko@gmail.com>
Test that cloning a page with a Polygon annotation across PdfWriters does not crash. The clone() method on DictionaryObject calls self.__class__() which fails for annotation subclasses with required constructor args. The fix catches the exception and falls back to a plain DictionaryObject. Signed-off-by: Abdulazez A. <abzaeko@gmail.com>
Signed-off-by: Abdulazez A. <abzaeko@gmail.com>
Could you please elaborate why this is the correct solution compared to the solution proposed in the issue? While an annotation might be a dictionary internally, it would return a "wrong" object type now. Additionally, the proposed mechanism is very broad and would apply to all other cases as well without any warning, which sounds wrong to me.
Reviewer feedback (stefan6419846):
- Broad except Exception was too aggressive and silent
- Returning DictionaryObject for annotation subclasses loses type info

Fix:
- Add _clone_class class attribute to DictionaryObject (None = use self.__class__() as before)
- Narrow except Exception to except TypeError (specific error when calling constructor with missing required args)
- Add logger_warning when falling back
- Set _clone_class = DictionaryObject on all 10 annotation classes that require constructor arguments

This gives annotation authors explicit control over the fallback type, while keeping the generic TypeError safeguard for any future subclass that may have a similar issue.
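The mechanism in this commit message could look roughly like the sketch below. It uses hypothetical stand-in classes (`MiniDict`, `MiniLine`, `MiniUndeclared`) and the standard `logging` module in place of pypdf's `logger_warning`. The `_clone_class` name comes from the commit message; everything else is an assumption made for illustration.

```python
import logging
from typing import Optional

logger = logging.getLogger("clone_sketch")


class MiniDict(dict):
    """Hypothetical stand-in for a dictionary-like PDF object."""

    # None keeps the old behaviour: instantiate self.__class__().
    _clone_class: Optional[type] = None

    def clone(self) -> "MiniDict":
        target = self.__class__ if self._clone_class is None else self._clone_class
        try:
            obj = target()
        except TypeError:
            # Generic safeguard for subclasses that declared nothing.
            logger.warning(
                "Cannot instantiate %s without arguments; falling back to MiniDict",
                target.__name__,
            )
            obj = MiniDict()
        obj.update(self)
        return obj


class MiniLine(MiniDict):
    """Declares its clone type explicitly, so no warning is logged."""

    _clone_class = MiniDict

    def __init__(self, p1, p2):
        super().__init__()
        self["/L"] = [*p1, *p2]


class MiniUndeclared(MiniDict):
    """Sets no _clone_class and relies on the TypeError safeguard."""

    def __init__(self, required):
        super().__init__()
        self["/Value"] = required


line_copy = MiniLine((0, 0), (5, 5)).clone()
undeclared_copy = MiniUndeclared(42).clone()
```

Both copies end up as plain `MiniDict` instances in this sketch. The explicit attribute only makes the fallback deliberate and silent; it does not preserve the subclass type.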
Thanks for the thorough review @stefan6419846 — both points are well taken. I have reworked the fix: New approach:
The combination means:
Changes pushed. All 21 annotation tests + 13 data structure tests pass.
logger_warning expects source= as a keyword-only parameter, not a positional arg.
    try:
        obj = self.__class__()
    except TypeError:
        # Some subclasses (e.g., annotation types) require constructor
Why do we need to still catch this if we have an explicit _clone_class? And how do we ensure that we actually get the correct class, id est a proper annotation, at the end of the process?
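For context on the type-preservation question, one general Python technique is to allocate the instance with `__new__` and skip `__init__` entirely. The sketch below uses hypothetical stand-in classes and is only an illustration. It is not necessarily the solution proposed in the issue, nor something this PR implements.

```python
class MiniDict(dict):
    """Hypothetical stand-in for a dictionary-like PDF object."""

    def clone_preserving_type(self):
        cls = self.__class__
        # Allocate an empty instance without calling __init__,
        # so required constructor arguments are never needed.
        obj = cls.__new__(cls)
        obj.update(self)
        return obj


class MiniPolygon(MiniDict):
    """Hypothetical annotation-like subclass with a required argument."""

    def __init__(self, vertices):
        super().__init__()
        self["/Subtype"] = "/Polygon"
        self["/Vertices"] = list(vertices)


original = MiniPolygon([(0, 0), (4, 0), (4, 3)])
typed_copy = original.clone_preserving_type()
```

The copy keeps both the keys and the subclass type. The trade-off is that any instance attributes normally set in `__init__` outside the dictionary contents would be missing, so whether this is appropriate depends on how the real classes are written.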
Summary
PageObject.merge_page() crashes with TypeError when the source page contains a markup annotation (e.g., Polygon, Line, Square, Circle).
Root Cause
DictionaryObject.clone() calls self.__class__() with no arguments. Annotation subclasses like Polygon require constructor arguments (e.g., vertices), so this crashes.
Fix
Catch the TypeError when instantiating self.__class__() and fall back to a plain DictionaryObject. Since the clone immediately copies all key-value pairs from the source, the cloned object retains the annotation data regardless of which Python class wraps it.
Testing
- test_merge_page_with_markup_annotation covers the clone fallback path
- test_dictionary_object__clone_fallback_on_annotation_subclass covers the code path uncovered by codecov
AI Assistance Declaration
AI (Claude Code / Hermes Agent) was used for coding assistance on this PR. The author reviewed and validated all changes.
Fixes #3467