Skip to content

BUG: Fix merge_page crash on pages with markup annotations#3785

Open
Abzaek wants to merge 8 commits into
py-pdf:mainfrom
Abzaek:fix/annotation-clone-crash
Open

BUG: Fix merge_page crash on pages with markup annotations#3785
Abzaek wants to merge 8 commits into
py-pdf:mainfrom
Abzaek:fix/annotation-clone-crash

Conversation

@Abzaek
Copy link
Copy Markdown
Contributor

@Abzaek Abzaek commented May 16, 2026

Summary

PageObject.merge_page() crashes with TypeError when the source page contains a markup annotation (e.g., Polygon, Line, Square, Circle).

Root Cause

DictionaryObject.clone() calls self.__class__() with no arguments. Annotation subclasses like Polygon require constructor arguments (e.g., vertices), so this crashes.

Fix

Catch the TypeError when instantiating self.__class__() and fall back to a plain DictionaryObject. Since the clone immediately copies all key-value pairs from the source, the cloned object retains the annotation data regardless of which Python class wraps it.

Testing

  • ✅ Polygon annotation merge works
  • ✅ Line annotation merge works
  • ✅ Merged PDF writes and reads back correctly
  • ✅ New regression test test_merge_page_with_markup_annotation covers the clone fallback path
  • ✅ New regression test test_dictionary_object__clone_fallback_on_annotation_subclass covers the code path uncovered by codecov

AI Assistance Declaration

AI (Claude Code / Hermes Agent) was used for coding assistance on this PR. The author reviewed and validated all changes.

Fixes #3467

DictionaryObject.clone() called self.__class__() with no arguments,
which crashes for annotation subclasses (Polygon, Line, Square, Circle)
whose constructors require parameters. Fix by catching the TypeError
and falling back to a plain DictionaryObject.

Fixes py-pdf#3467
@codecov
Copy link
Copy Markdown

codecov Bot commented May 16, 2026

Codecov Report

❌ Patch coverage is 84.21053% with 3 lines in your changes missing coverage. Please review.
✅ Project coverage is 97.64%. Comparing base (8c03dc4) to head (dfeb3f3).
⚠️ Report is 3 commits behind head on main.

Files with missing lines Patch % Lines
pypdf/generic/_data_structures.py 66.66% 3 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #3785      +/-   ##
==========================================
- Coverage   97.64%   97.64%   -0.01%     
==========================================
  Files          55       55              
  Lines       10291    10312      +21     
  Branches     1890     1891       +1     
==========================================
+ Hits        10049    10069      +20     
- Misses        137      138       +1     
  Partials      105      105              

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@Abzaek Abzaek changed the title Fix: merge_page crashes on pages with markup annotations (#3467) BUG: Fix merge_page crash on pages with markup annotations May 17, 2026
Abzaek and others added 4 commits May 17, 2026 07:07
Adds a regression test for issue py-pdf#3467 that merges a page containing
Polygon and Line annotation instances. Without the clone() fallback
this raises a TypeError because annotation subclass constructors
require arguments.

Signed-off-by: Abdulazez A. <abzaeko@gmail.com>
Test that cloning a page with a Polygon annotation across PdfWriters
does not crash. The clone() method on DictionaryObject calls
self.__class__() which fails for annotation subclasses with required
constructor args. The fix catches the exception and falls back to
a plain DictionaryObject.

Signed-off-by: Abdulazez A. <abzaeko@gmail.com>
Signed-off-by: Abdulazez A. <abzaeko@gmail.com>
@stefan6419846 stefan6419846 added the needs-discussion The PR/issue needs more discussion before we can continue label May 18, 2026
@stefan6419846
Copy link
Copy Markdown
Collaborator

Could you please elaborate why this is the correct solution compared to the solution proposed in the issue? While an annotation might be a dictionary internally, it would return a "wrong" object type now. Additionally, the proposed mechanism is very broad and would apply to all other cases as well without any warning, which sounds wrong to me.

Reviewer feedback (stefan6419846):
- Broad except Exception was too aggressive and silent
- Returning DictionaryObject for annotation subclasses loses type info

Fix:
- Add _clone_class class attribute to DictionaryObject
  (None = use self.__class__() as before)
- Narrow except Exception to except TypeError (specific error when
  calling constructor with missing required args)
- Add logger_warning when falling back
- Set _clone_class = DictionaryObject on all 10 annotation classes
  that require constructor arguments

This gives annotation authors explicit control over the fallback
type, while keeping the generic TypeError safeguard for any future
subclass that may have a similar issue.
@Abzaek
Copy link
Copy Markdown
Contributor Author

Abzaek commented May 18, 2026

Thanks for the thorough review @stefan6419846 — both points are well taken. I have reworked the fix:

New approach: _clone_class class attribute + narrow except TypeError

  1. DictionaryObject got a _clone_class class attribute (default None). Subclasses that cannot be constructed with no arguments set this to a suitable fallback type. clone() checks this first — no exception handling needed for well-annotated subclasses.

  2. The remaining fallback was narrowed from except Exception to except TypeError — this is the specific error raised when self.__class__() is called with missing required args. It won't swallow legitimate errors like MemoryError, ImportError, or bugs in __init__. It also emits a logger_warning when triggered so the fallback is visible.

  3. All 10 annotation subclasses (Text, FreeText, Line, PolyLine, Rectangle, Highlight, Ellipse, Polygon, Link, Popup) now set _clone_class = DictionaryObject — an explicit opt-in declaration that they cannot be cloned as themselves.

The combination means:

  • For explicitly-annotated subclasses like Polygon: zero exception handling, clean flow
  • For unknown future subclasses: only TypeError is caught, with a warning
  • No silent error swallowing
  • Each annotation class explicitly owns its fallback type

Changes pushed. All 21 annotation tests + 13 data structure tests pass.

logger_warning expects source= as a keyword-only parameter, not a positional arg.
try:
obj = self.__class__()
except TypeError:
# Some subclasses (e.g., annotation types) require constructor
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need to still catch this if we have an explicit _clone_class? And how do we ensure that we actually get the correct class, id est a proper annotation, at the end of the process?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

needs-discussion The PR/issue needs more discussion before we can continue

Projects

None yet

Development

Successfully merging this pull request may close these issues.

merge_page with passed page having markup annotation fails

2 participants