MAINT: Improve readability in _reader.py by refactoring variable names (PEP8 cleanup). by semav-techdev · Pull Request #3772 · py-pdf/pypdf

semav-techdev · 2026-05-11T14:41:12Z

Renamed vague variables in private methods with no functional changes.

… for module _reader.py [naming conventions]

stefan6419846 · 2026-05-11T16:58:19Z

Thanks for the PR. Could you please check the merge conflicts? Additionally, how does renaming contribute to PEP 8 naming?

semav-techdev · 2026-05-11T18:40:33Z

Thanks for the feedback! I’ll resolve the merge conflicts.
The renaming was done to improve readability and align with PEP 8 naming conventions, by replacing less descriptive variable names (e.g. objnum -> obj_num) with clearer, snake_case identifiers. The goal was to make the code easier to understand and maintain without changing behavior.

If there’s a specific naming convention preferred for this section, I’m happy to adjust it.

codecov · 2026-05-11T19:15:29Z

Codecov Report

❌ Patch coverage is 88.57143% with 4 lines in your changes missing coverage. Please review.
✅ Project coverage is 97.64%. Comparing base (46a2b04) to head (a529901).
⚠️ Report is 4 commits behind head on main.

Files with missing lines	Patch %	Lines
pypdf/_reader.py	88.57%	3 Missing and 1 partial ⚠️

Additional details and impacted files

@@           Coverage Diff           @@
##             main    #3772   +/-   ##
=======================================
  Coverage   97.64%   97.64%           
=======================================
  Files          55       55           
  Lines       10227    10234    +7     
  Branches     1878     1880    +2     
=======================================
+ Hits         9986     9993    +7     
  Misses        137      137           
  Partials      104      104

semav-techdev · 2026-05-12T07:19:18Z

For better clarity, please see the “Function and Variable Names” section in PEP 8: https://peps.python.org/pep-0008/?utm_source=chatgpt.com#function-and-variable-names.

stefan6419846 · 2026-05-12T09:20:02Z

I still find the title confusing. In general I am fine with increasing readability, but the linked issue is specifically about actual violations, not about poor naming that is not detected automatically.

stefan6419846 · 2026-05-12T09:24:01Z

        # read the entire object stream into memory
-        stmnum, _idx = self.xref_objStm[indirect_reference.idnum]
-        obj_stm: EncodedStreamObject = IndirectObject(stmnum, 0, self).get_object()  # type: ignore
+        stream_num, _idx = self.xref_objStm[indirect_reference.idnum]


If we want to increase readability, we should avoid abbreviations wherever possible, thus using number is preferred over num and index over idx.

stream_num might not really be obvious enough here as well - it is the object number of the corresponding stream object.
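
To illustrate the naming point, here is a minimal sketch with spelled-out names. The table and the helper are hypothetical stand-ins, not pypdf code; only the shape of the lookup (object number mapped to the object number of the containing stream plus an index) follows the snippet above.

```python
# Hypothetical stand-in for the reader's object-stream cross-reference table:
# object number -> (object number of the containing object stream, index in it).
xref_object_streams: dict[int, tuple[int, int]] = {
    12: (7, 0),
    13: (7, 1),
}


def locate_compressed_object(object_number: int) -> tuple[int, int]:
    """Return (object stream number, index within that stream)."""
    object_stream_number, index_in_stream = xref_object_streams[object_number]
    return object_stream_number, index_in_stream


print(locate_compressed_object(13))
```

A name such as `object_stream_number` states what the value is (the object number of the stream object), which `stream_num` leaves open.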

stefan6419846 · 2026-05-12T09:25:17Z

            stream.seek(-1, 1)
-            cnt = 0
-            while cnt < size:
+            count_number = 0


I have never seen count_number used as a loop variable.

semav-techdev · 2026-05-13T07:13:53Z

Thank you for the review. I updated the title; is it clear now?
I also renamed the variables as suggested in your comments. Could you check again?
@stefan6419846

stefan6419846 · 2026-05-13T13:14:56Z

+                )
            logger_warning(
-                "Value /N %(n)d for object %(stmnum)d exceeds maximum allowed value %(max_n)d. Limiting to %(max_n)d.",
+                "Value /N %(n)d for object %(object_stream_number)d exceeds"


Please capture the multiline string in brackets as seen in other cases.
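
A minimal sketch of the requested style, assuming plain `%`-formatting with a mapping rather than the project's `logger_warning` helper; the values are illustrative only.

```python
# Adjacent string literals inside parentheses are joined at compile time,
# so the message stays a single string without backslashes or "+".
message = (
    "Value /N %(n)d for object %(object_stream_number)d exceeds "
    "maximum allowed value %(max_n)d. Limiting to %(max_n)d."
)

# Illustrative values only.
print(message % {"n": 5000, "object_stream_number": 7, "max_n": 1000})
```

Note the trailing space inside the first literal: without it the two parts would be joined as "exceedsmaximum".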

stefan6419846 · 2026-05-13T13:15:33Z

            read_non_whitespace(stream_data)
            stream_data.seek(-1, 1)
-            objnum = NumberObject.read_from_stream(stream_data)
+            obj_num = int(NumberObject.read_from_stream(stream_data))


How does the direct conversion to an integer increase the readability?

The direct conversion to int was mainly to satisfy the type checker rather than improve readability.

When the code was:

obj_num = NumberObject.read_from_stream(stream_data)

the CI failed with mypy errors because the returned type was inferred as NumberObject | FloatObject, while later usages expected int.

Converting it explicitly with:

obj_num = int(NumberObject.read_from_stream(stream_data))

resolved the type mismatch errors in _reader.py and made the expected type explicit for later calls such as cache_get_indirect_object() and dictionary lookups.
@stefan6419846
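
The typing situation described above can be sketched as follows. `NumberLike`, `FloatLike`, `read_number`, and `lookup` are hypothetical stand-ins invented for this illustration; they are not pypdf classes or functions.

```python
from typing import Union


class NumberLike(int):
    """Hypothetical stand-in for an int-based PDF number object."""


class FloatLike(float):
    """Hypothetical stand-in for a float-based PDF number object."""


def read_number(token: bytes) -> Union[NumberLike, FloatLike]:
    """Parse a token; the declared return type is a union."""
    if b"." in token:
        return FloatLike(token.decode())
    return NumberLike(token.decode())


def lookup(cache: dict[int, str], object_number: int) -> str:
    """Hypothetical consumer that requires a plain int key."""
    return cache[object_number]


cache = {12: "object twelve"}

# Without int(...), a type checker sees Union[NumberLike, FloatLike] here and
# rejects the call to lookup(), which expects int.
object_number = int(read_number(b"12"))
print(lookup(cache, object_number))
```

One caveat with this approach: `int()` silently truncates a float, so the conversion also hides a case that an explicit `isinstance` check would surface.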

stefan6419846 · 2026-05-13T13:16:19Z

            # caching those stale versions would shadow the newer xref entry.
            authoritative_stm, _idx = self.xref_objStm.get(obj_num, (None, None))
-            if authoritative_stm == stmnum:
+            if authoritative_stm == object_stream_number:


This somehow looks inconsistent, as authoritative_stm is a stream number as well.

stefan6419846 · 2026-05-13T13:16:38Z

+            logger_warning(
+                "invalid pdf header: %(header_bytes)r",
+                source=__name__,
+                header_bytes=header_bytes)


Suggested change

-                header_bytes=header_bytes)
+                header_bytes=header_bytes
+            )

stefan6419846 · 2026-05-13T13:17:11Z

        stream.seek(-11, 1)
-        tmp = stream.read(20)
-        xref_loc = tmp.find(b"xref")
+        xref_data  = stream.read(20)


Suggested change

-        xref_data  = stream.read(20)
+        xref_data = stream.read(20)

stefan6419846 · 2026-05-13T13:17:18Z

-        tmp = stream.read(20)
-        xref_loc = tmp.find(b"xref")
+        xref_data  = stream.read(20)
+        xref_loc = xref_data .find(b"xref")


Suggested change

-        xref_loc = xref_data .find(b"xref")
+        xref_loc = xref_data.find(b"xref")

stefan6419846 · 2026-05-13T13:18:08Z

                    # compressed objects
-                    objstr_num = get_entry(1)
-                    obstr_idx = get_entry(2)
+                    obj_str_num = get_entry(1)


This is another batch of abbreviations. Preferably, when we review specific methods for such improvements, this should be done for all possible names, not only some of them.

semav-techdev · 2026-05-14T08:13:10Z

Thank you for your feedback.
I would like to ask: Can I open a separate issue for variable renaming and readability improvements, so it can be tracked independently from this issue?

stefan6419846 · 2026-05-14T09:57:11Z

Can I open a separate issue for variable renaming and readability improvements, so it can be tracked independently from this issue?

I do not really understand what you are referring to. Improving the variable names (apart from actual ignored violations) is nothing I consider requiring tracking in an issue.

semav-techdev · 2026-05-15T13:41:00Z

Thank you for your feedback. In that case, I'll continue working on the current issue.
However, I encountered a mypy issue that breaks the tests and does not seem related to my changes. Would you like me to fix it as part of this issue?
Issue:
pypdf/_reader.py:973: error: Need type annotation for "xref_object_stream" (hint: "xref_object_stream: dict[<type>, <type>] = ...") [var-annotated]
Found 1 error in 1 file (checked 104 source files)
@stefan6419846

stefan6419846 · 2026-05-18T09:51:43Z

pypdf/_reader.py:973: error: Need type annotation for "xref_object_stream" (hint: "xref_object_stream: dict[<type>, <type>] = ...") [var-annotated]

You changed public API - if I remember correctly, this variable is initialized and typed in the constructor of PdfReader. This cannot be done without a proper deprecation and indicates that some of your refactoring is incomplete as you renamed only some of the usages of this variable/attribute.
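
For illustration, one common way to rename an attribute without breaking callers is to keep the old name as a deprecated alias. This is a generic sketch using the standard `warnings` module; the `Reader` class is hypothetical, the new attribute name is taken from the mypy message above, and this is not necessarily how pypdf handles deprecations.

```python
import warnings


class Reader:
    """Hypothetical class; not pypdf's PdfReader."""

    def __init__(self) -> None:
        # New, descriptive name, initialized and typed in the constructor.
        self.xref_object_stream: dict[int, tuple[int, int]] = {}

    @property
    def xref_objStm(self) -> dict[int, tuple[int, int]]:  # noqa: N802
        """Old name kept as a deprecated alias for the new attribute."""
        warnings.warn(
            "xref_objStm is deprecated; use xref_object_stream instead.",
            DeprecationWarning,
            stacklevel=2,
        )
        return self.xref_object_stream


reader = Reader()
reader.xref_object_stream[12] = (7, 0)

# Record warnings so the deprecation notice can be inspected.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    legacy_view = reader.xref_objStm

print(legacy_view, [str(w.message) for w in caught])
```

Because the alias returns the same dictionary object, existing code that reads or mutates the old name keeps working while users are warned to migrate.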

MAINT: py-pdf#3231 partial fix; PEP8 further compliance. Manual check…

96923c5

… for module _reader.py [naming conventions]

semav-techdev added 7 commits May 11, 2026 20:23

fix the conflict

5fd8160

fix the conflict

59cb566

fix the conflict

0ebff4c

fix the conflict

e90561b

fix ruff issue

febd5f4

fix ruff issue

f86255b

fix mypy issue

3c5d7ae

fix test issue

a9562de

Merge branch 'main' into simav-issue-3231

82c8308

stefan6419846 reviewed May 12, 2026

semav-techdev changed the title ~~MAINT: #3231 partial fix; PEP8 further compliance. Refactor variable names in _reader.py for readability.~~ MAINT: Improve readability in _reader.py by refactoring variable names (PEP8 cleanup). May 12, 2026

semav-techdev added 7 commits May 13, 2026 08:30

improving

a880cb8

improving

65ec7d4

improving

36c177d

improve

a7f9204

improve

6947e89

improve1

d7de24a

improve2

9acb9f4

stefan6419846 requested changes May 13, 2026

more renamed

f3b1111

improve3

a529901
