
MAINT: Improve readability in _reader.py by refactoring variable names (PEP8 cleanup). #3772

Open
semav-techdev wants to merge 19 commits into py-pdf:main from semav-techdev:simav-issue-3231

Conversation

@semav-techdev

Renamed vague variables in private methods with no functional changes.

… for module _reader.py [naming conventions]
@stefan6419846
Collaborator

Thanks for the PR. Could you please check the merge conflicts? Additionally, how does renaming contribute to PEP 8 naming?

@semav-techdev
Author

Thanks for the feedback! I’ll resolve the merge conflicts.
The renaming was done to improve readability and align with PEP 8 naming conventions, by replacing less descriptive variable names (e.g. objnum -> obj_num) with clearer, snake_case identifiers. The goal was to make the code easier to understand and maintain without changing behavior.

If there’s a specific naming convention preferred for this section, I’m happy to adjust it.

@codecov

codecov Bot commented May 11, 2026

Codecov Report

❌ Patch coverage is 88.57143% with 4 lines in your changes missing coverage. Please review.
✅ Project coverage is 97.64%. Comparing base (46a2b04) to head (a529901).
⚠️ Report is 4 commits behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| pypdf/_reader.py | 88.57% | 3 Missing and 1 partial ⚠️ |
Additional details and impacted files
@@           Coverage Diff           @@
##             main    #3772   +/-   ##
=======================================
  Coverage   97.64%   97.64%           
=======================================
  Files          55       55           
  Lines       10227    10234    +7     
  Branches     1878     1880    +2     
=======================================
+ Hits         9986     9993    +7     
  Misses        137      137           
  Partials      104      104           


@semav-techdev
Author

For clarity, please see the “Function and Variable Names” section in PEP 8: https://peps.python.org/pep-0008/?utm_source=chatgpt.com#function-and-variable-names.

@stefan6419846
Collaborator

I still find the title confusing. In general, I am fine with increasing readability, but the linked issue is specifically about actual violations, not about poor naming that is not detected automatically.

Comment thread pypdf/_reader.py Outdated
  # read the entire object stream into memory
- stmnum, _idx = self.xref_objStm[indirect_reference.idnum]
- obj_stm: EncodedStreamObject = IndirectObject(stmnum, 0, self).get_object() # type: ignore
+ stream_num, _idx = self.xref_objStm[indirect_reference.idnum]
Collaborator

If we want to increase readability, we should avoid abbreviations wherever possible, so number is preferred over num and index over idx.

stream_num might not be obvious enough here either: it is the object number of the corresponding stream object.
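
To illustrate the naming point, here is a minimal sketch. The mapping `xref_object_streams` and the function `locate_in_object_stream` are hypothetical stand-ins, not pypdf code. Only the idea that the first tuple element is the object number of the containing object stream comes from the comment above.

```python
# Hypothetical stand-in for the reader's lookup table (an assumption):
# object number -> (object number of the containing object stream,
#                   index of the object within that stream)
xref_object_streams = {12: (7, 0), 13: (7, 1)}


def locate_in_object_stream(object_number: int) -> tuple[int, int]:
    """Return where a compressed object lives, using unabbreviated names."""
    # Spelled-out names say what each tuple element actually is.
    object_stream_number, index_in_stream = xref_object_streams[object_number]
    return object_stream_number, index_in_stream
```

With names like these, a reader does not have to guess whether a "stream number" counts streams or identifies an object.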

Comment thread pypdf/_reader.py Outdated
  stream.seek(-1, 1)
- cnt = 0
- while cnt < size:
+ count_number = 0
Collaborator

I have never seen count_number used as a loop variable.

@semav-techdev changed the title from "MAINT: #3231 partial fix; PEP8 further compliance. Refactor variable names in _reader.py for readability." to "MAINT: Improve readability in _reader.py by refactoring variable names (PEP8 cleanup)." on May 12, 2026
@semav-techdev
Author

Thank you for the review. I updated the title; is it clear now?
I also renamed the variables as suggested in your comments. Could you check it again?
@stefan6419846

Comment thread pypdf/_reader.py Outdated
  )
  logger_warning(
- "Value /N %(n)d for object %(stmnum)d exceeds maximum allowed value %(max_n)d. Limiting to %(max_n)d.",
+ "Value /N %(n)d for object %(object_stream_number)d exceeds"
Collaborator

Please capture the multiline string in brackets as seen in other cases.
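
One way to read this request, assuming "brackets" means parentheses: Python concatenates adjacent string literals inside parentheses, so a long message can span several lines as a single expression. `build_warning` below is a hypothetical helper that only makes the sketch runnable. It formats with plain `%` and a dict and does not show how `logger_warning` is called in pypdf.

```python
def build_warning(n: int, object_stream_number: int, max_n: int) -> str:
    """Build the warning text from a parenthesized multi-line string."""
    # Adjacent literals inside the parentheses are joined into one string.
    # Note the trailing space after "exceeds": without it the two parts
    # would run together as "exceedsmaximum".
    message = (
        "Value /N %(n)d for object %(object_stream_number)d exceeds "
        "maximum allowed value %(max_n)d. Limiting to %(max_n)d."
    )
    return message % {
        "n": n,
        "object_stream_number": object_stream_number,
        "max_n": max_n,
    }
```

The parentheses make the extent of the string explicit, which is easier to scan than a literal that continues across an argument list.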

Comment thread pypdf/_reader.py Outdated
  read_non_whitespace(stream_data)
  stream_data.seek(-1, 1)
- objnum = NumberObject.read_from_stream(stream_data)
+ obj_num = int(NumberObject.read_from_stream(stream_data))
Collaborator

How does the direct conversion to an integer increase the readability?

Author

@semav-techdev semav-techdev May 14, 2026

The direct conversion to int was mainly to satisfy the type checker rather than improve readability.

When the code was:

obj_num = NumberObject.read_from_stream(stream_data)

the CI failed with mypy errors because the returned type was inferred as NumberObject | FloatObject, while later usages expected int.

Converting it explicitly with:

obj_num = int(NumberObject.read_from_stream(stream_data))

resolved the type mismatch errors in _reader.py and made the expected type explicit for later calls such as cache_get_indirect_object() and dictionary lookups.
@stefan6419846
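
The typing situation described above can be sketched in isolation. The classes `NumberLike` and `FloatLike` and the functions `read_number` and `lookup` are hypothetical stand-ins, not pypdf's classes. The sketch only assumes a reader function whose return type is a union of an int subclass and a float subclass.

```python
from typing import Union


class NumberLike(int):
    """Hypothetical stand-in for an integer-valued PDF object."""


class FloatLike(float):
    """Hypothetical stand-in for a float-valued PDF object."""


def read_number(token: bytes) -> Union[NumberLike, FloatLike]:
    """Parse a numeric token; the declared return type is a union."""
    text = token.decode()
    if "." in text:
        return FloatLike(text)
    return NumberLike(text)


def lookup(cache: dict[int, str], object_number: int) -> str:
    """A later usage that expects a plain int key."""
    return cache[object_number]


cache = {12: "object twelve"}

# Passing read_number(...) directly would be flagged by a type checker,
# because the union includes a float subclass. int() yields a plain int.
object_number = int(read_number(b"12"))
result = lookup(cache, object_number)
```

One caveat of this approach: `int()` silently truncates, so a token such as `12.5` would become `12` rather than raising an error. Whether that is acceptable depends on how malformed input should be handled.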

Comment thread pypdf/_reader.py Outdated
  # caching those stale versions would shadow the newer xref entry.
  authoritative_stm, _idx = self.xref_objStm.get(obj_num, (None, None))
- if authoritative_stm == stmnum:
+ if authoritative_stm == object_stream_number:
Collaborator

This somehow looks inconsistent, as authoritative_stm is a stream number as well.

Comment thread pypdf/_reader.py Outdated
logger_warning(
"invalid pdf header: %(header_bytes)r",
source=__name__,
header_bytes=header_bytes)
Collaborator

Suggested change
- header_bytes=header_bytes)
+ header_bytes=header_bytes
+ )

Comment thread pypdf/_reader.py Outdated
  stream.seek(-11, 1)
- tmp = stream.read(20)
- xref_loc = tmp.find(b"xref")
+ xref_data = stream.read(20)
Collaborator

Suggested change
xref_data = stream.read(20)
xref_data = stream.read(20)

Comment thread pypdf/_reader.py Outdated
- tmp = stream.read(20)
- xref_loc = tmp.find(b"xref")
+ xref_data = stream.read(20)
+ xref_loc = xref_data .find(b"xref")
Collaborator

Suggested change
- xref_loc = xref_data .find(b"xref")
+ xref_loc = xref_data.find(b"xref")

Comment thread pypdf/_reader.py Outdated
  # compressed objects
- objstr_num = get_entry(1)
- obstr_idx = get_entry(2)
+ obj_str_num = get_entry(1)
Collaborator

This is another batch of abbreviations. Preferably, when we review specific methods for such improvements, this should be done for all possible names, not only some of them.

@semav-techdev
Author

Thank you for your feedback.
I have a question: can I open a separate issue for variable renaming and readability improvements, so it can be tracked independently from this issue?

@stefan6419846
Collaborator

Can I open a separate issue for variable renaming and readability improvements, so it can be tracked independently from this issue?

I do not really understand what you are referring to. Improving the variable names (apart from actual ignored violations) is not something I consider worth tracking in an issue.

@semav-techdev
Author

semav-techdev commented May 15, 2026

Thank you for your feedback. In that case, I'll continue working on the current issue.
However, I encountered a mypy error that breaks the tests, and it does not seem related to my changes. Would you like me to fix it as part of this issue?
Error:
pypdf/_reader.py:973: error: Need type annotation for "xref_object_stream" (hint: "xref_object_stream: dict[<type>, <type>] = ...") [var-annotated]
Found 1 error in 1 file (checked 104 source files)
@stefan6419846

@stefan6419846
Collaborator

pypdf/_reader.py:973: error: Need type annotation for "xref_object_stream" (hint: "xref_object_stream: dict[<type>, <type>] = ...") [var-annotated]

You changed the public API. If I remember correctly, this variable is initialized and typed in the constructor of PdfReader. Renaming it cannot be done without a proper deprecation, and the error indicates that some of your refactoring is incomplete, as you renamed only some of the usages of this variable/attribute.
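
As a rough illustration of what a deprecation path for a renamed attribute can look like, here is a minimal standard-library sketch. The `Reader` class, the attribute type, and the warning text are assumptions for illustration only. The project may have its own deprecation helpers and policy, which would take precedence.

```python
import warnings


class Reader:
    """Hypothetical reader class, not pypdf's PdfReader."""

    def __init__(self) -> None:
        # The new name is initialized and typed in one place. An annotated
        # empty container also avoids a "Need type annotation" error.
        # The value type is an assumption made for this sketch.
        self.xref_object_stream: dict[int, tuple[int, int]] = {}

    @property
    def xref_objStm(self) -> dict[int, tuple[int, int]]:
        """Old public name, kept as a read-only alias that warns on access."""
        warnings.warn(
            "xref_objStm is deprecated; use xref_object_stream instead.",
            DeprecationWarning,
            stacklevel=2,
        )
        return self.xref_object_stream
```

Existing callers that read the old name keep working while being told about the new one. Because the alias returns the same dictionary, code that mutates it through the old name still affects the new attribute.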


2 participants