MAINT: Improve readability in _reader.py by refactoring variable names (PEP8 cleanup).#3772
MAINT: Improve readability in _reader.py by refactoring variable names (PEP8 cleanup).#3772semav-techdev wants to merge 19 commits into
Conversation
… for module _reader.py [naming conventions]
|
Thanks for the PR. Could you please check the merge conflicts? Additionally, how does renaming contribute to PEP 8 naming? |
|
Thanks for the feedback! I’ll resolve the merge conflicts. If there’s a specific naming convention preferred for this section, I’m happy to adjust it. |
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #3772 +/- ##
=======================================
Coverage 97.64% 97.64%
=======================================
Files 55 55
Lines 10227 10234 +7
Branches 1878 1880 +2
=======================================
+ Hits 9986 9993 +7
Misses 137 137
Partials 104 104 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
|
For better clarity , please see the “Function and Variable Names” section in PEP 8: https://peps.python.org/pep-0008/?utm_source=chatgpt.com#function-and-variable-names. |
|
I still find the title confusing - in general I am fine with increasing readability, but the linked issue especially is about actual violations, not bad naming not detected automatically. |
| # read the entire object stream into memory | ||
| stmnum, _idx = self.xref_objStm[indirect_reference.idnum] | ||
| obj_stm: EncodedStreamObject = IndirectObject(stmnum, 0, self).get_object() # type: ignore | ||
| stream_num, _idx = self.xref_objStm[indirect_reference.idnum] |
There was a problem hiding this comment.
If we want to increase readability, we should avoid abbreviations wherever possible, thus using number is preferred over num and index over idx.
stream_num might not really be obvious enough here as well - it is the object number of the corresponding stream object.
| stream.seek(-1, 1) | ||
| cnt = 0 | ||
| while cnt < size: | ||
| count_number = 0 |
There was a problem hiding this comment.
I have never seen count_number used as a loop variable.
|
Thank you for the review , I update the title , is it clear now? |
| ) | ||
| logger_warning( | ||
| "Value /N %(n)d for object %(stmnum)d exceeds maximum allowed value %(max_n)d. Limiting to %(max_n)d.", | ||
| "Value /N %(n)d for object %(object_stream_number)d exceeds" |
There was a problem hiding this comment.
Please capture the multiline string in brackets as seen in other cases.
| read_non_whitespace(stream_data) | ||
| stream_data.seek(-1, 1) | ||
| objnum = NumberObject.read_from_stream(stream_data) | ||
| obj_num = int(NumberObject.read_from_stream(stream_data)) |
There was a problem hiding this comment.
How does the direct conversion to an integer increase the readability?
There was a problem hiding this comment.
The direct conversion to int was mainly to satisfy the type checker rather than improve readability.
When the code was:
obj_num = NumberObject.read_from_stream(stream_data)
the CI failed with mypy errors because the returned type was inferred as NumberObject | FloatObject, while later usages expected int.
Converting it explicitly with:
obj_num = int(NumberObject.read_from_stream(stream_data))
resolved the type mismatch errors in _reader.py and made the expected type explicit for later calls such as cache_get_indirect_object() and dictionary lookups.
@stefan6419846
| # caching those stale versions would shadow the newer xref entry. | ||
| authoritative_stm, _idx = self.xref_objStm.get(obj_num, (None, None)) | ||
| if authoritative_stm == stmnum: | ||
| if authoritative_stm == object_stream_number: |
There was a problem hiding this comment.
This somehow looks inconsistent, as authoritative_stm is a stream number as well.
| logger_warning( | ||
| "invalid pdf header: %(header_bytes)r", | ||
| source=__name__, | ||
| header_bytes=header_bytes) |
There was a problem hiding this comment.
| header_bytes=header_bytes) | |
| header_bytes=header_bytes | |
| ) |
| stream.seek(-11, 1) | ||
| tmp = stream.read(20) | ||
| xref_loc = tmp.find(b"xref") | ||
| xref_data = stream.read(20) |
There was a problem hiding this comment.
| xref_data = stream.read(20) | |
| xref_data = stream.read(20) |
| tmp = stream.read(20) | ||
| xref_loc = tmp.find(b"xref") | ||
| xref_data = stream.read(20) | ||
| xref_loc = xref_data .find(b"xref") |
There was a problem hiding this comment.
| xref_loc = xref_data .find(b"xref") | |
| xref_loc = xref_data.find(b"xref") |
| # compressed objects | ||
| objstr_num = get_entry(1) | ||
| obstr_idx = get_entry(2) | ||
| obj_str_num = get_entry(1) |
There was a problem hiding this comment.
This is another bulk of abbreviations. Preferably, when we are reviewing specific methods for such improvements, this should be done for all possible names, not only some of them.
|
Thank you for your feedback. |
I do not really understand what you are referring to. Improving the variable names (apart from actual ignored violations) is nothing I consider requiring tracking in an issue. |
|
Thank you for your feedback , In that case, I'ill continue working on current issue. |
You changed public API - if I remember correctly, this variable is initialized and typed in the constructor of |
Renamed vague variables in private methods with no functional changes.