Commit 4a5b8aa

Add newline span breaks
1 parent d78d4b4 commit 4a5b8aa

1 file changed

Lines changed: 1 addition & 6 deletions

pdftext/pdf/pages.py

@@ -107,16 +107,11 @@ def span_break():
     span_break()
     continue
 
-# we also break on hyphenation
+# we break on hyphenation or newline
 if span['text'].endswith("\x02") or span['text'].endswith("\n"):
     span_break()
     continue
 
-# sometimes pdfium doesn't inject a linebreak, so we check the span positions
-if char["bbox"].y_start > span["bbox"].y_end:
-    span_break()
-    continue
-
 # Character is likely a superscript
 if all([
     char["bbox"][1] < (span["bbox"][1] - span["bbox"].height * line_distance_threshold), # char top is above span
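
The rule this commit keeps (close the current span once its text ends with the hyphenation marker "\x02" or a newline) can be illustrated in isolation. The following is a minimal standalone sketch, not the code in pdftext/pdf/pages.py: the function name `split_into_spans` and the plain-string representation of spans are assumptions made for illustration, and bounding boxes are left out entirely.

```python
from typing import List

# The character the diff tests for as a hyphenation marker.
HYPHEN_MARKER = "\x02"


def split_into_spans(chars: List[str]) -> List[str]:
    """Group characters into spans (hypothetical helper, not pdftext's API).

    A span is closed as soon as its text ends with the hyphenation
    marker or a newline, mirroring the condition in the diff.
    """
    spans: List[str] = []
    current = ""
    for ch in chars:
        # If the span built so far ends with a break character,
        # close it before adding the next character.
        if current and (current.endswith(HYPHEN_MARKER) or current.endswith("\n")):
            spans.append(current)
            current = ""
        current += ch
    # Flush whatever is left as the final span.
    if current:
        spans.append(current)
    return spans
```

For example, `split_into_spans(list("ab\ncd\x02ef"))` yields three spans: one ending in the newline, one ending in the hyphenation marker, and the remainder. Because the sketch relies only on the text content, it breaks no span when the text carries neither marker, which is the case the removed position check was written to catch.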
