fix: harden JS HTTP downloader resume, retry and stall handling#2399
fix: harden JS HTTP downloader resume, retry and stall handling#2399Moyasee wants to merge 7 commits into
Conversation
Fixes the direct downloader stalling, freezing and ballooning past the real file size on some hosters (GH-2388, LBX-723). Resume / size doubling (GH-2388): - When a host ignores the Range header and answers 200 with the full body, stream-skip the bytes we already have and append only the remainder instead of writing the whole file on top of the partial. - Stop reporting startByte + contentLength as the total on a 200 resume. - Align to the server's Content-Range start on 206 so an early/overlapping partial does not duplicate bytes; restart cleanly if it would gap. - Refuse to append a body that is shorter than the existing partial. Freezing / stalling (LBX-723): - Treat undici "terminated" and the real code on err.cause (UND_ERR_*, ECONNRESET, etc.) as retryable so a dropped socket resumes instead of failing the whole download. - Reset the retry budget once a meaningful amount of new data has flowed in an attempt, capped so a dead link still gives up. - Base stall detection on time spent waiting inside a single read and raise the timeout, so a slow-but-alive transfer is not aborted; a truly stuck download now surfaces an error and lets the queue advance. Transient HTTP errors: - Retry 408/429/500/502/503/504 with backoff in both the downloader and the preflight check so a rate-limited link no longer fails to start; permanent 4xx still fail fast. Other: - Force Accept-Encoding: identity and bail out of size-based resume under transfer compression. - Cancel the response reader on teardown to avoid leaking sockets. - Remember the Content-Disposition filename across retries. - Clamp progress, guard AllDebrid batch accounting, route batch failures through the error handler, and add a generation token so a cancel during preparation cannot spawn a second writer on the same file. - Extract the pure decision helpers and cover them with unit tests.
Greptile SummaryThis PR hardens the JS HTTP downloader by fixing file-size doubling on servers that ignore
Confidence Score: 5/5The change is safe to merge: it fixes two documented data-corruption and stall bugs without regressing any existing paths, and the pure-function helpers are fully unit-tested. All critical paths (resume, retry, stall, cancel-during-prepare) are covered by either unit tests or explicit integration scenarios. The generation-token race fix is logically sound in a single-threaded JS runtime. The helper extraction makes all decision logic independently verifiable, and the 41-case test suite directly exercises each branch. No defects were found in the changed code. No files require special attention. js-http-downloader.ts is the most complex file but its key decision logic has been moved to well-tested helpers, and the new stream lifecycle follows correct Node.js stream contracts. Important Files Changed
Flowchart%%{init: {'theme': 'neutral'}}%%
flowchart TD
A[startDownload] --> B[startDownloadWithRetry loop]
B --> C[prepareDownloadPath]
C --> D[buildRequestHeaders]
D --> E[executeDownload]
E --> F{HTTP status?}
F -->|416 + local complete| G[status=complete]
F -->|400+ retryable 429/500/502/503/504| H[HttpDownloadStatusError retryable=true]
F -->|400+ permanent 403/404/410| I[fatal error]
F -->|200 or 206| J[resolveResumeAction]
J -->|200 + startByte>0| K[flags=a skipBytes=startByte]
J -->|206 overlap| L[flags=a skipBytes=overlap]
J -->|206 gap| M[flags=w restart]
J -->|206 aligned| N[flags=a skipBytes=0]
K & L & M & N --> O[createReadableStream]
O --> P{done?}
P -->|remainingToSkip>0| Q[destroy: body too short]
P -->|shouldWrite=false| O
P -->|shouldWrite=true| R[push chunk]
R --> O
P -->|done + skip=0| S[push null complete]
H --> T[handleDownloadErrorWithRetry]
T -->|canRetry| U[backoff retry]
T -->|stall exhausted| V[fatal stall error]
T -->|permanent| W[fatal error]
%%{init: {'theme': 'base', 'themeVariables': {"darkMode": true, "background": "#0d1117", "primaryColor": "#21262d", "primaryTextColor": "#e6edf3", "primaryBorderColor": "#8b949e", "lineColor": "#8b949e", "textColor": "#e6edf3", "edgeLabelBackground": "#161b22", "actorBkg": "#21262d", "actorBorder": "#8b949e", "actorTextColor": "#e6edf3", "actorLineColor": "#8b949e", "signalColor": "#8b949e", "signalTextColor": "#e6edf3", "noteBkgColor": "#373320", "noteBorderColor": "#d4a72c", "noteTextColor": "#f0e6c0", "labelBoxBkgColor": "#21262d", "labelBoxBorderColor": "#8b949e", "labelTextColor": "#e6edf3", "loopTextColor": "#e6edf3", "activationBkgColor": "#30363d", "activationBorderColor": "#8b949e"}}}%%
flowchart TD
A[startDownload] --> B[startDownloadWithRetry loop]
B --> C[prepareDownloadPath]
C --> D[buildRequestHeaders]
D --> E[executeDownload]
E --> F{HTTP status?}
F -->|416 + local complete| G[status=complete]
F -->|400+ retryable 429/500/502/503/504| H[HttpDownloadStatusError retryable=true]
F -->|400+ permanent 403/404/410| I[fatal error]
F -->|200 or 206| J[resolveResumeAction]
J -->|200 + startByte>0| K[flags=a skipBytes=startByte]
J -->|206 overlap| L[flags=a skipBytes=overlap]
J -->|206 gap| M[flags=w restart]
J -->|206 aligned| N[flags=a skipBytes=0]
K & L & M & N --> O[createReadableStream]
O --> P{done?}
P -->|remainingToSkip>0| Q[destroy: body too short]
P -->|shouldWrite=false| O
P -->|shouldWrite=true| R[push chunk]
R --> O
P -->|done + skip=0| S[push null complete]
H --> T[handleDownloadErrorWithRetry]
T -->|canRetry| U[backoff retry]
T -->|stall exhausted| V[fatal stall error]
T -->|permanent| W[fatal error]
Reviews (2): Last reviewed commit: "fix: make retryable flag authoritative a..." | Re-trigger Greptile |
Address review feedback: - isRetryableDownloadError now returns false when an error is explicitly tagged retryable: false, so a permanent failure whose message happens to contain "connection"/"timeout"/etc. is no longer retried. - Extract the preflight backoff base delay into a named constant.
|
Addressed both points in 58432fe:
@greptile review |
A tester hit a GoFile CDN that rate-limited every request (HTTP 429). The transient-status retry then kept retrying for about 2.5 minutes while the UI still showed an active download, which looked like a stall. Give transient HTTP statuses (429/503/etc.) a short dedicated retry budget separate from the network/stall budget, so a rate-limited or temporarily unavailable server fails within a few seconds with a clear message and the queue can move on instead of sitting on a dead link. Honor the Retry-After header (delta-seconds or HTTP-date, capped) when the server sends one.
|
Investigated koddy's tester logs (reported stall). Root cause: it was not a mid-stream stall. He repeatedly hit GoFile, which was failing for him (official 403, alternate CDN Fix in 1d705bf: transient HTTP statuses (429/503/etc.) now get a short dedicated retry budget separate from network/stall retries, so a hard rate-limited or unavailable link fails within a few seconds with a clear "rate-limiting or temporarily unavailable" message and the queue moves on, instead of masquerading as a stalled download. Also honors Follow-up worth considering separately (not in this PR, it is in the GoFile resolver, not the downloader): when the alternate GoFile CDN returns 4xx/429, fall back to the official GoFile source instead of dead-ending on the bypass CDN. |
i did nothing! I either clicked on the black space on the Hydra app because I clicked another app or alt-tabbed back into Hydra. When i did hit the gofile was to make sure TorBox caught the hook. This didn't happen because I went back into it and redownloaded the same file. The doubling of the GBs happened while I was watching it and didn't have any peripherals in my hands |
Hey! This text you are quoting is not related to your issue, no worries |
Detect connectivity changes from the renderer (online/offline plus NetworkInformation change events) and from powerMonitor wake, and route them through the download orchestrator. On a network switch or reconnect the active HTTP download aborts the half-open socket and resumes from the on-disk offset instead of waiting out the 30s stall timeout. A real disconnect surfaces a Reconnecting indicator in the bottom bar, downloads page and big picture, and after a 15s grace window with no connection the download is paused with its partial file kept.
Extract the retry-budget reset and transient-status handling out of the JS downloader error handler, and split preflight validation into header building and a per-attempt helper, keeping each function under the cognitive-complexity limit. Drop a negated ternary and use globalThis in the preload network listeners.
A network change usually surfaces as a dropped socket or a stall, both of which recover through the generic retry branch. That branch resumed without marking the download as reconnecting, so the indicator only appeared on the connectivity-event path and users saw the download jump straight to recalculating the ETA. Mark the download reconnecting (and zero the speed) on any network retry, and clear it on a terminal error.
|



When submitting this pull request, I confirm the following (please check the boxes):
Fill in the PR content:
Reworks the native HTTP downloader so direct downloads stop stalling, freezing and growing past the real file size on some hosters. Closes the two reported issues and a handful of related edge cases found along the way.
Fixes #2388 (downloads adding gigs) and LBX-723 (DDL hosters freezing).
Size doubling (#2388)
The downloader resumed by stat-ing the partial and sending
Range, but never checked the server honored it. When a host ignoresRangeand replies200with the full body, the old code appended the whole file on top of the partial (140GB became ~300GB) and reportedstartByte + contentLengthas the total.200resume, stream-skip the bytes already on disk and append only the remainder, keeping the partial intact instead of duplicating it.contentLength(notstartByte + contentLength) as the total for a200.Content-Rangestart on a206so an early/overlapping partial cannot duplicate bytes, and restart cleanly if it would leave a gap.Freezing / stalling (LBX-723)
undicisurfaces a dropped socket asTypeError: terminatedwith the real code onerr.cause. The old retry check only looked aterr.message/err.code, so a recoverable disconnect was treated as fatal and the download stopped (pause + resume was the only workaround).terminated,err.cause.codeand theUND_ERR_*family as retryable.Transient HTTP errors
408/429/500/502/503/504were fatal, so a rate-limited link (common on Datanodes) failed to even start. They now back off and retry in both the preflight check and the downloader; permanent 4xx (403/404/410) still fail fast.Other hardening
Accept-Encoding: identityand bail out of size-based resume under transfer compression.Content-Dispositionfilename across retries so a resume targets the right partial.Testing
yarn test, 41 cases).terminated, 206 overlap, short body, 429, pause/resume, throttle, chunked) with byte-for-byte (SHA-256) checks.