Skip to content

fix(deltaproxycache): better handling of failed requests#989

Open
Elia-Renzoni wants to merge 5 commits into
trickstercache:mainfrom
Elia-Renzoni:failed-upstream-req
Open

fix(deltaproxycache): better handling of failed requests#989
Elia-Renzoni wants to merge 5 commits into
trickstercache:mainfrom
Elia-Renzoni:failed-upstream-req

Conversation

@Elia-Renzoni
Copy link
Copy Markdown
Contributor

@Elia-Renzoni Elia-Renzoni commented Apr 30, 2026

Description

This update improves how DeltaProxyCache handles and reports failures when serving partial hit requests that involve multiple upstream extents. With this change failed extents are explicitly tracked during the fetchExtents() process using a dedicated parallel structure,rather than aggregating errors. This allows the system to preserve detailed information about which specific extents failed.

fetchExtents() now returns (ExtentList, bool) to indicate failed extents and whether a full failure occurred. A full failure (or severe fault) occurs when all the requests fail, in this case the severeFault flag is set to true, otherwise it is false.

Related Issue: #291

Type of Change

    • Bug fix
    • New feature
    • Optimization
    • Test coverage
    • Documentation
    • Infrastructure

AI Disclosure

    • This contribution DOES NOT include AI-generated changes
    • This contribution DOES include AI-generated changes, and I have reviewed the relevant contributing guidelines. (the unit test is AI generated)

@Elia-Renzoni Elia-Renzoni requested a review from a team as a code owner April 30, 2026 10:05
Comment thread pkg/proxy/engines/deltaproxycache.go Outdated
mts[i] = nts
} else if resp.StatusCode != http.StatusOK {
errs[i] = tpe.ErrUnexpectedUpstreamResponse
errTs[i] = el[i]
Copy link
Copy Markdown
Contributor

@crandles crandles May 4, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

when are we using these error values? (it doesn't look like we are, is that intended? would expect we log these somewhere). previously we were returning errors.Join(errs...)

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I removed the use of errors in favor of returning the failed extents in the headers. The errors are still being logged, in particular UnexpectedUpstreamResponse and the unmarshaling errors. The only case that is not currently logged is the error coming from the Fetch function call.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The error from the Fetch function is also logged within the function itself, e.g.:

if err != nil {
		logger.Error("error reading body from http response",
			logging.Pairs{"url": pr.URL.String(), "detail": err.Error()})
		return body, resp, 0, err
}

Comment thread pkg/proxy/headers/result_header.go Outdated
// TestDeltaProxyCacheRequestPartialHitWithFailedExtents verifies that when
// a partial hit occurs and the upstream request for the missing fragment fails,
// the failed extents are properly tracked and reported in the response header.
func TestDeltaProxyCacheRequestPartialHitWithFailedExtents(t *testing.T) {
Copy link
Copy Markdown
Contributor

@crandles crandles May 4, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think we need more tests cases

  • multi-extent partial failure (some succeed, some fail)
  • header round-trip parse test for failed=
  • MergeResultHeaderVals with a failed= header on either side
  • sharded fetchTimeseries with a single shard failing

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

here are some failing / panic'ing tests 141a374

Signed-off-by: Elia Renzoni <elia.renzoni03@gmail.com>
Signed-off-by: Elia Renzoni <elia.renzoni03@gmail.com>
Signed-off-by: Elia Renzoni <elia.renzoni03@gmail.com>
@Elia-Renzoni Elia-Renzoni force-pushed the failed-upstream-req branch from 2e17b90 to 9065dd2 Compare May 5, 2026 06:22
Signed-off-by: Elia Renzoni <elia.renzoni03@gmail.com>
Signed-off-by: Elia Renzoni <elia.renzoni03@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants