Python generators cannot be resumed after an unhandled exception, the frame is finalized and all subsequent next() calls raise StopIteration.
After #597 spooled segment downloads happen lazily inside the generator during iteration. If a transient error (e.g. S3 timeout) occurs mid-iteration, the caller sees the exception, but the generator is permanently dead.
Subsequent fetchone() calls also return None as if the query completed normally and silently drops remaining rows.
Before #597 this was theoretical issue since fetch() materialized all segments eagerly. I/O errors happened outside the generator.
From looking at other projects it seems the pattern is to use a class with __next__ and __iter__ instead of a generator and having a field on the class to hold state of the iterator so that we can signal to the caller what happened.
Here's a minimal repro:
from trino.client import TrinoResult
class FailingIterator:
def __init__(self):
self._count = 0
def __iter__(self):
return self
def __next__(self):
if self._count >= 3:
raise IOError("S3 segment download failed")
self._count += 1
return [self._count]
class FakeQuery:
finished = True
def fetch(self):
return []
result = TrinoResult(FakeQuery(), FailingIterator())
it = iter(result)
next(it) # [1]
next(it) # [2]
next(it) # [3]
next(it) # raises IOError, caller sees this and retries
next(it) # raises StopIteration, caller gets None from fetchone() and thinks query is done
Python generators cannot be resumed after an unhandled exception, the frame is finalized and all subsequent next() calls raise
StopIteration.After #597 spooled segment downloads happen lazily inside the generator during iteration. If a transient error (e.g. S3 timeout) occurs mid-iteration, the caller sees the exception, but the generator is permanently dead.
Subsequent fetchone() calls also return None as if the query completed normally and silently drops remaining rows.
Before #597 this was theoretical issue since fetch() materialized all segments eagerly. I/O errors happened outside the generator.
From looking at other projects it seems the pattern is to use a class with
__next__and__iter__instead of a generator and having a field on the class to hold state of the iterator so that we can signal to the caller what happened.Here's a minimal repro: