Summary
In `components/job-orchestration/job_orchestration/scheduler/compress/compression_scheduler.py`, the `_ensure_dataset_exists()` call inside `_schedule_job()` is not wrapped in a try/except block. If it throws after `paths_to_compress_buffer.flush()`, the function exits before `_batch_and_submit_tasks()` runs, leaving the job stuck in `PENDING` status and causing all remaining pending jobs in the current scheduling cycle to be skipped.
Fix
Wrap `_ensure_dataset_exists` in a try/except that:
- Catches any exception.
- Logs it.
- Marks the current job as `FAILED` via `update_compression_job_metadata`.
- Returns early so the scheduler continues processing the remaining jobs.
Optionally, move the dataset existence check before `paths_to_compress_buffer.flush()` to avoid buffering work that may be dropped.
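The guard described above could look roughly like the sketch below. Only `_ensure_dataset_exists`, `update_compression_job_metadata`, and the `FAILED` status come from the issue; the scheduler loop, function signatures, and stub bodies here are illustrative assumptions, not the real implementation.

```python
# Hedged sketch: stand-ins for the real scheduler helpers so the guard's
# behavior can be demonstrated end to end.
import logging

logger = logging.getLogger(__name__)

FAILED = "FAILED"
SUBMITTED = "SUBMITTED"

# Stub for the jobs table; the real helper writes job status to the database.
job_statuses = {}


def update_compression_job_metadata(db_cursor, job_id, fields):
    job_statuses[job_id] = fields["status"]


def _ensure_dataset_exists(db_cursor, dataset):
    # Stub: simulate the failure mode where the dataset check throws.
    if dataset == "missing":
        raise ValueError(f"Dataset {dataset!r} does not exist")


def _schedule_job(db_cursor, job_id, dataset):
    # ... paths_to_compress_buffer.flush() would have run by this point ...
    try:
        _ensure_dataset_exists(db_cursor, dataset)
    except Exception:
        logger.exception("Dataset check failed for job %s", job_id)
        update_compression_job_metadata(db_cursor, job_id, {"status": FAILED})
        # Early return: this job is marked FAILED instead of stuck in
        # PENDING, and the scheduler can move on to the next job.
        return
    # ... _batch_and_submit_tasks(...) would run here ...
    job_statuses[job_id] = SUBMITTED


# One scheduling cycle: the bad job no longer blocks the jobs after it.
for job_id, dataset in [(1, "ok"), (2, "missing"), (3, "ok")]:
    _schedule_job(None, job_id, dataset)
```

With this guard, job 2 ends up `FAILED` while jobs 1 and 3 are still submitted; without it, the exception would abort the cycle and job 3 would never be scheduled.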
References
- `clp_config` column truncation and handle corrupted jobs gracefully (fixes 2151). #2178
- `clp_config` column truncation and handle corrupted jobs gracefully (fixes 2151). #2178 (comment)