fix(compression-scheduler): Wrap _ensure_dataset_exists in per-job failure path in _schedule_job #2201

@coderabbitai

Description

Summary

In components/job-orchestration/job_orchestration/scheduler/compress/compression_scheduler.py, the _ensure_dataset_exists() call inside _schedule_job() is not wrapped in a try/except block. If it raises after paths_to_compress_buffer.flush() has run, _schedule_job() exits before _batch_and_submit_tasks() is called, leaving the job stuck in PENDING status and causing all remaining pending jobs in the current scheduling cycle to be skipped.

Fix

Wrap _ensure_dataset_exists in a try/except that:

  1. Catches any exception.
  2. Logs it.
  3. Marks the current job as FAILED via update_compression_job_metadata.
  4. Returns early so the scheduler continues processing remaining jobs.

Optionally, move the dataset existence check before paths_to_compress_buffer.flush() to avoid buffering work that may be dropped.
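The proposed error-handling path can be sketched as follows. This is a minimal, self-contained illustration, not the actual scheduler code: the stub bodies of `_ensure_dataset_exists` and `update_compression_job_metadata`, the `job_metadata` dict, and the status strings are assumptions standing in for the real implementations in compression_scheduler.py.

```python
import logging

logger = logging.getLogger(__name__)

# Simulated job-metadata store for this sketch only.
job_metadata = {}


def update_compression_job_metadata(db_cursor, job_id, fields):
    # Stand-in for the real metadata update; records the new status.
    job_metadata[job_id] = fields


def _ensure_dataset_exists(db_cursor, dataset):
    # Stand-in: raises for a dataset that cannot be created.
    if dataset == "bad-dataset":
        raise RuntimeError(f"Cannot create dataset {dataset!r}")


def _schedule_job(db_cursor, job_id, dataset):
    # Proposed fix: catch any failure from _ensure_dataset_exists, log it,
    # mark the job FAILED, and return so the scheduler moves on.
    try:
        _ensure_dataset_exists(db_cursor, dataset)
    except Exception:
        logger.exception(
            "Failed to ensure dataset %r exists for job %s.", dataset, job_id
        )
        update_compression_job_metadata(db_cursor, job_id, {"status": "FAILED"})
        return
    # In the real scheduler, _batch_and_submit_tasks(...) would run here.
    update_compression_job_metadata(db_cursor, job_id, {"status": "RUNNING"})


# The scheduling cycle continues past the failed job instead of aborting.
for jid, ds in [(1, "good-dataset"), (2, "bad-dataset"), (3, "good-dataset")]:
    _schedule_job(None, jid, ds)
```

With this structure, job 2's dataset failure is recorded as FAILED while jobs 1 and 3 are still scheduled, which is the behavior the fix is after.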
