Fix: Enforce fail-fast output_dir validation with argparse and OS W_O… #2753
Open
gagandhakrey wants to merge 1 commit into
Problem
Currently, Whisper does not validate up front whether the --output_dir supplied by the user exists, is writable, or can be created. Because directory creation (os.makedirs) and file writing happen lazily, late in the pipeline, an invalid destination causes the application to:
Load several gigabytes of model weights into memory unnecessarily.
For API consumers, run an entire transcription (potentially 10+ minutes) to completion, only to crash with a long Python traceback (FileNotFoundError or PermissionError) at the moment it tries to save the subtitle files, losing all generated output.
Solution
This PR enforces fail-fast validation of the output configuration, avoiding wasted compute time and a poor CLI experience.
Key Changes:
(CLI) Native argparse validation: Added a valid_output_dir function in whisper/transcribe.py and used it as the argparse type= callable for --output_dir.
argparse now performs the os.makedirs() step itself, during argument parsing.
If creation fails or write permission is missing, the CLI exits immediately, before any model is loaded, with a standard argparse error (whisper: error: argument --output_dir/-o: ...) instead of an unhandled traceback.
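A minimal sketch of what such an argparse type= callable could look like follows. It illustrates the technique rather than reproducing this PR's diff: the function body, the error messages, and the build_parser() helper are assumptions, and only the --output_dir/-o option is modeled. Raising argparse.ArgumentTypeError is what makes argparse print the usage line plus a one-line error and exit with status 2.

```python
import argparse
import os


def valid_output_dir(path: str) -> str:
    """Hypothetical argparse type= callable: create the directory, then check access."""
    try:
        # Create the directory (and any parents) now; no-op if it already exists.
        # Raises FileExistsError if `path` exists but is a regular file.
        os.makedirs(path, exist_ok=True)
    except OSError as exc:
        # argparse turns ArgumentTypeError into "prog: error: argument ...: <msg>".
        raise argparse.ArgumentTypeError(
            f"cannot create directory {path!r}: {exc.strerror or exc}"
        ) from exc
    # W_OK: may create files in the directory; X_OK: may traverse (search) it.
    if not os.access(path, os.W_OK | os.X_OK):
        raise argparse.ArgumentTypeError(f"directory {path!r} is not writable")
    return path


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical minimal parser containing only the option under discussion.
    parser = argparse.ArgumentParser(prog="whisper")
    parser.add_argument(
        "--output_dir", "-o", type=valid_output_dir, default=".",
        help="directory to save the outputs",
    )
    return parser
```

Because argparse also passes string defaults through type= when the option is omitted, a default of "." is validated the same way as an explicit -o value.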
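(API) Early access check: Updated ResultWriter in whisper/utils.py to validate output_dir when it is instantiated. It checks os.W_OK | os.X_OK, i.e. that the running process has write and search (traverse) permission on the directory on Unix/Linux file systems, before consumers start any heavy work. Because os.access() tests the real rather than the effective user and group IDs, and permissions can change before the files are actually written, this is an early sanity check rather than a hard guarantee.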
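The constructor-time check can be sketched with a simplified stand-in for ResultWriter. The class below is hypothetical and not the real whisper.utils.ResultWriter: it assumes the directory already exists (it does not call os.makedirs), and its write() helper replaces the real class's __call__/write_result machinery purely for illustration.

```python
import os


class ResultWriter:
    """Hypothetical, simplified stand-in for whisper.utils.ResultWriter (sketch only)."""

    extension: str = "txt"

    def __init__(self, output_dir: str):
        # Fail fast at construction time, before any transcription work starts.
        # Assumption: the directory must already exist; this sketch does not create it.
        if not os.path.isdir(output_dir):
            raise NotADirectoryError(
                f"output_dir {output_dir!r} does not exist or is not a directory"
            )
        # W_OK | X_OK: the process may create entries in the directory and traverse it.
        # os.access() uses the real uid/gid and can go stale, so this is a sanity check.
        if not os.access(output_dir, os.W_OK | os.X_OK):
            raise PermissionError(f"output_dir {output_dir!r} is not writable")
        self.output_dir = output_dir

    def write(self, text: str, audio_path: str) -> str:
        # Hypothetical helper: write "<audio basename>.<extension>" inside output_dir.
        base = os.path.splitext(os.path.basename(audio_path))[0]
        out_path = os.path.join(self.output_dir, f"{base}.{self.extension}")
        with open(out_path, "w", encoding="utf-8") as f:
            f.write(text)
        return out_path
```

With this design, a bad destination surfaces as an exception at ResultWriter(...) construction, so an API consumer can create the writer before calling transcribe() and avoid discovering the problem only after the model has run.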