Conversation
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Organization UI Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (3)
🚧 Files skipped from review as they are similar to previous changes (1)
WalkthroughAdds server-side audio transcription: new audio config and TOML mapping, ffmpeg in the runtime image, OpenAI Whisper transcription client calls, ffprobe-based audio validation and AudioProcessor integration into file uploads, router DB-session scoping for uploads, docs updates, and extensive tests for audio flows. Changes
Sequence Diagram(s)sequenceDiagram
participant Client
participant Router as messages.py (create_messages_with_file)
participant FileService as FileProcessingService
participant AudioProc as AudioProcessor
participant FFProbe as ffprobe (system)
participant OpenAI as OpenAI API
participant DB
Client->>Router: POST /messages with file
Router->>FileService: extract_text_from_upload(file)
FileService->>AudioProc: supports_file_type?/route decision
alt audio detected
AudioProc->>FFProbe: probe/validate audio (via ffprobe)
FFProbe-->>AudioProc: duration/metadata
AudioProc->>OpenAI: audio.transcriptions.create(model=whisper-1, response_format=text)
OpenAI-->>AudioProc: transcription text
AudioProc-->>FileService: ExtractedFileText (text + metadata)
else non-audio
FileService-->>Router: ExtractedFileText (from other processors)
end
Router->>Router: acquire db session via tracked_db()
Router->>DB: create_messages(...) and commit
DB-->>Router: created messages
Router-->>Client: 201 Created (messages + metadata)
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Possibly related PRs
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 1 | ❌ 2❌ Failed checks (1 warning, 1 inconclusive)
✅ Passed checks (1 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
aded354 to
2776d1d
Compare
There was a problem hiding this comment.
Actionable comments posted: 2
🧹 Nitpick comments (2)
src/utils/clients.py (1)
320-357: Add Google-style docstrings to the new transcription helpers.Both new functions are part of core utility behavior and should include concise Google-style docstrings for inputs/returns/exceptions.
As per coding guidelines "Use Google style docstrings".
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/utils/clients.py` around lines 320 - 357, Add Google-style docstrings to _transcribe_audio_once and transcribe_audio: for each function include a short summary, Args listing content (bytes), filename (str), content_type (str), and model (str | None where applicable), the Returns section describing the returned transcription (str), and a Raises section documenting LLMError for unsupported content types and missing 'openai' client; keep docstrings concise and follow Google style (triple-quoted, parameters with types and descriptions, and brief return/raises lines) so callers and linters get proper documentation.tests/routes/test_files.py (1)
685-717: Consider using pytest fixtures for settings mutation.The manual save/restore pattern for
settings.MAX_FILE_SIZEandsettings.AUDIO.MAX_FILE_SIZE_BYTESworks but is verbose and repeated across multiple tests. A pytest fixture withyieldwould be cleaner and more maintainable.♻️ Optional: Extract settings override to a fixture
`@pytest.fixture` def override_file_size_limits(): """Fixture to temporarily override file size limits.""" original_generic = settings.MAX_FILE_SIZE original_audio = settings.AUDIO.MAX_FILE_SIZE_BYTES def _override(generic: int, audio: int): settings.MAX_FILE_SIZE = generic settings.AUDIO.MAX_FILE_SIZE_BYTES = audio yield _override settings.MAX_FILE_SIZE = original_generic settings.AUDIO.MAX_FILE_SIZE_BYTES = original_audioThen use in tests:
async def test_audio_upload_...(override_file_size_limits, ...): override_file_size_limits(generic=5, audio=10) # ... test code without try/finally🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/routes/test_files.py` around lines 685 - 717, Create a pytest fixture (e.g., override_file_size_limits) that saves settings.MAX_FILE_SIZE and settings.AUDIO.MAX_FILE_SIZE_BYTES, yields a helper to set new values, and restores the originals after yield; then replace the try/finally blocks in tests that mutate settings (the code that sets settings.MAX_FILE_SIZE = 5 and settings.AUDIO.MAX_FILE_SIZE_BYTES = 10 around the upload test using ExtractedFileText and client.post) by adding the fixture as a parameter and calling override_file_size_limits(generic=5, audio=10) at the start of the test so the mutation/restore is handled by the fixture.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@docs/v3/documentation/features/advanced/file-uploads.mdx`:
- Line 106: Update the "Audio Files" description to remove any mention of
internal audio chunking/splitting and merging; state that uploads are
transcribed directly by OpenAI (using `whisper-1` by default) and then the
resulting transcript is subject to message chunking, e.g., replace the sentence
in the "Audio Files" block that references splitting into multiple audio
segments and merging with text that describes direct transcription followed by
message chunking of the transcript.
In `@src/utils/files.py`:
- Around line 250-269: Temporary file created by tempfile.NamedTemporaryFile
(temp_path) can leak if temp_file.write(chunk) raises; wrap the entire temp file
lifecycle in a try/finally so temp_path is always unlinked and temp_file closed
on any error. Concretely, create temp_file/temp_path, then immediately enter a
try block that contains the read/write loop (and the await file.seek(0) in its
own finally), then perform the probe_audio_duration_seconds_from_path call and
ValidationException handling in an outer try/except, and put
temp_path.unlink(missing_ok=True) (and ensure temp_file.close() if needed) in
the outermost finally so the temp file is removed even when writes fail. Ensure
references: tempfile.NamedTemporaryFile, temp_path, file.read,
processor.probe_audio_duration_seconds_from_path, ValidationException are
handled as described.
---
Nitpick comments:
In `@src/utils/clients.py`:
- Around line 320-357: Add Google-style docstrings to _transcribe_audio_once and
transcribe_audio: for each function include a short summary, Args listing
content (bytes), filename (str), content_type (str), and model (str | None where
applicable), the Returns section describing the returned transcription (str),
and a Raises section documenting LLMError for unsupported content types and
missing 'openai' client; keep docstrings concise and follow Google style
(triple-quoted, parameters with types and descriptions, and brief return/raises
lines) so callers and linters get proper documentation.
In `@tests/routes/test_files.py`:
- Around line 685-717: Create a pytest fixture (e.g., override_file_size_limits)
that saves settings.MAX_FILE_SIZE and settings.AUDIO.MAX_FILE_SIZE_BYTES, yields
a helper to set new values, and restores the originals after yield; then replace
the try/finally blocks in tests that mutate settings (the code that sets
settings.MAX_FILE_SIZE = 5 and settings.AUDIO.MAX_FILE_SIZE_BYTES = 10 around
the upload test using ExtractedFileText and client.post) by adding the fixture
as a parameter and calling override_file_size_limits(generic=5, audio=10) at the
start of the test so the mutation/restore is handled by the fixture.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 30669f7d-ef47-4c09-bcd3-a20ebdef8191
📒 Files selected for processing (14)
.env.templateDockerfileconfig.toml.exampledocs/v3/documentation/features/advanced/file-uploads.mdxsrc/config.pysrc/deriver/enqueue.pysrc/models.pysrc/routers/messages.pysrc/utils/clients.pysrc/utils/files.pytests/conftest.pytests/routes/test_files.pytests/sdk_typescript/conftest.pytests/utils/test_audio_processing.py
💤 Files with no reviewable changes (2)
- src/deriver/enqueue.py
- src/models.py
There was a problem hiding this comment.
♻️ Duplicate comments (1)
src/utils/files.py (1)
250-269:⚠️ Potential issue | 🟡 MinorTemp file leak on write failure still present.
The structure has a gap: if
temp_file.write(chunk)at line 254 raises an exception, the innerfinally(line 255-256) runs and thewithblock exits, but then the exception propagates without entering the outertryblock at line 258. This meanstemp_path.unlink(missing_ok=True)at line 269 never executes, leaking the temp file.The fix from the past review comment should be applied to wrap the entire temp file lifecycle in a single try/finally.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/utils/files.py` around lines 250 - 269, The temp file can leak if temp_file.write(chunk) raises because unlink is only in the outer try/finally; wrap the entire temp-file lifecycle (creation, chunked writes, and probe call) in a single try/finally so temp_path.unlink(missing_ok=True) always runs; specifically, keep the NamedTemporaryFile context and the read loop using file.read(UPLOAD_VALIDATION_CHUNK_BYTES) and ensure file.seek(0) still happens before probing, then call processor.probe_audio_duration_seconds_from_path(temp_path) inside that protected block and handle ValidationException as before (checking for "Uploaded audio is invalid or unreadable") while re-raising other exceptions.
🧹 Nitpick comments (1)
src/routers/messages.py (1)
154-162: Convoluted conditional logic is hard to follow.The compound condition at lines 154-161 checks 6 conditions with
and. Consider extracting to a helper function or restructuring for clarity:♻️ Suggested refactor
# Validate file size - max_file_size = settings.MAX_FILE_SIZE - if ( - file.size - and file.size > settings.MAX_FILE_SIZE - and is_audio_transcription_enabled() - and is_audio_upload(file) - and file.size <= settings.AUDIO.MAX_FILE_SIZE_BYTES - and await is_validated_audio_upload(file) - ): - max_file_size = settings.AUDIO.MAX_FILE_SIZE_BYTES + max_file_size = settings.MAX_FILE_SIZE + + # Allow larger uploads for valid audio files when transcription is enabled + if file.size and file.size > settings.MAX_FILE_SIZE: + if ( + is_audio_transcription_enabled() + and is_audio_upload(file) + and file.size <= settings.AUDIO.MAX_FILE_SIZE_BYTES + and await is_validated_audio_upload(file) + ): + max_file_size = settings.AUDIO.MAX_FILE_SIZE_BYTES🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/routers/messages.py` around lines 154 - 162, The compound conditional that decides to set max_file_size is hard to read; extract the logic into a descriptive helper function (e.g., is_valid_audio_for_transcription(file) or should_use_audio_max_size(file)) and replace the multi-AND expression with a single call. The helper should check file.size truthiness, file.size > settings.MAX_FILE_SIZE, is_audio_transcription_enabled(), is_audio_upload(file), file.size <= settings.AUDIO.MAX_FILE_SIZE_BYTES, and await is_validated_audio_upload(file) (keep the async validation if needed), then return True/False so you can simply do "if await should_use_audio_max_size(file): max_file_size = settings.AUDIO.MAX_FILE_SIZE_BYTES". This makes the intent around file.size, settings.MAX_FILE_SIZE, settings.AUDIO.MAX_FILE_SIZE_BYTES, is_audio_transcription_enabled, is_audio_upload, and is_validated_audio_upload explicit and easier to test.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Duplicate comments:
In `@src/utils/files.py`:
- Around line 250-269: The temp file can leak if temp_file.write(chunk) raises
because unlink is only in the outer try/finally; wrap the entire temp-file
lifecycle (creation, chunked writes, and probe call) in a single try/finally so
temp_path.unlink(missing_ok=True) always runs; specifically, keep the
NamedTemporaryFile context and the read loop using
file.read(UPLOAD_VALIDATION_CHUNK_BYTES) and ensure file.seek(0) still happens
before probing, then call
processor.probe_audio_duration_seconds_from_path(temp_path) inside that
protected block and handle ValidationException as before (checking for "Uploaded
audio is invalid or unreadable") while re-raising other exceptions.
---
Nitpick comments:
In `@src/routers/messages.py`:
- Around line 154-162: The compound conditional that decides to set
max_file_size is hard to read; extract the logic into a descriptive helper
function (e.g., is_valid_audio_for_transcription(file) or
should_use_audio_max_size(file)) and replace the multi-AND expression with a
single call. The helper should check file.size truthiness, file.size >
settings.MAX_FILE_SIZE, is_audio_transcription_enabled(), is_audio_upload(file),
file.size <= settings.AUDIO.MAX_FILE_SIZE_BYTES, and await
is_validated_audio_upload(file) (keep the async validation if needed), then
return True/False so you can simply do "if await
should_use_audio_max_size(file): max_file_size =
settings.AUDIO.MAX_FILE_SIZE_BYTES". This makes the intent around file.size,
settings.MAX_FILE_SIZE, settings.AUDIO.MAX_FILE_SIZE_BYTES,
is_audio_transcription_enabled, is_audio_upload, and is_validated_audio_upload
explicit and easier to test.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: be9116eb-6410-495f-88b4-e4ee087f3c90
📒 Files selected for processing (14)
.env.templateDockerfileconfig.toml.exampledocs/v3/documentation/features/advanced/file-uploads.mdxsrc/config.pysrc/deriver/enqueue.pysrc/models.pysrc/routers/messages.pysrc/utils/clients.pysrc/utils/files.pytests/conftest.pytests/routes/test_files.pytests/sdk_typescript/conftest.pytests/utils/test_audio_processing.py
💤 Files with no reviewable changes (2)
- src/deriver/enqueue.py
- src/models.py
✅ Files skipped from review due to trivial changes (5)
- Dockerfile
- config.toml.example
- .env.template
- tests/utils/test_audio_processing.py
- tests/routes/test_files.py
🚧 Files skipped from review as they are similar to previous changes (5)
- tests/sdk_typescript/conftest.py
- docs/v3/documentation/features/advanced/file-uploads.mdx
- tests/conftest.py
- src/config.py
- src/utils/clients.py
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (1)
tests/utils/test_audio_processing.py (1)
314-329: Consider usingtempfile.gettempdir()for portability.The hardcoded
/tmppath works sincePath.unlinkis mocked andwrite()raises immediately. However, usingtempfile.gettempdir()would be slightly more portable and silence the S108 warning.🧹 Optional: Use tempfile.gettempdir()
+import tempfile as _tempfile + class FailingTempFile: - name: str = "/tmp/test-audio-validation.mp3" + name: str = f"{_tempfile.gettempdir()}/test-audio-validation.mp3"🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/utils/test_audio_processing.py` around lines 314 - 329, The test's FailingTempFile currently hardcodes name = "/tmp/test-audio-validation.mp3"; change this to build the temp path using tempfile.gettempdir() (e.g., join tempfile.gettempdir() with the filename) so the FailingTempFile.name is portable; update the FailingTempFile class definition (the name attribute) and any tests that reference it to use the tempfile.gettempdir()-based path instead of the hardcoded "/tmp".
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/utils/files.py`:
- Around line 193-201: In _probe_audio_duration_seconds ensure the temp file is
always cleaned up if write fails by assigning temp_path = Path(temp_file.name)
immediately after creating the NamedTemporaryFile (before writing) or by using
delete=True and explicitly closing the file before probing; then wrap the write
and probe in try/finally and unlink temp_path only if it exists (or check
temp_path.exists()) so a raised exception during temp_file.write(content) will
still remove the on-disk temp file; reference the _probe_audio_duration_seconds
and probe_audio_duration_seconds_from_path symbols when applying the change.
---
Nitpick comments:
In `@tests/utils/test_audio_processing.py`:
- Around line 314-329: The test's FailingTempFile currently hardcodes name =
"/tmp/test-audio-validation.mp3"; change this to build the temp path using
tempfile.gettempdir() (e.g., join tempfile.gettempdir() with the filename) so
the FailingTempFile.name is portable; update the FailingTempFile class
definition (the name attribute) and any tests that reference it to use the
tempfile.gettempdir()-based path instead of the hardcoded "/tmp".
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 9d72af7e-316d-42e7-8437-6ff1006c8832
📒 Files selected for processing (4)
docs/v3/documentation/features/advanced/file-uploads.mdxsrc/utils/files.pytests/routes/test_files.pytests/utils/test_audio_processing.py
✅ Files skipped from review due to trivial changes (1)
- docs/v3/documentation/features/advanced/file-uploads.mdx
🚧 Files skipped from review as they are similar to previous changes (1)
- tests/routes/test_files.py
There was a problem hiding this comment.
Actionable comments posted: 2
🧹 Nitpick comments (1)
src/utils/files.py (1)
263-266: Avoid branching on exception text.Using
str(exc) == "Uploaded audio is invalid or unreadable"makes upload behavior depend on a message string. Any wording change inprobe_audio_duration_seconds_from_path()silently changes control flow. Prefer a dedicated exception type or an explicit flag/code instead.As per coding guidelines, "Use custom exceptions defined in src/exceptions.py with specific exception types (ResourceNotFoundException, ValidationException, etc.)."
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/utils/files.py` around lines 263 - 266, The code branches on the text of ValidationException; instead define and use a specific exception type (e.g., InvalidAudioException) in src/exceptions.py, have probe_audio_duration_seconds_from_path() raise that specific exception when audio is invalid/unreadable, then change the except block in src/utils/files.py from "except ValidationException as exc: if str(exc) == ..." to "except InvalidAudioException:" returning False and let other ValidationException instances continue to propagate; import the new InvalidAudioException where needed.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/utils/files.py`:
- Around line 216-228: In is_validated_audio_upload, add a timeout=10 to the
subprocess.run(...) call that invokes ffprobe and catch
subprocess.TimeoutExpired; when a timeout occurs raise a ValidationException
(same style as the other except blocks) with a message like "Uploaded audio is
invalid or unreadable" (or a clear timeout-specific message) and chain the
original exception (from exc). This ensures the ffprobe call won't block
indefinitely and mirrors the existing FileNotFoundError and
CalledProcessError/ValueError handling.
- Around line 159-163: Wrap the transcribe_audio(...) call in a try/except that
catches OpenAI provider exceptions (e.g., openai.error.APIError,
openai.error.RateLimitError, openai.error.AuthenticationError,
openai.error.APIConnectionError — or the generic openai.error.OpenAIError) and
translate them into the project-specific FileProcessingError; inside the except,
raise FileProcessingError (from src/exceptions.py) with a clear, descriptive
message that includes context (e.g., filename/normalized_filename and
content_type) and the original exception message for debugging while preventing
raw provider errors from propagating out of the upload flow.
---
Nitpick comments:
In `@src/utils/files.py`:
- Around line 263-266: The code branches on the text of ValidationException;
instead define and use a specific exception type (e.g., InvalidAudioException)
in src/exceptions.py, have probe_audio_duration_seconds_from_path() raise that
specific exception when audio is invalid/unreadable, then change the except
block in src/utils/files.py from "except ValidationException as exc: if str(exc)
== ..." to "except InvalidAudioException:" returning False and let other
ValidationException instances continue to propagate; import the new
InvalidAudioException where needed.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 0ee8eec3-b8c5-4d94-afdf-fcf6f79f6900
📒 Files selected for processing (2)
src/utils/files.pytests/utils/test_audio_processing.py
Summary by CodeRabbit
New Features
Bug Fixes / Behavior
Documentation
Chores
Tests