
feat(api-server): Add GET /usage/compression endpoint.#2197

Open
jackluo923 wants to merge 32 commits into y-scope:main from jackluo923:feat/usage-api

Conversation

@jackluo923
Member

@jackluo923 jackluo923 commented Apr 11, 2026

Description

Adds a GET /usage/compression endpoint to the api-server that returns per-job compression usage data (sizes, durations, task stats) within a configurable epoch-millisecond time window. Intended for reporting, dashboarding, and chargeback use cases.

Query parameters:

  • begin_timestamp (required) — start of usage window, epoch ms inclusive
  • end_timestamp (required) — end of usage window, epoch ms inclusive
  • job_status (optional) — comma-separated, case-insensitive list of statuses (succeeded, FAILED, etc.). Defaults to all terminal states (SUCCEEDED,FAILED,KILLED).
  • limit (optional) — maximum number of jobs to return. Defaults to 1000.
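The parameter handling above can be sketched in self-contained Rust. This is an illustrative mock, not the PR's actual code: the real enum lives in clp-rust-utils and derives strum's `EnumString` with `ascii_case_insensitive`, which this hand-written `FromStr` approximates, and the `validate` helper name and error type are assumptions.

```rust
use std::str::FromStr;

// Hypothetical stand-in for CompressionJobStatus (the real enum derives
// strum::EnumString with ascii_case_insensitive; Pending is excluded).
#[derive(Debug, Clone, Copy, PartialEq)]
enum JobStatus {
    Succeeded,
    Failed,
    Killed,
}

impl FromStr for JobStatus {
    type Err = String;

    // Case-insensitive parse, mirroring strum's ascii_case_insensitive.
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s.to_ascii_lowercase().as_str() {
            "succeeded" => Ok(JobStatus::Succeeded),
            "failed" => Ok(JobStatus::Failed),
            "killed" => Ok(JobStatus::Killed),
            other => Err(format!("unrecognized job status: {other}")),
        }
    }
}

// Validates the window, limit, and comma-separated status list, applying
// the documented defaults (all terminal states, limit 1000).
fn validate(
    begin_ts: i64,
    end_ts: i64,
    job_status: Option<&str>,
    limit: Option<i64>,
) -> Result<(Vec<JobStatus>, i64), String> {
    if begin_ts > end_ts {
        return Err("begin_timestamp must be <= end_timestamp".into());
    }
    let limit = limit.unwrap_or(1000);
    if limit <= 0 {
        return Err("limit must be > 0".into());
    }
    let statuses = match job_status {
        None => vec![JobStatus::Succeeded, JobStatus::Failed, JobStatus::Killed],
        Some(raw) => raw
            .split(',')
            .map(str::trim)
            .map(JobStatus::from_str)
            .collect::<Result<Vec<_>, _>>()?,
    };
    Ok((statuses, limit))
}

fn main() {
    assert!(validate(100, 0, None, None).is_err());
    assert!(validate(0, 100, None, Some(0)).is_err());
    assert!(validate(0, 100, Some("succeeded, FAILED"), Some(10)).is_ok());
    println!("ok");
}
```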

Key changes:

  • CompressionUsageParams and validation live in client.rs (not the HTTP layer) so any future caller of Client gets the same validated interface
  • Job status parsing uses strum::EnumString on CompressionJobStatus (case-insensitive, Pending excluded via #[strum(disabled)])
  • get_compression_usage accepts &[CompressionJobStatus]; empty slice means "no status filter" (IN clause omitted)
  • sqlx::FromRow derive on CompressionUsage eliminates manual try_get mapping; query_as replaces manual row iteration
  • Epoch-millisecond timestamps (not ISO strings); f64 with ROUND() in SQL instead of BigDecimal
  • Composite index on compression_jobs(start_time, status) for efficient time-range queries
  • 15 unit + route tests covering validation, error paths, and edge cases
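The "empty slice means no status filter" behaviour can be sketched as a small query builder. The SQL text and function name here are illustrative (the real query also aggregates sizes and task stats); the point is that an empty status list omits the `IN` clause rather than emitting invalid SQL like `IN ()`.

```rust
// Builds the WHERE clause dynamically: one `?` placeholder per status,
// or no IN clause at all when the slice is empty (values bound separately).
fn build_usage_query(status_codes: &[i32]) -> String {
    let mut sql = String::from(
        "SELECT j.id, SUM(t.duration) AS total_duration \
         FROM compression_jobs j \
         LEFT JOIN compression_tasks t ON t.job_id = j.id \
         WHERE j.start_time BETWEEN ? AND ?",
    );
    if !status_codes.is_empty() {
        let placeholders = vec!["?"; status_codes.len()].join(", ");
        sql.push_str(&format!(" AND j.status IN ({placeholders})"));
    }
    sql.push_str(" GROUP BY j.id LIMIT ?");
    sql
}

fn main() {
    let filtered = build_usage_query(&[2, 3]);
    assert!(filtered.contains("IN (?, ?)"));
    let unfiltered = build_usage_query(&[]);
    assert!(!unfiltered.contains("IN ("));
    println!("ok");
}
```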

Checklist

  • The PR satisfies the contribution guidelines.
  • This is a breaking change and it has been indicated in the PR title, OR this isn't a breaking change.
  • Necessary docs have been updated, OR no docs need to be updated.

Validation performed

  • task lint:check-rust (fmt + clippy) passes
  • task codegen:openapi — generated OpenAPI JSON committed
  • cargo test -p api-server — 15/15 tests pass

Summary by CodeRabbit

  • New Features

    • Added GET /usage/compression endpoint returning compression usage over a date range with job-status filtering (comma-separated, case-insensitive), defaults, limits and OpenAPI docs.
  • Bug Fixes

    • Invalid query parameters now yield 400 Bad Request with clearer validation (timestamps, statuses, limits); client invalid-inputs mapped to 400.
  • Chores

    • Backfilled composite DB index to speed compression queries; updated dependencies to support case-insensitive status parsing and tooling.
  • Tests

    • Added unit and handler tests covering validation and endpoint behaviour.

@jackluo923 jackluo923 requested a review from a team as a code owner April 11, 2026 08:08
@coderabbitai
Contributor

coderabbitai bot commented Apr 11, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.


Walkthrough

Adds a GET /usage/compression endpoint with query parsing/validation, new request/response types and validation, a Client method to run an aggregate compression-usage SQL query, a ClientError::InvalidInput variant, case-insensitive CompressionJobStatus parsing, dependency updates, and a composite DB index on (start_time, status).

Changes

Cohort / File(s) Summary
API server dependencies
components/api-server/Cargo.toml
Added chrono (with serde) and extended sqlx features to include chrono; added dev-deps http-body-util and tower/util.
Client logic & types
components/api-server/src/client.rs
Added CompressionUsageParams, ValidatedCompressionUsageParams (TryFrom validation), CompressionUsage, DEFAULT_JOB_STATUSES, re-exported CompressionJobStatus, Client::get_compression_usage() with dynamic IN (...) filter and aggregate SQLx query; added unit tests for validation.
Routing, handler & docs
components/api-server/src/routes.rs
Registered GET /usage/compression and compression_usage handler; validates params, logs, calls client, returns JSON; updated OpenAPI to include new path and schemas; maps ClientError::InvalidInput → 400; added handler tests.
Error types
components/api-server/src/error.rs
Added ClientError::InvalidInput(String) variant with formatted message.
Job status enum & utils
components/clp-rust-utils/Cargo.toml, components/clp-rust-utils/src/job_config/compression.rs
Enabled strum derive feature; CompressionJobStatus now derives EnumString (ascii_case_insensitive), Copy, and utoipa::ToSchema for parsing and schema generation.
DB initialization script
components/clp-py-utils/initialize-orchestration-db.py
Added composite BTREE index JOB_START_TIME_STATUS(start_time, status) to CREATE TABLE and an idempotent ALTER TABLE ... ADD INDEX with duplicate-index error suppression (ER_DUP_KEYNAME).
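The idempotent backfill described above can be sketched in Python. The helper name and cursor handling are illustrative, not the script's actual code; the key fact is that MySQL reports "Duplicate key name" as errno 1061 (ER_DUP_KEYNAME), which is suppressed so reruns against an already-indexed table are safe.

```python
# Sketch of a guarded ALTER TABLE for backfilling the composite index.
ER_DUP_KEYNAME = 1061  # MySQL "Duplicate key name" error code

ADD_INDEX_SQL = (
    "ALTER TABLE compression_jobs "
    "ADD INDEX JOB_START_TIME_STATUS (start_time, status) USING BTREE"
)


def ensure_index(cursor) -> None:
    """Add the composite index, tolerating the case where it already exists."""
    try:
        cursor.execute(ADD_INDEX_SQL)
    except Exception as err:
        # Prefer the numeric errno over matching the message text, which can
        # differ between MySQL and MariaDB builds.
        if getattr(err, "errno", None) != ER_DUP_KEYNAME:
            raise
```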

Sequence Diagram

sequenceDiagram
    actor Client
    participant Handler as HTTP Handler<br/>(routes.rs)
    participant Validator as Params Validator<br/>(client.rs)
    participant APIClient as DB Client Method<br/>(client.rs)
    participant DB as SQL Database

    Client->>Handler: GET /usage/compression?begin=&end=&job_status=&limit=
    Handler->>Validator: Parse to CompressionUsageParams
    Validator->>Validator: Validate timestamps, limit, parse job_status tokens
    alt Validation Fails
        Validator-->>Handler: ClientError::InvalidInput
        Handler-->>Client: 400 Bad Request
    else Validation Passes
        Handler->>APIClient: get_compression_usage(validated_params)
        APIClient->>DB: Execute aggregate query with dynamic IN (...) and time bounds
        DB-->>APIClient: Vec<CompressionUsage>
        APIClient-->>Handler: Query results
        Handler-->>Client: 200 OK + JSON
    end

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
  • Description Check: ✅ Passed (check skipped; CodeRabbit's high-level summary is enabled)
  • Title Check: ✅ Passed (the title clearly and concisely summarizes the main change: adding a GET /usage/compression endpoint to the API server)
  • Docstring Coverage: ✅ Passed (docstring coverage is 100.00%, which meets the required threshold of 80.00%)


Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@components/api-server/src/routes.rs`:
- Around line 271-298: The parsing currently allows duplicate statuses which
leads to redundant SQL placeholders; update validate_compression_usage_params to
deduplicate the resolved statuses before returning (e.g., after mapping with
resolve_job_status in validate_compression_usage_params, filter out repeats
while preserving order by tracking seen statuses in a local HashSet or Vec),
ensuring DEFAULT_JOB_STATUSES remains the fallback and that the existing
empty-check and error paths (and references to resolve_job_status and
DEFAULT_JOB_STATUSES) are preserved.

In `@components/clp-py-utils/clp_py_utils/initialize-orchestration-db.py`:
- Around line 81-82: The composite index `JOB_START_TIME_STATUS` on (start_time,
status) is correct for get_compression_usage's query pattern but CREATE TABLE IF
NOT EXISTS only adds it for new DBs; add a migration step in
initialize-orchestration-db.py to ensure existing deployments get the index:
detect if the table exists and, if so, run an ALTER TABLE to add INDEX
`JOB_START_TIME_STATUS` (`start_time`, `status`) USING BTREE (guarded to avoid
duplicate-creation errors), or add an explicit migration script that executes
that ALTER TABLE; reference the table creation block and the index name
`JOB_START_TIME_STATUS` when implementing this change.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: f41ac76a-326c-4e97-af35-71e7afa2beef

📥 Commits

Reviewing files that changed from the base of the PR and between 9ae078e and d5a0f8f.

⛔ Files ignored due to path filters (1)
  • Cargo.lock is excluded by !**/*.lock
📒 Files selected for processing (4)
  • components/api-server/Cargo.toml
  • components/api-server/src/client.rs
  • components/api-server/src/routes.rs
  • components/clp-py-utils/clp_py_utils/initialize-orchestration-db.py

Returns per-job compression usage (sizes, duration, task stats) within
a configurable epoch-millisecond time window, filtered by job status.

- Comma-separated `job_status` query param (e.g. SUCCEEDED,FAILED,KILLED)
- Defaults to all terminal states (SUCCEEDED, FAILED, KILLED)
- Parameter validation extracted into shared helper for reuse in tests
- Composite index on (start_time, status) for efficient time-range queries
- OpenAPI schema registration for CompressionUsageRow
- Unit tests for validation logic (timestamp ordering, status resolution,
  edge cases like trailing commas and whitespace)
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

♻️ Duplicate comments (1)
components/clp-py-utils/clp_py_utils/initialize-orchestration-db.py (1)

81-82: ⚠️ Potential issue | 🟠 Major

Backfill this index for upgraded databases.

CREATE TABLE IF NOT EXISTS only applies this index on fresh installs. Existing compression_jobs tables keep the old definition, so /usage/compression can still miss JOB_START_TIME_STATUS and fall back to much slower scans until a separate migration adds it. Please add a guarded ALTER TABLE ... ADD INDEX path or an explicit migration.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@components/clp-py-utils/clp_py_utils/initialize-orchestration-db.py` around
lines 81 - 82, The new index JOB_START_TIME_STATUS on compression_jobs is only
created for fresh installs via CREATE TABLE IF NOT EXISTS, so upgraded DBs can
miss it; add a guarded ALTER TABLE path or migration that ensures the index
exists: detect whether compression_jobs has an index named JOB_START_TIME_STATUS
(query information_schema.STATISTICS or use a safe "ALTER TABLE compression_jobs
ADD INDEX JOB_START_TIME_STATUS (start_time, status)" wrapped in an existence
check) and create it if missing, placing this logic alongside the existing
CREATE TABLE IF NOT EXISTS code (the same initialization routine that references
compression_jobs) so upgrades and fresh installs both end up with the index.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@components/api-server/src/client.rs`:
- Around line 636-640: The SQL currently uses an inner JOIN between
compression_jobs (alias j) and compression_tasks (alias t) which drops jobs that
have no task rows; change the join to a LEFT JOIN (i.e., replace "JOIN
compression_tasks t ON t.job_id = j.id" with "LEFT JOIN compression_tasks t ON
t.job_id = j.id") so jobs with zero tasks or no persisted tasks are retained;
keep the existing SUM(t.duration) aggregation as-is (it will yield NULL for
no-task jobs) so it continues to map to the Option<BigDecimal> return type in
client.rs.

In `@components/api-server/src/routes.rs`:
- Around line 252-265: The endpoint allows unbounded time windows which can
trigger huge materialization in Client::get_compression_usage (which uses
fetch_all); add validation in the request path (e.g., in CompressionUsageParams
deserialization/validation or at the start of the compression_usage handler) to
reject windows larger than a safe maximum (e.g., max_window_ms constant) or
require/implement pagination (limit + cursor) and return a 400 with a clear
message when the window exceeds the limit; update the handler to enforce this
check before forwarding to Client::get_compression_usage so expensive fetch_all
calls cannot be driven by a single request.
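The window guard suggested above is a cheap pre-check before any query runs. `MAX_WINDOW_MS` is a hypothetical constant (30 days here), not something the PR defines; the error strings are likewise illustrative.

```rust
// Hypothetical upper bound on the queryable time window: 30 days in ms.
const MAX_WINDOW_MS: i64 = 30 * 24 * 60 * 60 * 1000;

// Rejects inverted or oversized windows before the handler touches the DB.
fn check_window(begin_ts: i64, end_ts: i64) -> Result<(), String> {
    if begin_ts > end_ts {
        return Err("begin_timestamp must be <= end_timestamp".into());
    }
    if end_ts - begin_ts > MAX_WINDOW_MS {
        return Err(format!("time window exceeds the maximum of {MAX_WINDOW_MS} ms"));
    }
    Ok(())
}

fn main() {
    assert!(check_window(0, 1000).is_ok());
    assert!(check_window(0, MAX_WINDOW_MS + 1).is_err());
    println!("ok");
}
```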

---

Duplicate comments:
In `@components/clp-py-utils/clp_py_utils/initialize-orchestration-db.py`:
- Around line 81-82: The new index JOB_START_TIME_STATUS on compression_jobs is
only created for fresh installs via CREATE TABLE IF NOT EXISTS, so upgraded DBs
can miss it; add a guarded ALTER TABLE path or migration that ensures the index
exists: detect whether compression_jobs has an index named JOB_START_TIME_STATUS
(query information_schema.STATISTICS or use a safe "ALTER TABLE compression_jobs
ADD INDEX JOB_START_TIME_STATUS (start_time, status)" wrapped in an existence
check) and create it if missing, placing this logic alongside the existing
CREATE TABLE IF NOT EXISTS code (the same initialization routine that references
compression_jobs) so upgrades and fresh installs both end up with the index.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 93573497-43a1-4b75-b6a9-07782e7cb234

📥 Commits

Reviewing files that changed from the base of the PR and between d5a0f8f and 1685f8a.

⛔ Files ignored due to path filters (1)
  • Cargo.lock is excluded by !**/*.lock
📒 Files selected for processing (4)
  • components/api-server/Cargo.toml
  • components/api-server/src/client.rs
  • components/api-server/src/routes.rs
  • components/clp-py-utils/clp_py_utils/initialize-orchestration-db.py

Comment on lines +652 to +654
rows.iter()
    .map(|r| {
        Ok(CompressionUsageRow {
Contributor


Instead of using try_get to map MySqlQueryResult to CompressionUsageRow, it might be easier to derive sqlx::FromRow in CompressionUsageRow, and use sqlx::query_as::<_, CompressionUsageRow>("SELECT ...") directly.

Member Author


Nice suggestion, switched to sqlx::FromRow.


/// Validates compression usage parameters and resolves the requested job
/// statuses into their integer codes.
fn validate_compression_usage_params(
Contributor


  1. The conventional way to validate a serde struct is to create an UncheckedCompressionUsageQuery and impl TryFrom<UncheckedCompressionUsageQuery> for CompressionUsageQuery, but this will also show UncheckedCompressionUsageQuery as the schema name in the openapi docs https://petstore.swagger.io/?url=https://docs.yscope.com/clp/main/_static/generated/api-server-openapi.json#/default/health.

How about move this validate function as a method of CompressionUsageParams, and add a comment to explain why

impl CompressionUsageParams {
    // explain why not having an UncheckedCompressionUsage
    pub fn validate() -> Result<(), ClientError> {
         ....
    }
}
  2. If you adopt CompressionJobStatus, then the validation of job status can be removed / simplified to only checking empty.

Member Author


Already adopted CompressionJobStatus

jackluo923 and others added 2 commits April 11, 2026 11:09
Co-authored-by: hoophalab <200652805+hoophalab@users.noreply.github.com>
Co-authored-by: hoophalab <200652805+hoophalab@users.noreply.github.com>
Contributor

@coderabbitai coderabbitai bot left a comment


♻️ Duplicate comments (1)
components/api-server/src/client.rs (1)

51-79: 🧹 Nitpick | 🔵 Trivial

Consider appending 'Z' to the ISO 8601 date format to indicate UTC.

The doc comments correctly state times are in UTC, but the actual DATE_FORMAT in the SQL query (lines 630-631) produces '%Y-%m-%dT%H:%i:%s' which lacks the 'Z' suffix. API consumers may assume local time without an explicit timezone indicator. Since the struct documents these as UTC, the format should reflect that.

Suggested SQL format change
- DATE_FORMAT(j.creation_time, '%Y-%m-%dT%H:%i:%s') AS creation_time, \
- DATE_FORMAT(j.start_time, '%Y-%m-%dT%H:%i:%s') AS start_time, \
+ CONCAT(DATE_FORMAT(j.creation_time, '%Y-%m-%dT%H:%i:%s'), 'Z') AS creation_time, \
+ CONCAT(DATE_FORMAT(j.start_time, '%Y-%m-%dT%H:%i:%s'), 'Z') AS start_time, \

Alternatively, return epoch milliseconds as discussed in the past review comment, which avoids timezone ambiguity entirely.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@components/api-server/src/client.rs` around lines 51 - 79, The struct
CompressionUsage documents creation_time and start_time as ISO 8601 UTC but the
SQL DATE_FORMAT used to populate those fields omits the trailing 'Z'; update the
query that builds creation_time and start_time so the string explicitly
indicates UTC (e.g., append 'Z' to the DATE_FORMAT output) or switch to
returning epoch milliseconds instead (so CompressionUsage.creation_time and
CompressionUsage.start_time are either ISO strings with 'Z' appended or epoch ms
numbers consistently), and ensure any code expecting those fields (the query
that maps to CompressionUsage) is updated to match the chosen format.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Duplicate comments:
In `@components/api-server/src/client.rs`:
- Around line 51-79: The struct CompressionUsage documents creation_time and
start_time as ISO 8601 UTC but the SQL DATE_FORMAT used to populate those fields
omits the trailing 'Z'; update the query that builds creation_time and
start_time so the string explicitly indicates UTC (e.g., append 'Z' to the
DATE_FORMAT output) or switch to returning epoch milliseconds instead (so
CompressionUsage.creation_time and CompressionUsage.start_time are either ISO
strings with 'Z' appended or epoch ms numbers consistently), and ensure any code
expecting those fields (the query that maps to CompressionUsage) is updated to
match the chosen format.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 32b4e73a-bd16-4ce4-819d-d2d2adccadbf

📥 Commits

Reviewing files that changed from the base of the PR and between 925dc4c and bf1efb3.

📒 Files selected for processing (2)
  • components/api-server/src/client.rs
  • components/api-server/src/routes.rs

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 4

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@components/api-server/src/client.rs`:
- Around line 649-697: The public method get_compression_usage currently allows
an empty job_statuses slice which produces an invalid SQL "IN ()" clause; add an
early input validation at the start of get_compression_usage that returns
Err(ClientError::InvalidInput(...)) when job_statuses.is_empty() to reject the
caller input (so callers get a structured error instead of a DB error), before
constructing placeholders and binding values; reference the
get_compression_usage function and ensure the error variant used matches other
client validations (e.g. ClientError::InvalidInput with a clear message like
"job_statuses cannot be empty").

In `@components/api-server/src/routes.rs`:
- Around line 334-559: Add regression tests exercising
CompressionUsageParams::validate for invalid limits by extending the existing
test module that uses test_app() and the "/usage/compression" route: add two new
#[tokio::test] functions (e.g., reject_zero_limit and reject_negative_limit)
that call app.oneshot with query strings containing
begin_timestamp=0&end_timestamp=100&limit=0 and limit=-1 respectively, assert
the response status is StatusCode::BAD_REQUEST, and assert the response body
contains the validation error text referencing limit (e.g., "limit must be > 0"
or similar) so the new guard in CompressionUsageParams::validate cannot regress
silently.

In `@components/clp-rust-utils/Cargo.toml`:
- Line 19: Update and commit the Rust lockfile to reflect the manifest changes:
regenerate Cargo.lock locally (e.g., run cargo fetch or cargo build) so the
change to the strum dependency line 'strum = { version = "0.28.0", features =
["derive"] }' in components/clp-rust-utils/Cargo.toml and the added
dev-dependencies in components/api-server/Cargo.toml are captured, then add and
commit the updated Cargo.lock to this PR so the CI task deps:lock:check-rust
(which uses cargo fetch --locked) will pass.

In `@tools/yscope-dev-utils`:
- Line 1: The submodule tools/yscope-dev-utils points to a non-existent commit
(6ec46856cfda8cbfb65d9386affbe3bc429186cb) and
tools/scripts/deps-download/init.sh still references the old SHA
(38bf51effc500a528c37052345c141a65ea88447); either push the missing commit to
the remote or update the submodule reference to a valid commit SHA in the
repository (fix the entry for tools/yscope-dev-utils in the gitlink), and ensure
tools/scripts/deps-download/init.sh is updated to match that same valid SHA so
both the submodule pointer and init.sh reference the same existing commit.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 185885cf-8999-4d5e-9129-4b3ba9999004

📥 Commits

Reviewing files that changed from the base of the PR and between bf1efb3 and 55b298f.

⛔ Files ignored due to path filters (1)
  • Cargo.lock is excluded by !**/*.lock
📒 Files selected for processing (7)
  • components/api-server/Cargo.toml
  • components/api-server/src/client.rs
  • components/api-server/src/error.rs
  • components/api-server/src/routes.rs
  • components/clp-rust-utils/Cargo.toml
  • components/clp-rust-utils/src/job_config/compression.rs
  • tools/yscope-dev-utils

jackluo923 and others added 3 commits April 12, 2026 03:26
- Return epoch-millisecond timestamps instead of ISO 8601 strings
- Add `limit` query parameter (default 1000) to cap result rows
- Remove `bigdecimal` dependency; use `f64` with `ROUND()` in SQL
- Use `sqlx::FromRow` derive on `CompressionUsage` (removes manual
  `try_get` mapping) and `query_as` replacing manual row iteration

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…s parsing

Replace custom `JOB_STATUS_LOOKUP` table and `resolve_job_status()`
with strum's `EnumString` derive on `CompressionJobStatus`. This gives
us case-insensitive `FromStr` parsing (e.g. "succeeded", "SUCCEEDED"
both work) and eliminates a manually-maintained string-to-int mapping
that could drift out of sync with the enum. `Pending` is excluded from
parsing via `#[strum(disabled)]` since it has no `start_time`.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…lient

Move `CompressionUsageParams`, `default_limit`, and the validation
logic from `routes.rs` into `client.rs` so that `Client` owns its
input types and validation. Any future caller of `Client` (CLI,
library, etc.) gets the same validated interface without depending on
the HTTP layer.

- Add `ClientError::InvalidInput` for validation errors
- `CompressionUsageParams::validate()` replaces standalone function
- `HandlerError` maps `InvalidInput` to `BadRequest` in routes.rs
- Uses `validate()` method rather than the `TryFrom<Unchecked>`
  pattern to avoid leaking an internal name into the `OpenAPI` schema

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
jackluo923 and others added 8 commits April 12, 2026 03:59
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Remove the intermediate `job_status_clause` variable and inline
`AND j.status IN ({placeholders})` directly into the SQL. The dynamic
placeholder generation is still needed for the variable-length IN
clause.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add reject_zero_limit and reject_negative_limit tests to ensure
CompressionUsageParams::validate correctly rejects non-positive limit
values.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
When job_statuses is empty, omit the IN clause entirely instead of
producing invalid SQL (`AND j.status IN ()`). An empty slice now means
"no filter" — return all statuses. The HTTP layer's validate() still
rejects explicitly empty job_status inputs (e.g. `job_status=,,,`)
as malformed.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Use #[rustfmt::skip] on the SQL format!() call so the project's
`format_strings = true` setting doesn't collapse the query into a
single-line blob. Each SQL clause is now on its own line for
readability.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Group only by j.id (primary key) and wrap other non-aggregated columns
in ANY_VALUE(). Since all selected columns are functionally dependent
on j.id, this is correct under ONLY_FULL_GROUP_BY and avoids a lengthy
GROUP BY list.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…tate

Make `get_compression_usage` accept a `&ValidatedCompressionUsageParams`
instead of individual parameters. The validated struct is only
constructable through `CompressionUsageParams::validate()`, so callers
cannot bypass validation at the type level.

Also clarify the `limit` parameter description in the OpenAPI schema to
state "Must be > 0; defaults to 1000."

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
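The type-state idea in the commit above can be shown with a minimal sketch. Field sets and error types are abbreviated here: the point is that the validated struct's only constructor is the validation path, so `get_compression_usage` can demand validated input at compile time.

```rust
// Unchecked query parameters as deserialized from the request.
pub struct CompressionUsageParams {
    pub begin_timestamp: i64,
    pub end_timestamp: i64,
}

// Only constructable via validate(); private fields prevent callers in
// other modules from building one directly.
pub struct ValidatedCompressionUsageParams {
    begin_timestamp: i64,
    end_timestamp: i64,
}

impl CompressionUsageParams {
    pub fn validate(self) -> Result<ValidatedCompressionUsageParams, String> {
        if self.begin_timestamp > self.end_timestamp {
            return Err("begin_timestamp must be <= end_timestamp".into());
        }
        Ok(ValidatedCompressionUsageParams {
            begin_timestamp: self.begin_timestamp,
            end_timestamp: self.end_timestamp,
        })
    }
}

// A query method that only accepts pre-validated parameters; returns the
// bound window here as a stand-in for running the real SQL query.
fn get_compression_usage(params: &ValidatedCompressionUsageParams) -> (i64, i64) {
    (params.begin_timestamp, params.end_timestamp)
}

fn main() {
    let params = CompressionUsageParams { begin_timestamp: 0, end_timestamp: 100 };
    let validated = params.validate().unwrap();
    assert_eq!(get_compression_usage(&validated), (0, 100));
    println!("ok");
}
```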
`CREATE TABLE IF NOT EXISTS` skips entirely when the table already
exists, so the `JOB_START_TIME_STATUS` index added in this PR would
never be created on existing deployments. Add an `ALTER TABLE ADD INDEX`
that tolerates "Duplicate key name" errors, making it safe to run on
both fresh and existing databases.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@jackluo923 jackluo923 requested a review from hoophalab April 11, 2026 22:06
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@components/api-server/src/client.rs`:
- Around line 91-108: Validation currently enforces that
ValidatedCompressionUsageParams.job_statuses is non-empty, so the empty-branch
in get_compression_usage is dead code; remove the conditional that special-cases
an empty job_statuses and simplify the SQL construction to always use the IN
clause with the job_statuses vector (refer to the job_statuses local variable
and get_compression_usage function), ensuring any helper that builds the
parameterized IN list uses job_statuses directly and no longer needs an "omit IN
clause if empty" path.

In `@components/clp-py-utils/clp_py_utils/initialize-orchestration-db.py`:
- Around line 98-100: Replace the brittle string match in the exception handler
that checks "Duplicate key name" with a numeric MySQL error-code check: import
mysql.connector.errorcode (or errorcode) and change the condition to verify
hasattr(err, "errno") and err.errno == errorcode.ER_DUP_KEYNAME before
swallowing the exception; otherwise re-raise. Locate the except block in
initialize-orchestration-db.py (the except Exception as err handler) and apply
the same pattern used elsewhere (e.g., sql_adapter.py) to ensure robust error
detection across MySQL/MariaDB variants.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: fe109b88-109b-40fd-97d1-c8a3b227a79e

📥 Commits

Reviewing files that changed from the base of the PR and between bf1efb3 and 1f1f66b.

⛔ Files ignored due to path filters (2)
  • Cargo.lock is excluded by !**/*.lock
  • docs/src/_static/generated/api-server-openapi.json is excluded by !**/generated/**
📒 Files selected for processing (7)
  • components/api-server/Cargo.toml
  • components/api-server/src/client.rs
  • components/api-server/src/error.rs
  • components/api-server/src/routes.rs
  • components/clp-py-utils/clp_py_utils/initialize-orchestration-db.py
  • components/clp-rust-utils/Cargo.toml
  • components/clp-rust-utils/src/job_config/compression.rs

Contributor

@hoophalab hoophalab left a comment


LGTM in general. Validated that the endpoint works correctly after removing ANY_VALUE in group by.

Comment on lines +67 to +82
impl CompressionUsageParams {
    /// Validates the parameters and resolves the requested job statuses into
    /// [`CompressionJobStatus`] variants.
    ///
    /// We use a `validate()` method rather than the conventional
    /// `TryFrom<UncheckedX>` pattern because this struct also serves as the
    /// `OpenAPI` query-parameter schema. Introducing an `Unchecked` wrapper would
    /// leak an internal implementation detail into the generated API docs.
    ///
    /// # Errors
    ///
    /// Returns [`ClientError::InvalidInput`] if:
    /// - `begin_timestamp > end_timestamp`
    /// - `limit <= 0`
    /// - `job_status` contains unrecognized or empty values
    pub fn validate(&self) -> Result<ValidatedCompressionUsageParams, ClientError> {
Contributor


Suggested change
impl CompressionUsageParams {
/// Validates the parameters and resolves the requested job statuses into
/// [`CompressionJobStatus`] variants.
///
/// We use a `validate()` method rather than the conventional
/// `TryFrom<UncheckedX>` pattern because this struct also serves as the
/// `OpenAPI` query-parameter schema. Introducing an `Unchecked` wrapper would
/// leak an internal implementation detail into the generated API docs.
///
/// # Errors
///
/// Returns [`ClientError::InvalidInput`] if:
/// - `begin_timestamp > end_timestamp`
/// - `limit <= 0`
/// - `job_status` contains unrecognized or empty values
pub fn validate(&self) -> Result<ValidatedCompressionUsageParams, ClientError> {
impl TryFrom<CompressionUsageParams> for ValidatedCompressionUsageParams {
type Error = ClientError;
/// Validates the parameters and resolves the requested job statuses into
/// [`CompressionJobStatus`] variants.
///
/// We use a `validate()` method rather than the conventional
/// `TryFrom<UncheckedX>` pattern because this struct also serves as the
/// `OpenAPI` query-parameter schema. Introducing an `Unchecked` wrapper would
/// leak an internal implementation detail into the generated API docs.
///
/// # Errors
///
/// Returns [`ClientError::InvalidInput`] if:
/// - `begin_timestamp > end_timestamp`
/// - `limit <= 0`
/// - `job_status` contains unrecognized or empty values
fn try_from(value: CompressionUsageParams) -> Result<Self, Self::Error> {

The AI might misunderstand my comments. Since we already create a ValidatedCompressionUsageParams, then instead of implementing a validate method, it might be best to impl TryFrom<CompressionUsageParams> for ValidatedCompressionUsageParams { ... }.

Member Author

Will fix it manually. The doc is stale and the call sites need to be updated.
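To make the pattern concrete, here is a minimal std-only sketch of the TryFrom approach discussed above, using hypothetical simplified params (the real structs also carry job statuses and chrono timestamps):

```rust
use std::convert::TryFrom;

// Hypothetical, simplified version of the query params for illustration only.
struct CompressionUsageParams {
    begin_timestamp: i64,
    end_timestamp: i64,
    limit: i64,
}

struct ValidatedCompressionUsageParams {
    begin_timestamp: i64,
    end_timestamp: i64,
    limit: i64,
}

impl TryFrom<CompressionUsageParams> for ValidatedCompressionUsageParams {
    type Error = String;

    // Note: the trait method is `fn try_from`, not `pub fn try_from`;
    // visibility comes from the trait impl itself.
    fn try_from(value: CompressionUsageParams) -> Result<Self, Self::Error> {
        if value.begin_timestamp > value.end_timestamp {
            return Err("begin_timestamp must be <= end_timestamp".into());
        }
        if value.limit <= 0 {
            return Err("limit must be > 0".into());
        }
        Ok(Self {
            begin_timestamp: value.begin_timestamp,
            end_timestamp: value.end_timestamp,
            limit: value.limit,
        })
    }
}

fn main() {
    // Valid window converts; an inverted window is rejected at the type boundary.
    assert!(ValidatedCompressionUsageParams::try_from(CompressionUsageParams {
        begin_timestamp: 0,
        end_timestamp: 100,
        limit: 10,
    })
    .is_ok());
    assert!(ValidatedCompressionUsageParams::try_from(CompressionUsageParams {
        begin_timestamp: 200,
        end_timestamp: 100,
        limit: 10,
    })
    .is_err());
}
```

The payoff of TryFrom over a validate() method is that the validated type has no other public constructor path, so downstream code can take it as proof of validation.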

// task creation) since they consumed no compute resources.
"SELECT \
j.id, \
ANY_VALUE(j.status) AS status, \
Contributor

@hoophalab hoophalab Apr 12, 2026

Got this error in log. Not sure whether I misconfigured anything.

{"timestamp":"2026-04-12T13:37:11.319164Z","level":"ERROR","fields":{"message":"Failed to fetch compression usage: Sql(Database(MySqlDatabaseError { code: Some(\"42000\"), number: 1305, message: \"FUNCTION clp-db.ANY_VALUE does not exist\" }))"},"filename":"components/api-server/src/routes.rs","line_number":284}

We could just remove ANY_VALUE if there are no serious compatibility concerns with other databases.

Member Author

Switched to MAX(), which is equivalent to ANY_VALUE() on functionally dependent string columns and works across MySQL and MariaDB.
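A minimal sketch of the aggregate swap; the table and column names are abbreviated from the PR's query and should be treated as illustrative:

```rust
// Sketch: the portability fix is purely in the SQL text. For columns that are
// functionally dependent on the GROUP BY key (here, j.status depends on j.id),
// MAX() returns the same single value ANY_VALUE() would, and MAX() exists in
// both MySQL and MariaDB.
fn usage_query() -> String {
    "SELECT j.id, MAX(j.status) AS status, SUM(t.duration) AS tasks_duration \
     FROM compression_jobs j LEFT JOIN compression_tasks t ON t.job_id = j.id \
     GROUP BY j.id"
        .to_string()
}

fn main() {
    let sql = usage_query();
    assert!(sql.contains("MAX(j.status)"));
    assert!(!sql.contains("ANY_VALUE"));
}
```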

#[repr(i32)]
#[strum(ascii_case_insensitive)]
pub enum CompressionJobStatus {
#[strum(disabled)]
Contributor

Why disable this? Querying pending jobs is still valid.

Member Author

You are right. Disabling it not only removes it from the default (where it should not be included), it also blocks explicit requests. If users really want to fetch the resource usage for pending jobs, they should be able to.
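A dependency-free sketch of the resulting parsing behavior: a hand-rolled FromStr standing in for the strum EnumString derive with `#[strum(ascii_case_insensitive)]`, and Pending parseable but excluded from the defaults:

```rust
use std::str::FromStr;

// Roughly what the strum derive provides; hand-rolled here so the sketch
// needs no crate dependencies.
#[derive(Debug, PartialEq, Clone, Copy)]
enum CompressionJobStatus {
    Pending,
    Running,
    Succeeded,
    Failed,
    Killed,
}

impl FromStr for CompressionJobStatus {
    type Err = String;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        // Case-insensitive, mirroring #[strum(ascii_case_insensitive)].
        match s.to_ascii_lowercase().as_str() {
            "pending" => Ok(Self::Pending),
            "running" => Ok(Self::Running),
            "succeeded" => Ok(Self::Succeeded),
            "failed" => Ok(Self::Failed),
            "killed" => Ok(Self::Killed),
            other => Err(format!("Unknown job_status: {other}")),
        }
    }
}

// Pending stays parseable but is left out of the defaults, matching the
// resolution in the thread above.
const DEFAULT_JOB_STATUSES: [CompressionJobStatus; 3] = [
    CompressionJobStatus::Succeeded,
    CompressionJobStatus::Failed,
    CompressionJobStatus::Killed,
];

fn main() {
    assert_eq!(
        "SUCCEEDED".parse::<CompressionJobStatus>(),
        Ok(CompressionJobStatus::Succeeded)
    );
    assert_eq!(
        "pending".parse::<CompressionJobStatus>(),
        Ok(CompressionJobStatus::Pending)
    );
    assert!("UNKNOWN".parse::<CompressionJobStatus>().is_err());
    assert!(!DEFAULT_JOB_STATUSES.contains(&CompressionJobStatus::Pending));
}
```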

}

#[cfg(test)]
mod tests {
Contributor

Those tests have become meaningless because we've switched to strum

Member Author

you are right, removed.

Comment on lines +704 to +706
ANY_VALUE(CAST(UNIX_TIMESTAMP(j.creation_time) * 1000 AS SIGNED)) AS creation_time, \
ANY_VALUE(CAST(UNIX_TIMESTAMP(j.start_time) * 1000 AS SIGNED)) AS start_time, \
ANY_VALUE(ROUND(j.duration, 3)) AS duration, \
Contributor

  1. Can we remove ROUND and just return the full precision float value?
  2. We should be able to bind a chrono::DateTime<?> directly and avoid FROM_UNIXTIME(? / 1000.0). We can modify types in struct ValidatedCompressionUsageParams and struct CompressionUsageParams using #[serde(with = "chrono::serde::ts_milliseconds")].
  3. Since we are already using query_as, can we change the time fields in struct CompressionUsage to type chrono, and setup #[serde(with = "chrono::serde::ts_milliseconds")]

Member Author

done

Remove tests that merely verify strum's EnumString derive behavior
(valid parsing, case insensitivity, invalid rejection). Keep the test
asserting Pending is parseable but excluded from DEFAULT_JOB_STATUSES.
… millis API.

- Convert `ValidatedCompressionUsageParams` timestamps to `DateTime<Utc>`,
  validated via `timestamp_millis_opt()` for range checking
- Use `DateTime<Utc>` in `CompressionUsage` with `chrono::serde::ts_milliseconds`
  so JSON serialization remains i64 epoch millis
- Simplify SQL: remove `FROM_UNIXTIME(? / 1000.0)`, `CAST(UNIX_TIMESTAMP(...))`,
  and `ROUND()` — sqlx binds `DateTime<Utc>` directly and selects raw columns
- Replace MySQL-only `ANY_VALUE()` with cross-compatible `MAX()`
- Add `#[schema(value_type = i64)]` so OpenAPI docs show i64, not datetime
- Add chrono dependency and sqlx chrono feature
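The validation half of the change above can be sketched without chrono. Here SystemTime stands in for chrono's timestamp_millis_opt(); only the shape of the check matters, since the real code builds a DateTime<Utc>:

```rust
use std::time::{Duration, SystemTime, UNIX_EPOCH};

// std-only stand-in for range-checked epoch-millis conversion: convert the
// raw i64 to a SystemTime, returning None when the clock type cannot
// represent the value instead of panicking or wrapping.
fn epoch_millis_to_time(millis: i64) -> Option<SystemTime> {
    if millis >= 0 {
        UNIX_EPOCH.checked_add(Duration::from_millis(millis as u64))
    } else {
        UNIX_EPOCH.checked_sub(Duration::from_millis(millis.unsigned_abs()))
    }
}

fn main() {
    // The epoch itself and a recent timestamp both convert cleanly.
    assert!(epoch_millis_to_time(0).is_some());
    assert!(epoch_millis_to_time(1_700_000_000_000).is_some());
}
```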
Validation already guarantees `job_statuses` is non-empty, so the
`if params.job_statuses.is_empty()` branch in `get_compression_usage`
was unreachable. Remove it for clarity.
- Remove comment about empty job_statuses branch (removed in prior commit)
- Add PENDING = 0 to CompressionUsage.status docstring
@jackluo923
Member Author

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@components/api-server/src/routes.rs`:
- Around line 271-298: The parsing currently allows duplicate statuses which
leads to redundant SQL placeholders; update validate_compression_usage_params to
deduplicate the resolved statuses before returning (e.g., after mapping with
resolve_job_status in validate_compression_usage_params, filter out repeats
while preserving order by tracking seen statuses in a local HashSet or Vec),
ensuring DEFAULT_JOB_STATUSES remains the fallback and that the existing
empty-check and error paths (and references to resolve_job_status and
DEFAULT_JOB_STATUSES) are preserved.

In `@components/clp-py-utils/clp_py_utils/initialize-orchestration-db.py`:
- Around line 81-82: The composite index `JOB_START_TIME_STATUS` on (start_time,
status) is correct for get_compression_usage's query pattern but CREATE TABLE IF
NOT EXISTS only adds it for new DBs; add a migration step in
initialize-orchestration-db.py to ensure existing deployments get the index:
detect if the table exists and, if so, run an ALTER TABLE to add INDEX
`JOB_START_TIME_STATUS` (`start_time`, `status`) USING BTREE (guarded to avoid
duplicate-creation errors), or add an explicit migration script that executes
that ALTER TABLE; reference the table creation block and the index name
`JOB_START_TIME_STATUS` when implementing this change.

🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: f41ac76a-326c-4e97-af35-71e7afa2beef

📥 Commits
Reviewing files that changed from the base of the PR and between 9ae078e and d5a0f8f.

⛔ Files ignored due to path filters (1)

  • Cargo.lock is excluded by !**/*.lock

📒 Files selected for processing (4)

Raise an error when duplicates are detected; the migration is already handled.

…sage params.

Returns a descriptive error (e.g. "Duplicate job_status: Succeeded")
instead of silently deduplicating or accepting duplicate statuses.
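A minimal sketch of that duplicate check, using plain strings instead of the real CompressionJobStatus enum:

```rust
use std::collections::HashSet;

// Sketch of the behavior described in the commit message: reject repeated
// statuses with a descriptive error instead of silently deduplicating.
// The exact error wording here is illustrative, not the PR's.
fn parse_job_statuses(raw: &str) -> Result<Vec<String>, String> {
    let mut seen = HashSet::new();
    let mut statuses = Vec::new();
    for part in raw.split(',').map(str::trim).filter(|s| !s.is_empty()) {
        // Normalize case so "SUCCEEDED,succeeded" counts as a duplicate.
        let status = part.to_ascii_uppercase();
        if !seen.insert(status.clone()) {
            return Err(format!("Duplicate job_status: {status}"));
        }
        statuses.push(status);
    }
    Ok(statuses)
}

fn main() {
    assert_eq!(parse_job_statuses("SUCCEEDED,FAILED").unwrap().len(), 2);
    assert_eq!(
        parse_job_statuses("SUCCEEDED,succeeded").unwrap_err(),
        "Duplicate job_status: SUCCEEDED"
    );
}
```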
hoophalab
hoophalab previously approved these changes Apr 13, 2026
Contributor

@hoophalab hoophalab left a comment

lgtm

Contributor

@hoophalab hoophalab left a comment

LGTM

  1. Need to rerun task codegen:openapi and task lint:fix-rust
  2. How about this title -- adding ` and a period
feat(api-server): Add GET `/usage/compression` endpoint.

@jackluo923 jackluo923 changed the title feat(api-server): Add GET /usage/compression endpoint feat(api-server): Add GET /usage/compression endpoint. Apr 13, 2026
Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@components/api-server/src/routes.rs`:
- Around line 336-601: Add a test that exercises the duplicate-job-status
validation path: create a new async test function named
reject_duplicate_job_status in the existing tests module that uses test_app()
and sends a GET to
"/usage/compression?begin_timestamp=0&end_timestamp=100&job_status=SUCCEEDED,SUCCEEDED",
then assert the response status is StatusCode::BAD_REQUEST and that the response
body contains "Duplicate job_status". This ensures the validation logic in
client.rs (the duplicate-status check around lines where Duplicate job_status is
produced) and the conversion via ValidatedCompressionUsageParams/try_from are
covered to prevent regressions.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 11bc3b4f-a97e-41db-a41f-70d60de723ba

📥 Commits

Reviewing files that changed from the base of the PR and between 82a68e3 and 015885b.

⛔ Files ignored due to path filters (2)
  • Cargo.lock is excluded by !**/*.lock
  • docs/src/_static/generated/api-server-openapi.json is excluded by !**/generated/**
📒 Files selected for processing (1)
  • components/api-server/src/routes.rs

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@components/api-server/src/routes.rs`:
- Around line 372-624: Add a new Tokio test (e.g.
reject_out_of_range_epoch_millis) in the same test module that sends a request
to /usage/compression with a begin_timestamp (or end_timestamp) value outside
Chrono's representable epoch-millis range to exercise
ValidatedCompressionUsageParams::try_from's out-of-range branch; assert the
response status is StatusCode::BAD_REQUEST and that the body contains the
validation error text so the HTTP mapping for out-of-range timestamps is
covered.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: b966e771-b415-49af-98c7-1cc780adedf7

📥 Commits

Reviewing files that changed from the base of the PR and between 015885b and 22196a6.

📒 Files selected for processing (1)
  • components/api-server/src/routes.rs

Comment on lines +372 to +624
#[tokio::test]
async fn reject_begin_greater_than_end() {
let app = test_app();
let response = app
.oneshot(
Request::builder()
.uri("/usage/compression?begin_timestamp=200&end_timestamp=100")
.body(Body::empty())
.unwrap(),
)
.await
.unwrap();
assert_eq!(response.status(), StatusCode::BAD_REQUEST);
let body = get_body(response).await;
assert!(body.contains("begin_timestamp must be <= end_timestamp"));
}

#[tokio::test]
async fn reject_unknown_job_status() {
let app = test_app();
let response = app
.oneshot(
Request::builder()
.uri(
"/usage/compression?begin_timestamp=0&end_timestamp=100&job_status=UNKNOWN",
)
.body(Body::empty())
.unwrap(),
)
.await
.unwrap();
assert_eq!(response.status(), StatusCode::BAD_REQUEST);
let body = get_body(response).await;
assert!(body.contains("Unknown job_status: UNKNOWN"));
}

#[tokio::test]
async fn accept_lowercase_job_status() {
let app = test_app();
let response = app
.oneshot(
Request::builder()
.uri(
"/usage/compression?begin_timestamp=0&end_timestamp=100&\
job_status=succeeded",
)
.body(Body::empty())
.unwrap(),
)
.await
.unwrap();
assert_eq!(response.status(), StatusCode::OK);
let body = get_body(response).await;
assert_eq!(body, "[2]"); // succeeded → 2
}

#[tokio::test]
async fn accept_valid_params_with_defaults() {
let app = test_app();
let response = app
.oneshot(
Request::builder()
.uri("/usage/compression?begin_timestamp=0&end_timestamp=100")
.body(Body::empty())
.unwrap(),
)
.await
.unwrap();
assert_eq!(response.status(), StatusCode::OK);
let body = get_body(response).await;
assert_eq!(body, "[2,3,4]"); // SUCCEEDED, FAILED, KILLED
}

#[tokio::test]
async fn accept_comma_separated_job_status() {
let app = test_app();
let response = app
.oneshot(
Request::builder()
.uri(
"/usage/compression?begin_timestamp=0&end_timestamp=100&\
job_status=SUCCEEDED,RUNNING",
)
.body(Body::empty())
.unwrap(),
)
.await
.unwrap();
assert_eq!(response.status(), StatusCode::OK);
let body = get_body(response).await;
assert_eq!(body, "[2,1]"); // SUCCEEDED, RUNNING
}

#[tokio::test]
async fn accept_spaces_around_commas() {
let app = test_app();
let response = app
.oneshot(
Request::builder()
.uri(
"/usage/compression?begin_timestamp=0&end_timestamp=100&\
job_status=SUCCEEDED%2C+FAILED",
)
.body(Body::empty())
.unwrap(),
)
.await
.unwrap();
assert_eq!(response.status(), StatusCode::OK);
let body = get_body(response).await;
assert_eq!(body, "[2,3]"); // SUCCEEDED, FAILED
}

#[tokio::test]
async fn accept_trailing_comma() {
let app = test_app();
let response = app
.oneshot(
Request::builder()
.uri(
"/usage/compression?begin_timestamp=0&end_timestamp=100&\
job_status=SUCCEEDED,",
)
.body(Body::empty())
.unwrap(),
)
.await
.unwrap();
assert_eq!(response.status(), StatusCode::OK);
let body = get_body(response).await;
assert_eq!(body, "[2]"); // trailing comma ignored
}

#[tokio::test]
async fn accept_single_job_status() {
let app = test_app();
let response = app
.oneshot(
Request::builder()
.uri("/usage/compression?begin_timestamp=0&end_timestamp=100&job_status=KILLED")
.body(Body::empty())
.unwrap(),
)
.await
.unwrap();
assert_eq!(response.status(), StatusCode::OK);
let body = get_body(response).await;
assert_eq!(body, "[4]");
}

#[tokio::test]
async fn reject_missing_begin_timestamp() {
let app = test_app();
let response = app
.oneshot(
Request::builder()
.uri("/usage/compression?end_timestamp=100")
.body(Body::empty())
.unwrap(),
)
.await
.unwrap();
assert_eq!(response.status(), StatusCode::BAD_REQUEST);
let body = get_body(response).await;
assert!(
body.contains("begin_timestamp") || body.contains("deserialize"),
"expected error about missing begin_timestamp, got: {body}"
);
}

#[tokio::test]
async fn reject_empty_job_status() {
let app = test_app();
let response = app
.oneshot(
Request::builder()
.uri("/usage/compression?begin_timestamp=0&end_timestamp=100&job_status=")
.body(Body::empty())
.unwrap(),
)
.await
.unwrap();
assert_eq!(response.status(), StatusCode::BAD_REQUEST);
let body = get_body(response).await;
assert!(
body.contains("at least one valid status"),
"expected error about empty job_status, got: {body}"
);
}

#[tokio::test]
async fn reject_zero_limit() {
let app = test_app();
let response = app
.oneshot(
Request::builder()
.uri("/usage/compression?begin_timestamp=0&end_timestamp=100&limit=0")
.body(Body::empty())
.unwrap(),
)
.await
.unwrap();
assert_eq!(response.status(), StatusCode::BAD_REQUEST);
let body = get_body(response).await;
assert!(
body.contains("limit must be > 0"),
"expected error about limit, got: {body}"
);
}

#[tokio::test]
async fn reject_negative_limit() {
let app = test_app();
let response = app
.oneshot(
Request::builder()
.uri("/usage/compression?begin_timestamp=0&end_timestamp=100&limit=-1")
.body(Body::empty())
.unwrap(),
)
.await
.unwrap();
assert_eq!(response.status(), StatusCode::BAD_REQUEST);
let body = get_body(response).await;
assert!(
body.contains("limit must be > 0"),
"expected error about limit, got: {body}"
);
}

#[tokio::test]
async fn reject_duplicate_job_status() {
let app = test_app();
let response = app
.oneshot(
Request::builder()
.uri(
"/usage/compression?begin_timestamp=0&end_timestamp=100&\
job_status=SUCCEEDED,SUCCEEDED",
)
.body(Body::empty())
.unwrap(),
)
.await
.unwrap();
assert_eq!(response.status(), StatusCode::BAD_REQUEST);
let body = get_body(response).await;
assert!(
body.contains("Duplicate job_status"),
"expected error about duplicate job_status, got: {body}"
);
}
}
Contributor

@coderabbitai coderabbitai bot Apr 13, 2026

🧹 Nitpick | 🔵 Trivial

Add a regression test for out-of-range epoch millis.

ValidatedCompressionUsageParams::try_from also rejects timestamps outside Chrono’s representable range, but this test module never exercises that 400 path. Covering it here would keep the validation-to-HTTP mapping from regressing silently.

Proposed test
+    #[tokio::test]
+    async fn reject_out_of_range_timestamp() {
+        let app = test_app();
+        let response = app
+            .oneshot(
+                Request::builder()
+                    .uri(
+                        "/usage/compression?begin_timestamp=9223372036854775807&\
+                         end_timestamp=9223372036854775807",
+                    )
+                    .body(Body::empty())
+                    .unwrap(),
+            )
+            .await
+            .unwrap();
+        assert_eq!(response.status(), StatusCode::BAD_REQUEST);
+        let body = get_body(response).await;
+        assert!(
+            body.contains("timestamp out of representable range"),
+            "expected error about timestamp range, got: {body}"
+        );
+    }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@components/api-server/src/routes.rs` around lines 372 - 624, Add a new Tokio
test (e.g. reject_out_of_range_epoch_millis) in the same test module that sends
a request to /usage/compression with a begin_timestamp (or end_timestamp) value
outside Chrono's representable epoch-millis range to exercise
ValidatedCompressionUsageParams::try_from's out-of-range branch; assert the
response status is StatusCode::BAD_REQUEST and that the body contains the
validation error text so the HTTP mapping for out-of-range timestamps is
covered.

Member Author

No need to cover this since it is impossible in production; we will not live long enough for it to overflow.

Contributor

@jackluo923 That's a fair point — i64::MAX milliseconds from the Unix epoch is around year 292,278,994, so the overflow branch is purely theoretical. Happy to drop the suggestion.


🧠 Learnings used
Learnt from: Nathan903
Repo: y-scope/clp PR: 2079
File: components/api-server/src/telemetry.rs:8-8
Timestamp: 2026-03-06T17:46:23.008Z
Learning: Assume the repository tracks the latest stable Rust toolchain (no pinned MSRV). Do not rely on nightly-only features in code; prefer stable APIs such as Duration::from_hours (stable since Rust 1.91). When reviewing code in this area, ensure it aligns with the repo’s clippy configuration: clippy runs on nightly with -D clippy::pedantic and -D clippy::duration_suboptimal_units, so prefer idiomatic duration usage that satisfies these lints and avoids discouraged duration literals.

hoophalab
hoophalab previously approved these changes Apr 13, 2026
/// Sum of all task durations (CPU-seconds across all parallel workers).
/// `None` if all task duration values are NULL in the database.
pub tasks_duration: Option<f64>,
}
Contributor

Sorry, I forgot that the docstrings in this struct are used in the OpenAPI document. We need to make them user-oriented:

  1. We probably shouldn't reference a .py file in status's docstring. Instead, how about changing its type from i32 to CompressionJobStatus, derive ToSchema for CompressionJobStatus , and add docstrings in CompressionJobStatus. OpenAPI should be able to reference enum CompressionJobStatus directly.
    • there is another dev-oriented docstring "Mirror of ..." on top of CompressionJobStatus. We should be able to hide it from OpenAPI by replacing /// with //.
  2. It might be worth renaming status as job_status to keep consistent with struct CompressionUsageParams. sqlx::rename should allow us to use different struct field names and column names.
  3. It might be worthwhile to rename start_time as time_begin_millisecs to keep consistent with QueryConfig. Also creation_time -> time_creation_millisecs.
  4. Recent log ingestor changes also include units in the API field names. So, it's best to rename
    • (un)?compressed_size as (un)?compressed_size_bytes or (un)?compressed_bytes.
    • .*duration as .*duration_secs
  5. Can we avoid exposing database details such as the WHERE clause and NULL values, since this struct is user oriented?
  6. Can we change /// A single row returned by the compression usage query (one row per job). to something like /// Resource usage statistics for the compression job with the specified id.

Member Author

done

Comment on lines +25 to +27
///
/// Note: `Pending` is intentionally excluded because pending jobs have no
/// `start_time` and are therefore excluded by the time-range WHERE clause.
Contributor

Does this NOTE duplicate "Covers all terminal states that consumed compute resources", since PENDING is a non-terminal state?

Member Author

done

Comment on lines +40 to +41
/// End of usage window (epoch milliseconds, inclusive).
pub end_timestamp: i64,
Contributor

  1. Thinking about this again: is it better to make end_timestamp exclusive? end in vec[start: end] is exclusive
  2. To keep consistent with the query API, can we rename begin_timestamp as time_range_begin_millisecs and end_timestamp as time_range_end_millisecs

Member Author

done

pub job_status: Option<String>,
/// Maximum number of jobs to return. Must be > 0; defaults to 1000.
#[serde(default = "default_limit")]
pub limit: i64,
Contributor

@hoophalab hoophalab Apr 13, 2026

Suggested change
pub limit: i64,
pub limit: u64,

Can we change i64 to u32 since it is > 0? Also, using u64 in Rust doesn't actually give us performance benefits on 64-bit systems.

Member Author

Let's keep it as i64 since it produces a better error message:
"invalid type: expected u32, found -1"
vs.
"limit must be > 0"

…ted.

- Change `status: i32` to `job_status: CompressionJobStatus` with typed enum
- Rename fields to include units: `_millisecs`, `_secs`, `_bytes`
- Rewrite docstrings to remove DB internals/Python file references
- Add `ToSchema` derive and user-facing docstrings to `CompressionJobStatus`
- Register `CompressionJobStatus` in OpenAPI schema components
Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@components/api-server/src/client.rs`:
- Around line 57-66: ValidatedCompressionUsageParams currently exposes its
invariants by making fields pub (begin_timestamp, end_timestamp, job_statuses,
limit), allowing callers to construct invalid instances; change those fields to
private (remove pub) and add minimal read-only accessor methods (e.g.,
begin_timestamp(), end_timestamp(), job_statuses(), limit()) used by routes.rs,
keep the TryFrom<CompressionUsageParams> impl as the only constructor that
enforces validation, and update any call sites (such as
Client::get_compression_usage and routes.rs) to use the new accessors.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 2784a491-b5e4-4eb8-8bb2-dc7564fde8d3

📥 Commits

Reviewing files that changed from the base of the PR and between 9d5801a and 99ea8f5.

⛔ Files ignored due to path filters (1)
  • docs/src/_static/generated/api-server-openapi.json is excluded by !**/generated/**
📒 Files selected for processing (3)
  • components/api-server/src/client.rs
  • components/api-server/src/routes.rs
  • components/clp-rust-utils/src/job_config/compression.rs

Comment on lines +57 to +66
/// Validated parameters produced by [`TryFrom<CompressionUsageParams>`].
///
/// This type is intentionally non-constructable outside of the `TryFrom` impl
/// so that callers of [`Client::get_compression_usage`] cannot bypass validation.
pub struct ValidatedCompressionUsageParams {
pub begin_timestamp: DateTime<Utc>,
pub end_timestamp: DateTime<Utc>,
pub job_statuses: Vec<CompressionJobStatus>,
pub limit: i64,
}
Contributor

@coderabbitai coderabbitai bot Apr 13, 2026

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

cat -n components/api-server/src/client.rs | sed -n '50,75p'

Repository: y-scope/clp

Length of output: 1000


🏁 Script executed:

# Search for ValidatedCompressionUsageParams struct definition and usage
rg "ValidatedCompressionUsageParams" -A 10 -B 5 components/api-server/src/

Repository: y-scope/clp

Length of output: 6974


🏁 Script executed:

# Check if TryFrom impl exists for CompressionUsageParams
rg "impl.*TryFrom.*CompressionUsageParams" -A 20 components/api-server/src/

Repository: y-scope/clp

Length of output: 1737


🏁 Script executed:

# Check how routes.rs accesses fields from ValidatedCompressionUsageParams
rg "ValidatedCompressionUsageParams" components/api-server/src/routes.rs -B 2 -A 2

Repository: y-scope/clp

Length of output: 770


🏁 Script executed:

# Search for direct construction of ValidatedCompressionUsageParams (struct literals with {})
rg "ValidatedCompressionUsageParams\s*\{" components/api-server/src/

Repository: y-scope/clp

Length of output: 246


🏁 Script executed:

# Also search for any .try_from patterns to ensure all conversions go through the impl
rg "try_from.*CompressionUsageParams" components/api-server/src/ -A 2

Repository: y-scope/clp

Length of output: 347


🏁 Script executed:

# Check if there are any tests or external code that might construct ValidatedCompressionUsageParams
rg "ValidatedCompressionUsageParams" components/api-server/src/ -B 3 -A 3 | grep -E "(test|::new|::default|\{)"

Repository: y-scope/clp

Length of output: 723


🏁 Script executed:

# Verify the TryFrom validation actually checks job_statuses is not empty
rg "impl TryFrom" components/api-server/src/client.rs -A 60 | head -80

Repository: y-scope/clp

Length of output: 2658


ValidatedCompressionUsageParams public fields undermine validation enforcement.

The doc comment declares this type "non-constructable outside of the TryFrom impl" to prevent bypass of validation, but lines 62–65 expose all fields as pub. This allows callers to construct invalid instances directly—for example, with backwards timestamps, non-positive limit, or empty job_statuses—defeating the invariant.

Suggested direction
 pub struct ValidatedCompressionUsageParams {
-    pub begin_timestamp: DateTime<Utc>,
-    pub end_timestamp: DateTime<Utc>,
-    pub job_statuses: Vec<CompressionJobStatus>,
-    pub limit: i64,
+    begin_timestamp: DateTime<Utc>,
+    end_timestamp: DateTime<Utc>,
+    job_statuses: Vec<CompressionJobStatus>,
+    limit: i64,
 }

Then expose only accessor methods for the fields that routes.rs actually needs.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@components/api-server/src/client.rs` around lines 57 - 66,
ValidatedCompressionUsageParams currently exposes its invariants by making
fields pub (begin_timestamp, end_timestamp, job_statuses, limit), allowing
callers to construct invalid instances; change those fields to private (remove
pub) and add minimal read-only accessor methods (e.g., begin_timestamp(),
end_timestamp(), job_statuses(), limit()) used by routes.rs, keep the
TryFrom<CompressionUsageParams> impl as the only constructor that enforces
validation, and update any call sites (such as Client::get_compression_usage and
routes.rs) to use the new accessors.

Member Author

fixed docstring instead.
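For reference, a std-only sketch of the accessor pattern the bot proposed, with simplified hypothetical types (not what the PR did — the PR kept the fields public and fixed the docstring instead):

```rust
// Private fields plus a fallible constructor make the module boundary the
// only way to obtain an instance, so the invariants cannot be bypassed.
pub mod params {
    pub struct ValidatedParams {
        begin_timestamp: i64,
        end_timestamp: i64,
        limit: i64,
    }

    impl ValidatedParams {
        pub fn new(begin: i64, end: i64, limit: i64) -> Result<Self, String> {
            if begin > end {
                return Err("begin_timestamp must be <= end_timestamp".into());
            }
            if limit <= 0 {
                return Err("limit must be > 0".into());
            }
            Ok(Self { begin_timestamp: begin, end_timestamp: end, limit })
        }

        // Read-only accessors; callers outside this module cannot mutate
        // or construct the struct directly.
        pub fn begin_timestamp(&self) -> i64 { self.begin_timestamp }
        pub fn end_timestamp(&self) -> i64 { self.end_timestamp }
        pub fn limit(&self) -> i64 { self.limit }
    }
}

fn main() {
    let p = params::ValidatedParams::new(0, 100, 10).unwrap();
    assert_eq!(p.limit(), 10);
    assert_eq!(p.begin_timestamp(), 0);
    assert!(params::ValidatedParams::new(200, 100, 10).is_err());
}
```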

Contributor

@jackluo923 Understood — updating the docstring to accurately reflect that the fields are public (rather than restricting access) is a valid choice. The mismatched claim between the documentation and reality was the key issue flagged. Thanks for the clarification!


🧠 Learnings used
Learnt from: sitaowang1998
Repo: y-scope/clp PR: 1417
File: components/clp-py-utils/clp_py_utils/serialization_utils.py:6-13
Timestamp: 2025-10-18T15:38:01.210Z
Learning: In the y-scope/clp repository, parameter descriptions in docstrings are intentionally omitted when the parameter name is self-explanatory. This is the project's preferred documentation style.

Learnt from: goynam
Repo: y-scope/clp PR: 2178
File: components/job-orchestration/job_orchestration/scheduler/compress/compression_scheduler.py:0-0
Timestamp: 2026-04-13T18:49:55.259Z
Learning: In the query scheduler (`components/job-orchestration/job_orchestration/scheduler/query/query_scheduler.py`), `job["job_config"]` is serialized as plain msgpack (no brotli compression), written by the Rust API server via `rmp_serde`. It is not subject to brotli decompression failures. Query configs are also inherently small (query string, timestamps, a few flags), making VARBINARY(60000) truncation extremely unlikely in practice. The MEDIUMBLOB schema change in PR `#2178` further prevents future truncation for that column.

Learnt from: Bill-hbrhbr
Repo: y-scope/clp PR: 1122
File: components/core/src/clp/clp/CMakeLists.txt:175-195
Timestamp: 2025-07-23T09:54:45.185Z
Learning: In the CLP project, when reviewing CMakeLists.txt changes that introduce new compression library dependencies (BZip2, LibLZMA, LZ4, ZLIB), the team prefers to address conditional linking improvements in separate PRs rather than expanding the scope of focused migration PRs like the LibArchive task-based installation migration.

Learnt from: Nathan903
Repo: y-scope/clp PR: 2079
File: components/api-server/src/telemetry.rs:8-8
Timestamp: 2026-03-06T17:46:23.008Z
Learning: Assume the repository tracks the latest stable Rust toolchain (no pinned MSRV). Do not rely on nightly-only features in code; prefer stable APIs such as Duration::from_hours (stable since Rust 1.91). When reviewing code in this area, ensure it aligns with the repo’s clippy configuration: clippy runs on nightly with -D clippy::pedantic and -D clippy::duration_suboptimal_units, so prefer idiomatic duration usage that satisfies these lints and avoids discouraged duration literals.

…ake end exclusive.

- Rename begin_timestamp -> time_range_begin_millisecs
- Rename end_timestamp -> time_range_end_millisecs
- Change end boundary from inclusive (<=) to exclusive (<)
@jackluo923 jackluo923 requested a review from hoophalab April 14, 2026 03:15
Contributor

@hoophalab hoophalab left a comment


lgtm. One nitpick

Co-authored-by: hoophalab <200652805+hoophalab@users.noreply.github.com>