
(feat) Implement metrics rest api #4115

Open
obelix74 wants to merge 14 commits into apache:main from obelix74:implement_metrics_rest_api

Conversation

@obelix74
Contributor

@obelix74 obelix74 commented Apr 2, 2026

This PR implements the proposal in #4010. It uses the stable envelope design for the REST API instead of a flattened structure.

Checklist

  • 🛡️ Don't disclose security issues! (contact security@apache.org)
  • 🔗 Clearly explained why the changes are needed, or linked related issues: Fixes #
  • 🧪 Added/updated tests with good coverage, or manually tested (and explained how)
  • 💡 Added comments for complex logic
  • 🧾 Updated CHANGELOG.md (if needed)
  • 📚 Updated documentation in site/content/in-dev/unreleased (if needed)

Anand Kumar Sankaran added 6 commits April 10, 2026 07:26
Adds read-only REST endpoints at /api/metrics-reports/v1/ to expose
persisted Iceberg scan and commit metrics without requiring direct
database access.

Changes:
- spec/metrics-reports-service.yml: OpenAPI 3.0.3 spec for the new API
- api/metrics-reports-service/: JAX-RS code generation module
- polaris-core: TABLE_READ_METRICS privilege (id=103), LIST_TABLE_METRICS
  authorizable operation, default no-op read methods on MetricsPersistence,
  MetricsReportToken for keyset cursor pagination
- persistence/relational-jdbc: JDBC implementations of listScanReports/
  listCommitReports using DESC keyset pagination; MetricsModelUtils shared
  ObjectMapper/parseMetadata utility
- runtime/service: MetricsReportsService — resolves catalog/namespace/table
  via PolarisResolutionManifest, authorizes with TABLE_READ_METRICS, delegates
  to MetricsPersistence; always bounded by LIMIT (default 100)
- Tests: MetricsReportsServiceTest (including 403/404 scenarios),
  MetricsReportTokenTest, metadata round-trip tests for model classes
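
The DESC keyset pagination mentioned in this commit message can be illustrated with a small in-memory sketch. This is not the PR's JDBC code: the `Report` class, the `(timestampMs, id)` sort key, and the use of the last returned row as the cursor are assumptions made for the example, since the actual `MetricsReportToken` fields are not shown in this conversation.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical in-memory sketch of DESC keyset (cursor) pagination. */
final class KeysetPagination {

  /** Hypothetical report row; only the sort-key fields matter here. */
  static final class Report {
    final long timestampMs;
    final String id;

    Report(long timestampMs, String id) {
      this.timestampMs = timestampMs;
      this.id = id;
    }
  }

  /** One page of results plus the cursor for the next page (null when exhausted). */
  static final class Page {
    final List<Report> items;
    final Report cursor;

    Page(List<Report> items, Report cursor) {
      this.items = items;
      this.cursor = cursor;
    }
  }

  // Newest first; id breaks ties so the order is total and pages never overlap.
  private static final Comparator<Report> NEWEST_FIRST =
      Comparator.comparingLong((Report r) -> r.timestampMs)
          .thenComparing(r -> r.id)
          .reversed();

  /**
   * Returns up to {@code limit} rows that sort strictly after {@code cursor}.
   * Similar in spirit to:
   *   WHERE (ts, id) < (:ts, :id) ORDER BY ts DESC, id DESC LIMIT :limit
   */
  static Page list(List<Report> all, Report cursor, int limit) {
    if (limit < 1) {
      throw new IllegalArgumentException("limit must be >= 1");
    }
    List<Report> sorted = new ArrayList<>(all);
    sorted.sort(NEWEST_FIRST);

    List<Report> items = new ArrayList<>();
    for (Report r : sorted) {
      if (cursor != null && NEWEST_FIRST.compare(r, cursor) <= 0) {
        continue; // at or before the cursor: already returned on an earlier page
      }
      items.add(r);
      if (items.size() > limit) {
        break; // one extra row is enough to know another page exists
      }
    }

    boolean hasMore = items.size() > limit;
    if (hasMore) {
      items.remove(items.size() - 1); // drop the look-ahead row
    }
    Report next = hasMore ? items.get(items.size() - 1) : null;
    return new Page(items, next);
  }
}
```

Unlike OFFSET-based paging, each call depends only on the last key seen, so rows inserted between calls do not shift later pages.
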
Restructures each report in the response from a flat object into a
stable envelope with nested actor/request/object/payload sub-objects.
This decouples the API shape from the DB schema and allows payload
schemas to evolve independently without breaking clients.

- MetricsReportsService: replace baseRecordFields + flat maps with
  per-type envelope builders (actor, request, object, payload.data)
- spec/metrics-reports-service.yml: replace flat ScanMetricsReport /
  CommitMetricsReport with envelope schemas; add MetricsActor,
  MetricsRequest, ScanMetricsObject, CommitMetricsObject,
  ScanPayload/CommitPayload, ScanPayloadData/CommitPayloadData
- Tests: add scanReportHasEnvelopeStructure and
  commitReportHasEnvelopeStructure to assert the nested shape
- docs: add envelope structure example to telemetry.md
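
As a rough illustration of the envelope shape described in this commit message, the sketch below nests `actor`, `request`, `object`, and `payload.data` under one report. Only those four top-level keys and `payload.data` come from the commit message; the class name and the inner field names (`principal`, `requestId`, `snapshotId`) are assumptions for the example and may not match the spec in `spec/metrics-reports-service.yml`.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch of a per-type envelope builder for a scan report. */
final class ScanReportEnvelope {

  static Map<String, Object> build(
      String principal, String requestId, long snapshotId, Map<String, Object> metrics) {
    // Who issued the request that produced the report (field name is assumed).
    Map<String, Object> actor = new LinkedHashMap<>();
    actor.put("principal", principal);

    // Request-level correlation data (field name is assumed).
    Map<String, Object> request = new LinkedHashMap<>();
    request.put("requestId", requestId);

    // What the metrics apply to.
    Map<String, Object> object = new LinkedHashMap<>();
    object.put("snapshotId", snapshotId);

    // The payload is the only part that varies by report type, so its
    // schema can evolve without touching the outer envelope.
    Map<String, Object> payload = new LinkedHashMap<>();
    payload.put("data", new LinkedHashMap<>(metrics));

    Map<String, Object> envelope = new LinkedHashMap<>();
    envelope.put("actor", actor);
    envelope.put("request", request);
    envelope.put("object", object);
    envelope.put("payload", payload);
    return envelope;
  }
}
```

A client that only reads `actor` and `object` keeps working when new keys appear under `payload.data`, which is the decoupling the commit message describes.
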
…tsService

- Remove flat inline errorResponse() helper; all 400s now throw IllegalArgumentException
  so IcebergExceptionMapper produces a consistent Iceberg error envelope instead of
  a divergent flat {message, type, code} body
- Replace BadRequestException (JAX-RS WebApplicationException) with
  IllegalArgumentException in decodeNamespace so the same mapper handles it
- Guard Integer.parseInt in ModelScanMetricsReport.toRecord() against malformed
  projected_field_ids tokens in the DB — silently skips non-numeric tokens
- Remove unused @Nullable import from MetricsReportsService
- Add test for PATH_COULD_NOT_BE_FULLY_RESOLVED resolver status (namespace/table
  path not found); update 5 test expectations to match exception-throwing contract
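
The `Integer.parseInt` guard described in this commit message could look roughly like the sketch below. It assumes `projected_field_ids` is stored as a comma-separated string, which is not stated here; the class and method names are hypothetical and this is not the actual `ModelScanMetricsReport.toRecord()` code.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: tolerant parsing of a comma-separated id column. */
final class FieldIdParser {

  /** Parses numeric tokens and silently skips anything that is not an int. */
  static List<Integer> parseFieldIds(String raw) {
    List<Integer> ids = new ArrayList<>();
    if (raw == null || raw.trim().isEmpty()) {
      return ids;
    }
    for (String token : raw.split(",")) {
      try {
        ids.add(Integer.parseInt(token.trim()));
      } catch (NumberFormatException e) {
        // Malformed token in stored data: skip it rather than fail the read.
      }
    }
    return ids;
  }
}
```

Skipping bad tokens keeps one corrupt row from turning a list request into a server error, at the cost of hiding the corruption from the caller.
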
…esponse

These are DB-internal surrogate IDs with no meaning to API clients, who already
know the catalog, namespace, and table from the request URL. The snapshotId
(an Iceberg-level concept) is retained in the object envelope as it identifies
which snapshot the metrics apply to.
…w feedback

- Replace executeSelectOverStream + AtomicReference with executeSelect,
  which returns List<T> directly and is simpler for single-threaded use
- Remove the unused `columns` parameter from buildMetricsQuery; use
  SELECT * since the parameter was always ALL_COLUMNS per model
@obelix74 obelix74 force-pushed the implement_metrics_rest_api branch from 1d37c50 to ac66c27 Compare April 10, 2026 14:26
Contributor

@dimas-b dimas-b left a comment

Thanks for pushing this feature forward, @obelix74!

The PR LGTM in general. Posting some comments about code organization, subject to discussion, of course.

@PolarisImmutable
@JsonSerialize(as = ImmutableMetricsReportToken.class)
@JsonDeserialize(as = ImmutableMetricsReportToken.class)
public interface MetricsReportToken extends Token {
Contributor

I believe this class does not belong in polaris-core. The idea behind TokenType and the related Java service descriptors was to allow token types to be pluggable.

It looks like this class is currently used only by relational-jdbc, so let's move it there.

If it must be reused at some point, it will be easier to expose it later than take away from polaris-core.

Contributor Author

  • Moved MetricsReportToken.java to persistence/relational-jdbc/.../jdbc/pagination/ (the only consumer)
  • Updated JdbcBasePersistenceImpl.java import
  • Moved MetricsReportTokenTest.java to relational-jdbc
  • Removed MetricsReportToken$MetricsReportTokenType from polaris-core's service descriptor; created a new one in relational-jdbc

Comment thread spec/metrics-reports-service.yml Outdated
schema:
  type: integer
  minimum: 1
  maximum: 1000
Contributor

Adding a max page size to the API spec looks like overkill. Ultimately, it's an implementation concern, not an API concern.

Contributor Author

  • Removed maximum: 1000 from the pageSize query parameter schema
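
With the upper bound gone from the spec, the limit becomes a server-side decision. One way that could be handled is sketched below; the default of 100 comes from the commit message earlier in this PR, while the class name, the `serverMax` parameter, and the choice to clamp oversized requests instead of rejecting them are assumptions for illustration only.

```java
/** Hypothetical sketch: resolving the effective page size on the server. */
final class PageSizePolicy {

  // Default taken from the commit message ("always bounded by LIMIT (default 100)").
  static final int DEFAULT_PAGE_SIZE = 100;

  /**
   * @param requested the pageSize query parameter, or null when absent
   * @param serverMax an implementation-chosen ceiling (assumed, not in the spec)
   */
  static int effectivePageSize(Integer requested, int serverMax) {
    if (serverMax < 1) {
      throw new IllegalArgumentException("serverMax must be >= 1");
    }
    if (requested == null) {
      return Math.min(DEFAULT_PAGE_SIZE, serverMax);
    }
    if (requested < 1) {
      // Mirrors the spec's `minimum: 1`; surfaces as a 400 via the exception mapper.
      throw new IllegalArgumentException("pageSize must be >= 1");
    }
    // Oversized requests are clamped here rather than rejected.
    return Math.min(requested, serverMax);
  }
}
```

Because the ceiling lives in code, each deployment can tune it without a spec change.
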

Comment thread spec/metrics-reports-service.yml Outdated
Comment thread spec/metrics-reports-service.yml Outdated
$ref: '#/components/schemas/ErrorResponse'

components:
  securitySchemes:
Contributor

TBH, I'm not sure it's worth specifying authentication schemes in each OpenAPI file. I believe Polaris will handle AuthN uniformly across all APIs, and evolving AuthN will not (and should not) affect each API definition individually.

Contributor Author

  • Removed root-level security: block and the entire securitySchemes: section from components:

Comment thread spec/metrics-reports-service.yml Outdated
Comment thread runtime/service/build.gradle.kts Outdated
- spec: remove max page size (impl concern, not API concern)
- spec: clarify "epoch milliseconds" → "Unix epoch milliseconds"
- spec: remove securitySchemes (auth handled uniformly by Polaris)
- spec: update namespace description to reference Polaris Iceberg REST API convention
- Move MetricsReportToken from polaris-core to relational-jdbc, the sole consumer; register TokenType service descriptor there
- Move MetricsReportsService into new extensions/metrics-reports/impl module following the OPA Authorizer pattern; wire as runtimeOnly in runtime/server so it is elective
Contributor

@dimas-b dimas-b left a comment

LGTM with one minor remaining comment 😅

@sungwy, @sneethiraj: FYI about the new AuthZ operation.

Comment thread gradle/projects.main.properties
Comment thread CHANGELOG.md Outdated
Comment thread spec/metrics-reports-service.yml Outdated
@obelix74 obelix74 requested a review from dimas-b April 16, 2026 16:08
Contributor

@flyingImer flyingImer left a comment

The direction looks right to me

Two structural observations:

  • With this PR, MetricsPersistence grows from 2 write methods to 4 (read + write). It's marked @Beta and the javadoc calls it a "Service Provider Interface." But it lives on BasePersistence, which only local DB backends implement. NoSqlMetaStoreManager and RemotePolarisMetaStoreManager go through empty BasePersistence implementations, so these methods are permanently no-op for them. Meanwhile, the actual SPI interfaces (PolarisMetricsReporter, PolarisMetricsManager) have no annotation at all. The @Beta signal is on the wrong layer, IIUC.

  • The write path enters through PolarisMetricsManager on MetaStoreManager, but this read path bypasses that layer and goes straight to BasePersistence via callContext.getMetaStore(). If we want the metrics read API to work for non-JDBC backends, it would need a MetaStoreManager-level entry point, same as writes.

Not blocking on this. I think the question of where metrics persistence should sit architecturally is worth a discussion on dev@.

…porter

Move the @Beta signal to the actual SPI interfaces rather than only
on MetricsPersistence, which is a BasePersistence concern not visible
to callers of the service layer.
@obelix74
Contributor Author

Thank you. I have added @Beta annotation to PolarisMetricsManager and PolarisMetricsReporter.

About the second point, thank you. Would this mean adding a read method to PolarisMetricsManager and MetaStoreManager, mirroring the write path? Should I do it in this PR, or can this wait?

@flyingImer
Contributor

Thanks for adding @Beta.

Reads should go through MetaStoreManager too, same as writes. If reads stay on BasePersistence, non-JDBC backends can't implement the read API at all. I'd prefer fixing that in this PR so the read path ships with the same layering as writes.

Separately, the persistence schema discussion on dev@ is still open. A follow-up issue linking to that thread would help track it.

@obelix74
Contributor Author

Pushed a commit (and rebased against updated main). listScanMetrics and listCommitMetrics are now on PolarisMetricsManager (and therefore MetaStoreManager), following the same pattern as the write methods. MetricsReportsService now injects PolarisMetaStoreManager and routes reads through it rather than calling callContext.getMetaStore() directly.
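
The layering described above can be sketched in simplified form. Everything in this block is a hypothetical stand-in: the nested type names, the `String`-based reports, and the method signatures are assumptions, not the real `PolarisMetricsManager`, `BasePersistence`, or `MetricsReportsService` APIs.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical, simplified sketch of routing reads through the manager layer. */
final class ReadPathLayering {

  /** Persistence layer: the read method defaults to a no-op. */
  interface MetricsPersistence {
    default List<String> listScanReports(String tableName, int limit) {
      return new ArrayList<>();
    }
  }

  /** Manager layer: the entry point the service calls, same as for writes. */
  interface MetricsManager {
    List<String> listScanMetrics(String tableName, int limit);
  }

  /** A backend that keeps the default no-op read. */
  static final class NoOpPersistence implements MetricsPersistence {}

  /** A backend that stores reports (stand-in for a JDBC implementation). */
  static final class InMemoryPersistence implements MetricsPersistence {
    private final List<String> reports = new ArrayList<>();

    void add(String report) {
      reports.add(report);
    }

    @Override
    public List<String> listScanReports(String tableName, int limit) {
      // tableName is ignored in this toy backend; a real one would filter on it.
      int end = Math.max(0, Math.min(limit, reports.size()));
      return new ArrayList<>(reports.subList(0, end));
    }
  }

  /** One possible manager: delegate to whatever persistence is configured. */
  static final class DelegatingMetricsManager implements MetricsManager {
    private final MetricsPersistence persistence;

    DelegatingMetricsManager(MetricsPersistence persistence) {
      this.persistence = persistence;
    }

    @Override
    public List<String> listScanMetrics(String tableName, int limit) {
      return persistence.listScanReports(tableName, limit);
    }
  }

  /** The service depends only on the manager, never on persistence directly. */
  static final class MetricsService {
    private final MetricsManager manager;

    MetricsService(MetricsManager manager) {
      this.manager = manager;
    }

    List<String> scanReports(String tableName, int pageSize) {
      return manager.listScanMetrics(tableName, pageSize);
    }
  }
}
```

Since the service sees only the manager interface, a non-JDBC backend could supply its own `MetricsManager` without the service changing.
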

For the persistence schema discussion — I'll open a follow-up issue linking to the dev@ thread once there's a message to reference. Happy to do that now if you can share the thread link (I don't have it handy). Please let me know.

@obelix74 obelix74 requested a review from flyingImer April 28, 2026 18:00
Contributor

@dimas-b dimas-b left a comment

The metrics API story LGTM 👍

I still have more general concerns related to SPI design and service wiring, but they are not specific to this feature.

Thanks for working on this, @obelix74!

@github-project-automation github-project-automation Bot moved this from PRs In Progress to Ready to merge in Basic Kanban Board May 15, 2026
Contributor

@flyingImer flyingImer left a comment

Are you planning to merge as-is now that Dmitri approved, or is there another round? Asking because the May 7 metrics sync landed on a few directional items that touch the schema and SPI shape here. Left some questions inline.

@obelix74 obelix74 requested a review from dimas-b May 19, 2026 15:26
4 participants