👷 add api-performance-benchmark scheduled job + rename test-performance to bundle-size#4633

Open
thomas-lebeau wants to merge 4 commits into main from worktree-port-perf-tests-to-benchmark

@thomas-lebeau (Collaborator) commented May 15, 2026

Motivation

Two unrelated cleanups:

  1. The test-performance GitLab job now only computes bundle sizes (the CPU and memory measurements it used to compute were tightly coupled to the per-PR comment flow). Rename the job to match.
  2. The per-API CPU/memory measurements still have value as a regression signal on main, but they were originally compared against a base commit for every PR. Keep the measurement, drop the comparison, and run it on the existing performance-benchmark-scheduled schedule instead of on every PR.

Changes

  • Rename CI job test-performance → bundle-size.
  • Add new CI job api-performance-benchmark, gated by TARGET_TASK_NAME == "performance-benchmark-scheduled" so it shares the cadence of the existing benchmark schedule.
  • Add scripts/api-performance/:
    • lib/cpuPerformance.ts — triggers the Datadog synthetic test that drives the playground (/performance/cpu); the synthetic test reports its per-API CPU metrics to Datadog directly.
    • lib/memoryPerformance.ts — launches Chrome via puppeteer, clicks each playground button 30× per API while sampling the heap profile, and reports the median SDK-attributable bytes per API to Datadog.
    • lib/constants.ts — the 7-entry TESTS list (RUM addGlobalContext / addAction / addError / addTiming / startView / startStop session-replay-recording, Logs logMessage).
    • lib/reportToDatadog.ts — local copy of the helper (lint forbids importing across scripts/*/lib/ directories).
  • Drop the legacy scripts/performance/lib/{cpuPerformance,memoryPerformance,constants}.ts files and the CPU/Memory sections from the PR comment formatter (+ its spec).
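The memory measurement in the list above reduces 30 samples per API to a single median. A minimal sketch of that reduction, assuming the samples are already available as byte counts. The `median` helper and the sample values are illustrative and are not taken from `memoryPerformance.ts`:

```typescript
// Hypothetical helper: median of a list of byte counts.
function median(values: number[]): number {
  if (values.length === 0) {
    throw new Error('median of empty list')
  }
  // Copy before sorting so the caller's array is left untouched.
  const sorted = [...values].sort((a, b) => a - b)
  const mid = Math.floor(sorted.length / 2)
  // Even length: average the two middle values. Odd length: take the middle one.
  return sorted.length % 2 === 0 ? (sorted[mid - 1] + sorted[mid]) / 2 : sorted[mid]
}

// Made-up samples, one per button click; 5400 stands in for a noisy outlier.
const sdkBytesPerClick = [1200, 1180, 5400, 1210, 1190]
const medianBytes = median(sdkBytesPerClick)
console.log(`median SDK bytes: ${medianBytes}`)
```

A median is less sensitive than a mean to a single outlying sample, which is one plausible reason to prefer it for heap measurements.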

Notes

  • Earlier iterations of this PR added per-API scenarios under test/performance/scenarios/, but they produced order-of-magnitude-different numbers from the old job (scenario-total vs per-call methodology). Reverted in favor of preserving the original measurement approach.
  • The new job runs on the same schedule as performance-benchmark — change the cadence in GitLab's CI/CD → Schedules → performance-benchmark-scheduled.
  • Metrics flow to Datadog: CPU from the synthetic test (its existing dashboards), memory as logs tagged per TESTS.property from scripts/api-performance/lib/memoryPerformance.ts.

Test instructions

  • Check the next GitLab pipeline: the bundle-size job runs (renamed from test-performance) and posts the bundle-size PR comment as before.
  • On the next performance-benchmark-scheduled run, the new api-performance-benchmark job runs alongside performance-benchmark and pushes memory metrics to Datadog (look for SDK memory consumption logs from the browser-sdk service / ci env on main).

Checklist

  • Tested locally (typecheck, lint, script test pass)
  • Tested on staging
  • Added unit tests for this change.
  • Added e2e/integration tests for this change.
  • Updated documentation and/or relevant AGENTS.md file

datadog-prod-us1-4 Bot commented May 15, 2026

Tests

🎉 All green!

❄️ No new flaky tests detected
🧪 All tests passed

🎯 Code Coverage (details)
Patch Coverage: 100.00%
Overall Coverage: 76.96% (+0.00%)

This comment will be updated automatically if new data arrives.
🔗 Commit SHA: 0ac0d45

@thomas-lebeau (Collaborator, Author) commented

@codex review

cit-pr-commenter-54b7da Bot commented May 15, 2026

Bundles Sizes Evolution

| 📦 Bundle Name | Base Size | Local Size | 𝚫 | 𝚫% | Status |
|---|---|---|---|---|---|
| Rum | 169.51 KiB | 169.51 KiB | 0 B | 0.00% | |
| Rum Profiler | 5.97 KiB | 5.97 KiB | 0 B | 0.00% | |
| Rum Recorder | 21.23 KiB | 21.23 KiB | 0 B | 0.00% | |
| Logs | 54.70 KiB | 54.70 KiB | 0 B | 0.00% | |
| Rum Slim | 127.85 KiB | 127.85 KiB | 0 B | 0.00% | |
| Worker | 22.99 KiB | 22.99 KiB | 0 B | 0.00% | |

🔗 RealWorld

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 98b9dd4a93

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread test/performance/scenarios/apiCalls.scenario.ts Outdated
@thomas-lebeau thomas-lebeau changed the title from "👷 port per-API performance tests to benchmark harness" to "👷 add api-performance-benchmark scheduled job + rename test-performance to bundle-size" May 15, 2026
Move CPU and memory measurements for individual RUM APIs (addAction,
addError, addTiming, setGlobalContextProperty, startView,
start/stopSessionReplayRecording) from the legacy synthetics-driven
scripts to the Playwright benchmark harness in test/performance.

- Add apiCalls.scenario.ts driving each API via page.evaluate under the
  existing CPU/memory profilers; reuses the instrumentation-overhead app
- Allow createBenchmarkTest to opt into a subset of SDK configurations
- Resolve apiCall_* scenarios to the instrumentation-overhead app in the
  performance server
- Drop scripts/performance/{cpu,memory}Performance.ts, constants.ts, and
  the related PR comment sections — the benchmark job now reports these
  metrics directly
- Rename the test-performance GitLab job to bundle-size to reflect its
  remaining responsibility

👷 widen apiCall benchmark configurations to rum/rum_replay/rum_profiling

Allows the per-API scenarios to surface how enabling session replay or
profiling affects per-call SDK overhead, instead of only measuring the
vanilla `rum` config.

🐛 sample heap after the apiCall workload, not before

`startMemoryProfiling.stopMemoryProfiling()` discards the final
sampling profile and only takes the median of samples captured during
explicit `takeMeasurements()` calls. With the call placed before the
workload, the per-API `browser_sdk.benchmark.memory` metric was just
the post-load baseline.

Move the call to after the workload via a `runApiBenchmark` helper so
the pattern is shared across scenarios.
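
The ordering fix can be sketched as follows. The `MemoryProfiler` interface, the fake profiler, and the `runApiBenchmark` signature are assumptions made for illustration; only the names `takeMeasurements`, `stopMemoryProfiling`, and `runApiBenchmark` come from the commit message.

```typescript
// Assumed shape of the profiler; the real one lives in the benchmark harness.
interface MemoryProfiler {
  takeMeasurements(): Promise<void>
  stopMemoryProfiling(): Promise<number> // median of the recorded samples
}

// Hypothetical helper: run the workload first, then sample the heap.
async function runApiBenchmark(profiler: MemoryProfiler, workload: () => Promise<void>): Promise<number> {
  await workload()
  await profiler.takeMeasurements() // sampled after the workload, not before
  return profiler.stopMemoryProfiling()
}

// Fake profiler backed by a callback, to make the ordering observable.
function createFakeProfiler(readHeap: () => number): MemoryProfiler {
  const samples: number[] = []
  return {
    async takeMeasurements() {
      samples.push(readHeap())
    },
    async stopMemoryProfiling() {
      if (samples.length === 0) {
        throw new Error('no samples recorded')
      }
      const sorted = [...samples].sort((a, b) => a - b)
      return sorted[Math.floor(sorted.length / 2)]
    },
  }
}

async function demo(): Promise<number> {
  let heap = 100 // made-up post-load baseline
  const profiler = createFakeProfiler(() => heap)
  // The workload "allocates" 50 units; the sample must include them.
  return runApiBenchmark(profiler, async () => {
    heap += 50
  })
}
```

With the measurement taken before the workload, the fake profiler would report only the baseline value, which mirrors the bug described above.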

👷 move per-API CPU/memory tests to a scheduled api-performance-benchmark job

The new `apiCall_*` scenarios under `test/performance/` produced numbers
that aren't order-of-magnitude comparable to the old per-action metrics
(scenario-total vs per-call methodology), so switch approach: keep the
original synthetics-driven CPU test and puppeteer-driven memory test,
strip the PR-comparison parts, and run them in a new scheduled GitLab
job gated by `TARGET_TASK_NAME == "performance-benchmark-scheduled"`.

- Add `scripts/api-performance/` with trimmed CPU + memory scripts and
  their own copy of `reportToDatadog` (lint forbids cross-script
  protected-directory imports)
- Add the `api-performance-benchmark` job in `.gitlab-ci.yml`
- Revert the per-API scenarios and the `configurations` option I added
  to `createBenchmarkTest`; revert the `apiCall_*` fallback in
  `server.ts`
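
The gating condition itself lives in `.gitlab-ci.yml`. As a sketch only, the same check could be expressed in TypeScript, for example as a defensive guard at the top of a script. The function name and the idea of checking inside the script are assumptions, not part of this PR:

```typescript
// Name of the scheduled task, as quoted in the commit message.
const SCHEDULED_TASK_NAME = 'performance-benchmark-scheduled'

// Hypothetical guard: true only when the pipeline was started by the benchmark schedule.
// The environment is passed in explicitly so the function is easy to test.
function isScheduledBenchmarkRun(env: Record<string, string | undefined>): boolean {
  return env.TARGET_TASK_NAME === SCHEDULED_TASK_NAME
}

console.log(isScheduledBenchmarkRun({ TARGET_TASK_NAME: 'performance-benchmark-scheduled' }))
console.log(isScheduledBenchmarkRun({}))
```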

👷 log per-API CPU and memory tables to the CI console

After the synthetics CPU test finishes, query the just-reported metrics
from Datadog and print a small table. For memory, print the same shape
of table inline from the values the script already computed. No
comparison against a base commit — just current results.
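
A minimal sketch of printing such a table, assuming the per-API results are already in memory. The `ApiResult` type, the `formatTable` helper, and the values are hypothetical and do not reflect the script's actual output format:

```typescript
interface ApiResult {
  name: string
  value: number
}

// Hypothetical formatter: aligned two-column text table with a header row.
function formatTable(title: string, unit: string, results: ApiResult[]): string {
  // Pad the first column to the widest API name (or the title, if longer).
  const nameWidth = Math.max(title.length, ...results.map((r) => r.name.length))
  const header = `${title.padEnd(nameWidth)} | ${unit}`
  const separator = '-'.repeat(header.length)
  const rows = results.map((r) => `${r.name.padEnd(nameWidth)} | ${r.value}`)
  return [header, separator, ...rows].join('\n')
}

// Made-up values, current results only, no base-commit comparison.
const table = formatTable('API', 'bytes', [
  { name: 'addAction', value: 1024 },
  { name: 'startView', value: 2048 },
])
console.log(table)
```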

Both copies of `reportToDatadog` were byte-identical. Move the helper to `scripts/lib/` so the two
scripts directories import the same module instead of maintaining
duplicates.
@thomas-lebeau thomas-lebeau force-pushed the worktree-port-perf-tests-to-benchmark branch from 1a2c7a2 to 0ac0d45 Compare May 16, 2026 07:03
@thomas-lebeau thomas-lebeau marked this pull request as ready for review May 18, 2026 05:45
@thomas-lebeau thomas-lebeau requested a review from a team as a code owner May 18, 2026 05:45