👷 add api-performance-benchmark scheduled job + rename test-performance to bundle-size by thomas-lebeau · Pull Request #4633 · DataDog/browser-sdk

thomas-lebeau · 2026-05-15T12:51:46Z

Motivation

Two unrelated cleanups:

The test-performance GitLab job now only computes bundle sizes (the CPU and memory measurements it used to compute were tightly coupled to the per-PR comment flow). Rename the job to match.
The per-API CPU/memory measurements still have value as a regression signal on main, but they were originally compared against a base commit for every PR. Keep the measurement, drop the comparison, and run it on the existing performance-benchmark-scheduled schedule instead of on every PR.

Changes

Rename CI job test-performance → bundle-size.
Add new CI job api-performance-benchmark, gated by TARGET_TASK_NAME == "performance-benchmark-scheduled" so it shares the cadence of the existing benchmark schedule.
Add scripts/api-performance/:
- lib/cpuPerformance.ts — triggers the Datadog synthetic test that drives the playground (/performance/cpu); the synthetic test reports its per-API CPU metrics to Datadog directly.
- lib/memoryPerformance.ts — launches Chrome via puppeteer, clicks each playground button 30× per API while sampling the heap profile, and reports the median SDK-attributable bytes per API to Datadog.
- lib/constants.ts — the 7-entry TESTS list (RUM addGlobalContext / addAction / addError / addTiming / startView / startStop session-replay-recording, Logs logMessage).
- lib/reportToDatadog.ts — local copy of the helper (lint forbids importing across scripts/*/lib/ directories).
Drop the legacy scripts/performance/lib/{cpuPerformance,memoryPerformance,constants}.ts files and the CPU/Memory sections from the PR comment formatter (+ its spec).
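The memory measurement described above reduces to two steps: attribute sampled heap bytes to the SDK, then take the median across repetitions. A minimal sketch of those two steps follows. It assumes attribution is done by matching the script URL of each call frame in a Chrome DevTools Protocol sampling heap profile; the PR does not say how `memoryPerformance.ts` attributes bytes, and the names `sdkAttributableBytes`, `median`, and the `isSdkUrl` predicate are hypothetical.

```typescript
// Subset of a node in a CDP HeapProfiler.SamplingHeapProfile tree (only the fields used here).
interface HeapProfileNode {
  callFrame: { functionName: string; url: string }
  selfSize: number
  children: HeapProfileNode[]
}

// Sum the self-allocated bytes of every frame whose script URL is considered part of the SDK.
// Assumption: URL matching is the attribution rule; the real script may use another one.
function sdkAttributableBytes(node: HeapProfileNode, isSdkUrl: (url: string) => boolean): number {
  const own = isSdkUrl(node.callFrame.url) ? node.selfSize : 0
  return node.children.reduce((total, child) => total + sdkAttributableBytes(child, isSdkUrl), own)
}

// Median of the per-repetition measurements (for example, the 30 clicks per API).
function median(values: number[]): number {
  if (values.length === 0) {
    throw new Error('cannot take the median of an empty list')
  }
  const sorted = [...values].sort((a, b) => a - b)
  const middle = Math.floor(sorted.length / 2)
  return sorted.length % 2 === 1 ? sorted[middle] : (sorted[middle - 1] + sorted[middle]) / 2
}

console.log(median([30, 10, 20])) // 20
```

Using the median rather than the mean keeps a single garbage-collection spike in one repetition from skewing the reported value.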

Notes

Earlier iterations of this PR added per-API scenarios under test/performance/scenarios/, but they produced order-of-magnitude-different numbers from the old job (scenario-total vs per-call methodology). Reverted in favor of preserving the original measurement approach.
The new job runs on the same schedule as performance-benchmark — change the cadence in GitLab's CI/CD → Schedules → performance-benchmark-scheduled.
Metrics flow to Datadog: CPU from the synthetic test (its existing dashboards), memory as logs tagged per TESTS.property from scripts/api-performance/lib/memoryPerformance.ts.

Test instructions

Check the next GitLab pipeline: the bundle-size job runs (renamed from test-performance) and posts the bundle-size PR comment as before.
On the next performance-benchmark-scheduled run, the new api-performance-benchmark job runs alongside performance-benchmark and pushes memory metrics to Datadog (look for SDK memory consumption logs from the browser-sdk service / ci env on main).

Checklist

Tested locally (typecheck, lint, script test pass)
Tested on staging
Added unit tests for this change.
Added e2e/integration tests for this change.
Updated documentation and/or relevant AGENTS.md file

datadog-prod-us1-4 · 2026-05-15T12:53:30Z

🎉 All green!

❄️ No new flaky tests detected
🧪 All tests passed

🎯 Code Coverage
• Patch Coverage: 100.00%
• Overall Coverage: 76.96% (+0.00%)

This comment will be updated automatically if new data arrives.

🔗 Commit SHA: 0ac0d45

thomas-lebeau · 2026-05-15T13:31:00Z

@codex review

cit-pr-commenter-54b7da · 2026-05-15T13:32:44Z

Bundles Sizes Evolution

📦 Bundle Name	Base Size	Local Size	𝚫%	Status
Rum	169.51 KiB	169.51 KiB	0.00%	✅
Rum Profiler	5.97 KiB	5.97 KiB	0.00%	✅
Rum Recorder	21.23 KiB	21.23 KiB	0.00%	✅
Logs	54.70 KiB	54.70 KiB	0.00%	✅
Rum Slim	127.85 KiB	127.85 KiB	0.00%	✅
Worker	22.99 KiB	22.99 KiB	0.00%	✅


chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 98b9dd4a93


Move CPU and memory measurements for individual RUM APIs (addAction, addError, addTiming, setGlobalContextProperty, startView, start/stopSessionReplayRecording) from the legacy synthetics-driven scripts to the Playwright benchmark harness in test/performance.
- Add apiCalls.scenario.ts driving each API via page.evaluate under the existing CPU/memory profilers; reuses the instrumentation-overhead app
- Allow createBenchmarkTest to opt into a subset of SDK configurations
- Resolve apiCall_* scenarios to the instrumentation-overhead app in the performance server
- Drop scripts/performance/{cpu,memory}Performance.ts, constants.ts, and the related PR comment sections; the benchmark job now reports these metrics directly
- Rename the test-performance GitLab job to bundle-size to reflect its remaining responsibility

👷 widen apiCall benchmark configurations to rum/rum_replay/rum_profiling

Allows the per-API scenarios to surface how enabling session replay or profiling affects per-call SDK overhead, instead of only measuring the vanilla `rum` config.

🐛 sample heap after the apiCall workload, not before

`startMemoryProfiling.stopMemoryProfiling()` discards the final sampling profile and only takes the median of samples captured during explicit `takeMeasurements()` calls. With the call placed before the workload, the per-API `browser_sdk.benchmark.memory` metric was just the post-load baseline. Move the call to after the workload via a `runApiBenchmark` helper so the pattern is shared across scenarios.

👷 move per-API CPU/memory tests to a scheduled api-performance-benchmark job

The new `apiCall_*` scenarios under `test/performance/` produced numbers that aren't order-of-magnitude comparable to the old per-action metrics (scenario-total vs per-call methodology), so switch approach: keep the original synthetics-driven CPU test and puppeteer-driven memory test, strip the PR-comparison parts, and run them in a new scheduled GitLab job gated by `TARGET_TASK_NAME == "performance-benchmark-scheduled"`.
- Add `scripts/api-performance/` with trimmed CPU + memory scripts and their own copy of `reportToDatadog` (lint forbids cross-script protected-directory imports)
- Add the `api-performance-benchmark` job in `.gitlab-ci.yml`
- Revert the per-API scenarios and the `configurations` option I added to `createBenchmarkTest`; revert the `apiCall_*` fallback in `server.ts`

👷 log per-API CPU and memory tables to the CI console

After the synthetics CPU test finishes, query the just-reported metrics from Datadog and print a small table. For memory, print the same shape of table inline from the values the script already computed. No comparison against a base commit, just current results.
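The ordering issue described in the 🐛 commit message can be illustrated with a small simulation. This is only a sketch: `createFakeProfiler` is a hypothetical, synchronous stand-in that reports the median of explicitly taken samples, and `runApiBenchmark` here only mirrors the name used in the commit message, not the reverted implementation. The heap figures are made up for the example.

```typescript
// Hypothetical, synchronous stand-in for the memory profiler: it records a heap
// reading on every takeMeasurements() call and reports the median of them on stop.
interface MemoryProfiler {
  takeMeasurements(): void
  stopMemoryProfiling(): number
}

function createFakeProfiler(readHeapBytes: () => number): MemoryProfiler {
  const samples: number[] = []
  return {
    takeMeasurements() {
      samples.push(readHeapBytes())
    },
    stopMemoryProfiling() {
      if (samples.length === 0) {
        throw new Error('no measurements were taken')
      }
      const sorted = [...samples].sort((a, b) => a - b)
      const middle = Math.floor(sorted.length / 2)
      return sorted.length % 2 === 1 ? sorted[middle] : (sorted[middle - 1] + sorted[middle]) / 2
    },
  }
}

// Run the workload first, then sample, so the reading includes what the workload allocated.
function runApiBenchmark(profiler: MemoryProfiler, workload: () => void): number {
  workload()
  profiler.takeMeasurements()
  return profiler.stopMemoryProfiling()
}

// Simulated heap: 1000 bytes after page load, plus 250 bytes allocated by the workload.
let heapBytes = 1000
const result = runApiBenchmark(createFakeProfiler(() => heapBytes), () => {
  heapBytes += 250
})
console.log(result) // 1250; sampling before the workload would have reported 1000
```

Wrapping the order in one helper means each scenario cannot accidentally sample before its own workload.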


chatgpt-codex-connector Bot reviewed May 15, 2026

Comment thread test/performance/scenarios/apiCalls.scenario.ts Outdated

thomas-lebeau changed the title ~~👷 port per-API performance tests to benchmark harness~~ 👷 add api-performance-benchmark scheduled job + rename test-performance to bundle-size May 15, 2026

thomas-lebeau added 4 commits May 16, 2026 09:03

👷 TEMP run api-performance-benchmark on this PR for verification

72add2c

Revert before merging.

♻️ share reportToDatadog between performance and api-performance scripts

1a84ac4

Both copies were byte-identical. Move it to `scripts/lib/` so the two scripts directories import the same module instead of maintaining duplicates.

👷 revert temp api-performance-benchmark PR gate

0ac0d45

thomas-lebeau force-pushed the worktree-port-perf-tests-to-benchmark branch from 1a2c7a2 to 0ac0d45 Compare May 16, 2026 07:03

thomas-lebeau marked this pull request as ready for review May 18, 2026 05:45

thomas-lebeau requested a review from a team as a code owner May 18, 2026 05:45

rgaignault approved these changes May 18, 2026

mormubis approved these changes May 18, 2026

thomas-lebeau wants to merge 4 commits into main from worktree-port-perf-tests-to-benchmark
