👷 add api-performance-benchmark scheduled job + rename test-performance to bundle-size #4633
Open
thomas-lebeau wants to merge 4 commits into
🎉 All green! ❄️ No new flaky tests detected. 🎯 Code Coverage. 🔗 Commit SHA: 0ac0d45
Collaborator (author): @codex review
Bundles Sizes Evolution
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 98b9dd4a93
Move CPU and memory measurements for individual RUM APIs (addAction,
addError, addTiming, setGlobalContextProperty, startView,
start/stopSessionReplayRecording) from the legacy synthetics-driven
scripts to the Playwright benchmark harness in test/performance.
- Add apiCalls.scenario.ts driving each API via page.evaluate under the
existing CPU/memory profilers; reuses the instrumentation-overhead app
- Allow createBenchmarkTest to opt into a subset of SDK configurations
- Resolve apiCall_* scenarios to the instrumentation-overhead app in the
performance server
- Drop scripts/performance/{cpu,memory}Performance.ts, constants.ts, and
the related PR comment sections — the benchmark job now reports these
metrics directly
- Rename the test-performance GitLab job to bundle-size to reflect its
remaining responsibility
👷 widen apiCall benchmark configurations to rum/rum_replay/rum_profiling
Allows the per-API scenarios to surface how enabling session replay or
profiling affects per-call SDK overhead, instead of only measuring the
vanilla `rum` config.
🐛 sample heap after the apiCall workload, not before
`startMemoryProfiling.stopMemoryProfiling()` discards the final
sampling profile and only takes the median of samples captured during
explicit `takeMeasurements()` calls. With the call placed before the
workload, the per-API `browser_sdk.benchmark.memory` metric was just
the post-load baseline.
Move the call to after the workload via a `runApiBenchmark` helper so
the pattern is shared across scenarios.
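The corrected ordering can be sketched as a small wrapper that runs the API workload first and only then takes an explicit sample, so the median returned by `stopMemoryProfiling()` reflects post-workload memory rather than the post-load baseline. This is a hypothetical sketch: the `MemoryProfiler` interface is an assumed stand-in for whatever `startMemoryProfiling` returns, and only the method names come from the commit message.

```typescript
// Assumed stand-in for the profiler handle; only the method names come from the commit.
interface MemoryProfiler {
  // Record one heap sample at the current point in time.
  takeMeasurements(): Promise<void>
  // Stop profiling and return the median of the explicitly recorded samples.
  stopMemoryProfiling(): Promise<number>
}

// Hypothetical helper: sample *after* the workload so the result includes the
// memory retained by the API calls, not just the page's post-load baseline.
async function runApiBenchmark(
  profiler: MemoryProfiler,
  workload: () => Promise<void>,
): Promise<number> {
  await workload()
  await profiler.takeMeasurements()
  return profiler.stopMemoryProfiling()
}
```

Routing every scenario through one helper means the sample-after-workload ordering is fixed in a single place instead of being repeated (and possibly gotten wrong again) in each scenario.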
👷 move per-API CPU/memory tests to a scheduled api-performance-benchmark job
The new `apiCall_*` scenarios under `test/performance/` produced numbers
that aren't order-of-magnitude comparable to the old per-action metrics
(scenario-total vs per-call methodology), so switch approach: keep the
original synthetics-driven CPU test and puppeteer-driven memory test,
strip the PR-comparison parts, and run them in a new scheduled GitLab
job gated by `TARGET_TASK_NAME == "performance-benchmark-scheduled"`.
- Add `scripts/api-performance/` with trimmed CPU + memory scripts and
their own copy of `reportToDatadog` (lint forbids cross-script
protected-directory imports)
- Add the `api-performance-benchmark` job in `.gitlab-ci.yml`
- Revert the per-API scenarios and the `configurations` option I added
to `createBenchmarkTest`; revert the `apiCall_*` fallback in
`server.ts`
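The kept puppeteer-driven memory test clicks each playground button 30× per API while sampling the heap profile and reports the median SDK-attributable bytes per API. A rough sketch of that attribution step, under stated assumptions: the types below are a hand-written subset of the Chrome DevTools Protocol `SamplingHeapProfile` shape (as returned by `HeapProfiler.getSamplingProfile`), and `sdkBytes`, `medianSdkBytes`, and the `isSdkUrl` predicate are hypothetical names. The actual rule in `memoryPerformance.ts` may differ.

```typescript
// Hand-written subset of the CDP HeapProfiler sampling-profile types.
interface CallFrame {
  url: string
  functionName: string
}
interface SamplingHeapProfileNode {
  callFrame: CallFrame
  selfSize: number // bytes attributed directly to this frame
  children: SamplingHeapProfileNode[]
}
interface SamplingHeapProfile {
  head: SamplingHeapProfileNode
}

// Sum bytes allocated directly by frames whose script URL belongs to the SDK.
function sdkBytes(profile: SamplingHeapProfile, isSdkUrl: (url: string) => boolean): number {
  let total = 0
  const stack = [profile.head]
  while (stack.length > 0) {
    const node = stack.pop()!
    if (isSdkUrl(node.callFrame.url)) total += node.selfSize
    stack.push(...node.children)
  }
  return total
}

function median(values: number[]): number {
  if (values.length === 0) throw new RangeError('median of an empty list')
  const sorted = [...values].sort((a, b) => a - b)
  const mid = Math.floor(sorted.length / 2)
  return sorted.length % 2 === 1 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2
}

// Hypothetical per-API figure: median SDK bytes across repeated samples (e.g. one per click).
function medianSdkBytes(profiles: SamplingHeapProfile[], isSdkUrl: (url: string) => boolean): number {
  return median(profiles.map((profile) => sdkBytes(profile, isSdkUrl)))
}
```

Counting only `selfSize` charges each sampled allocation to the frame that performed it; charging an app callback's allocations to the SDK frame that invoked it would require walking ancestors instead, which is a different attribution rule.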
👷 log per-API CPU and memory tables to the CI console
After the synthetics CPU test finishes, query the just-reported metrics
from Datadog and print a small table. For memory, print the same shape
of table inline from the values the script already computed. No
comparison against a base commit — just current results.
Revert before merging.
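For the memory half no Datadog round-trip is needed: the table can be printed straight from the values the script already holds. A minimal sketch of such a plain-text formatter; the `ApiResult` shape, the `formatApiTable` name, the column labels, and the two-decimal rounding are all assumptions, not the script's actual output format.

```typescript
// Hypothetical shape of one per-API measurement.
interface ApiResult {
  api: string
  value: number
}

// Render per-API results as a left-aligned plain-text table for the CI log.
function formatApiTable(metricLabel: string, rows: ApiResult[]): string {
  const header = ['API', metricLabel]
  const body = rows.map((row) => [row.api, row.value.toFixed(2)])
  // Each column is as wide as its longest cell, header included.
  const widths = header.map((cell, col) =>
    Math.max(cell.length, ...body.map((cells) => cells[col].length)),
  )
  // Pad every column except the last, so lines carry no trailing whitespace.
  const renderLine = (cells: string[]) =>
    cells.map((cell, col) => (col === cells.length - 1 ? cell : cell.padEnd(widths[col]))).join(' | ')
  const separator = widths.map((width) => '-'.repeat(width)).join('-|-')
  return [renderLine(header), separator, ...body.map(renderLine)].join('\n')
}

// Example: console.log(formatApiTable('Memory (bytes)', results))
```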
Both copies of `reportToDatadog` were byte-identical. Move it to `scripts/lib/` so the two script directories import the same module instead of maintaining duplicates.
Force-pushed 1a2c7a2 to 0ac0d45
rgaignault approved these changes on May 18, 2026
mormubis approved these changes on May 18, 2026
Motivation
Two unrelated cleanups:
- The `test-performance` GitLab job now only computes bundle sizes (the CPU and memory measurements it used to compute were tightly coupled to the per-PR comment flow). Rename the job to match.
- … `main`, but they were originally compared against a base commit for every PR. Keep the measurement, drop the comparison, and run it on the existing `performance-benchmark-scheduled` schedule instead of on every PR.

Changes
- GitLab job rename: `test-performance` → `bundle-size`.
- New scheduled GitLab job `api-performance-benchmark`, gated by `TARGET_TASK_NAME == "performance-benchmark-scheduled"` so it shares the cadence of the existing benchmark schedule.
- New `scripts/api-performance/` directory:
  - `lib/cpuPerformance.ts`: triggers the Datadog synthetic test that drives the playground (`/performance/cpu`); the synthetic test reports its per-API CPU metrics to Datadog directly.
  - `lib/memoryPerformance.ts`: launches Chrome via puppeteer, clicks each playground button 30× per API while sampling the heap profile, and reports the median SDK-attributable bytes per API to Datadog.
  - `lib/constants.ts`: the 7-entry `TESTS` list (RUM addGlobalContext / addAction / addError / addTiming / startView / startStop session-replay-recording, Logs logMessage).
  - `lib/reportToDatadog.ts`: local copy of the helper (lint forbids importing across `scripts/*/lib/` directories).
- Drop the `scripts/performance/lib/{cpuPerformance,memoryPerformance,constants}.ts` files and the CPU/Memory sections from the PR comment formatter (+ its spec).

Notes
- … `test/performance/scenarios/`, but they produced order-of-magnitude-different numbers from the old job (scenario-total vs per-call methodology). Reverted in favor of preserving the original measurement approach.
- … `performance-benchmark`; change the cadence in GitLab's CI/CD → Schedules → `performance-benchmark-scheduled`.
- … `TESTS.property` from `scripts/api-performance/lib/memoryPerformance.ts`.

Test instructions
- The `bundle-size` job runs (renamed from `test-performance`) and posts the bundle-size PR comment as before.
- On a `performance-benchmark-scheduled` run, the new `api-performance-benchmark` job runs alongside `performance-benchmark` and pushes memory metrics to Datadog (look for SDK memory consumption logs from the `browser-sdk` service / `ci` env on `main`).

Checklist