perf: non-blocking Arrow Device array release #8390

Merged: 0ax1 merged 1 commit into develop from ad/arrow-device-release-ordering on Jun 12, 2026

0ax1 (Contributor) commented Jun 12, 2026

No description provided.

0ax1 added the changelog/performance label (A performance improvement) Jun 12, 2026
0ax1 force-pushed the ad/arrow-device-release-ordering branch from fb107d5 to b5466b1 June 12, 2026 15:33

codspeed-hq Bot commented Jun 12, 2026

Merging this PR will improve performance by 24.25%

⚠️ Unknown Walltime execution environment detected

Using the Walltime instrument on standard Hosted Runners will lead to inconsistent data.

For the most accurate results, we recommend using CodSpeed Macro Runners: bare-metal machines fine-tuned for performance measurement consistency.

⚠️ Different runtime environments detected

Some benchmarks with significant performance changes were compared across different runtime environments,
which may affect the accuracy of the results.

Open the report in CodSpeed to investigate

⚡ 111 improved benchmarks
❌ 5 regressed benchmarks
✅ 1412 untouched benchmarks
⏩ 10 skipped benchmarks [1]

Warning

Please fix the performance issues or acknowledge them on CodSpeed.

Performance Changes

| Mode | Benchmark | BASE | HEAD | Efficiency |
|---|---|---|---|---|
| Simulation | compare[12] | 115.8 µs | 136.4 µs | -15.08% |
| WallTime | cuda/bitpacked_u8/unpack/3bw[100M] | 299.1 µs | 352.1 µs | -15.04% |
| Simulation | varbinview_large | 112.9 µs | 131.5 µs | -14.19% |
| Simulation | compare[8] | 105 µs | 118 µs | -11.05% |
| Simulation | compare[10] | 135.9 µs | 152.7 µs | -11.02% |
| Simulation | compare[48] | 300.6 µs | 213 µs | +41.15% |
| Simulation | compare[50] | 319.2 µs | 227.7 µs | +40.18% |
| Simulation | compare[49] | 317.7 µs | 228.2 µs | +39.24% |
| Simulation | compare[44] | 287.7 µs | 207.5 µs | +38.68% |
| Simulation | compare[46] | 302.5 µs | 218.5 µs | +38.46% |
| Simulation | compare[47] | 309.4 µs | 223.5 µs | +38.4% |
| Simulation | compare[40] | 263.5 µs | 190.7 µs | +38.18% |
| Simulation | compare[44] | 292.4 µs | 212.1 µs | +37.82% |
| Simulation | compare[45] | 301 µs | 218.9 µs | +37.49% |
| Simulation | compare[43] | 287.6 µs | 209.2 µs | +37.47% |
| Simulation | compare[42] | 281.1 µs | 204.5 µs | +37.42% |
| Simulation | compare[40] | 268.3 µs | 195.6 µs | +37.17% |
| Simulation | compare[43] | 292.5 µs | 214.2 µs | +36.57% |
| Simulation | compare[42] | 286 µs | 209.4 µs | +36.56% |
| Simulation | compare[41] | 279.2 µs | 204.5 µs | +36.49% |
| ... | ... | ... | ... | ... |

ℹ️ Only the first 20 benchmarks are displayed. Go to the app to view all benchmarks.

Tip

Investigate this regression by commenting @codspeedbot fix this regression on this PR, or directly use the CodSpeed MCP with your agent.


Comparing ad/arrow-device-release-ordering (b1e780d) with develop (e74aac3)

Footnotes

  1. 10 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, click here and archive them to remove them from the performance reports.

Queue release-time CUDA frees after the recorded export event without synchronizing the host thread.

Signed-off-by: Alexander Droste <alexander.droste@protonmail.com>
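
The commit message describes ordering the release-time frees after the export event on the device side instead of blocking the host. The sketch below is a host-only toy model of that idea, not the code from this PR and not real CUDA: `Stream`, `Event`, `Buffer`, and `release_after` are hypothetical names, a worker thread stands in for a CUDA stream, and a flag with a condition variable stands in for the recorded event.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::mpsc::{channel, Sender};
use std::sync::{Arc, Condvar, Mutex};
use std::thread::{self, JoinHandle};

/// Toy stand-in for a recorded device event (hypothetical, not a CUDA type).
/// It starts unsignaled and is signaled once the "device" reaches it.
#[derive(Clone, Default)]
pub struct Event(Arc<(Mutex<bool>, Condvar)>);

impl Event {
    pub fn signal(&self) {
        let (lock, cv) = &*self.0;
        *lock.lock().unwrap() = true;
        cv.notify_all();
    }

    fn wait(&self) {
        let (lock, cv) = &*self.0;
        let mut done = lock.lock().unwrap();
        while !*done {
            done = cv.wait(done).unwrap();
        }
    }
}

/// Toy device buffer: dropping it marks the "device memory" as freed.
pub struct Buffer {
    pub freed: Arc<AtomicBool>,
}

impl Drop for Buffer {
    fn drop(&mut self) {
        self.freed.store(true, Ordering::SeqCst);
    }
}

enum Op {
    WaitEvent(Event),
    Free(Buffer),
}

/// Toy stream: a worker thread that runs queued operations strictly in order.
pub struct Stream {
    tx: Sender<Op>,
    worker: JoinHandle<()>,
}

impl Stream {
    pub fn new() -> Self {
        let (tx, rx) = channel::<Op>();
        let worker = thread::spawn(move || {
            for op in rx {
                match op {
                    // Blocks the stream worker, never the host thread.
                    Op::WaitEvent(event) => event.wait(),
                    Op::Free(buffer) => drop(buffer),
                }
            }
        });
        Stream { tx, worker }
    }

    /// Release path: queue "wait for the export event, then free" and return
    /// immediately, without synchronizing the calling thread.
    pub fn release_after(&self, export_event: &Event, buffer: Buffer) {
        self.tx.send(Op::WaitEvent(export_event.clone())).unwrap();
        self.tx.send(Op::Free(buffer)).unwrap();
    }

    /// Drain the queue and stop the worker; only needed to make a demo deterministic.
    pub fn finish(self) {
        drop(self.tx);
        self.worker.join().unwrap();
    }
}
```

In this model, `release_after` returns while the buffer is still alive, and the buffer is dropped only after `Event::signal` has been called. With the CUDA runtime, the analogous building blocks would presumably be `cudaStreamWaitEvent` followed by a stream-ordered free such as `cudaFreeAsync`, but which calls the PR actually uses is not stated on this page.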
0ax1 force-pushed the ad/arrow-device-release-ordering branch from b5466b1 to b1e780d June 12, 2026 15:34
0ax1 marked this pull request as ready for review June 12, 2026 15:35
0ax1 requested review from a team, onursatici and robert3005 June 12, 2026 15:35
0ax1 merged commit 46e7253 into develop Jun 12, 2026
67 of 69 checks passed
0ax1 deleted the ad/arrow-device-release-ordering branch June 12, 2026 15:49