perf: non-blocking Arrow Device array release #8390
Merging this PR will improve performance by 24.25%
| | Mode | Benchmark | BASE | HEAD | Efficiency |
|---|---|---|---|---|---|
| ❌ | Simulation | compare[12] | 115.8 µs | 136.4 µs | -15.08% |
| ❌ | WallTime | cuda/bitpacked_u8/unpack/3bw[100M] | 299.1 µs | 352.1 µs | -15.04% |
| ❌ | Simulation | varbinview_large | 112.9 µs | 131.5 µs | -14.19% |
| ❌ | Simulation | compare[8] | 105 µs | 118 µs | -11.05% |
| ❌ | Simulation | compare[10] | 135.9 µs | 152.7 µs | -11.02% |
| ⚡ | Simulation | compare[48] | 300.6 µs | 213 µs | +41.15% |
| ⚡ | Simulation | compare[50] | 319.2 µs | 227.7 µs | +40.18% |
| ⚡ | Simulation | compare[49] | 317.7 µs | 228.2 µs | +39.24% |
| ⚡ | Simulation | compare[44] | 287.7 µs | 207.5 µs | +38.68% |
| ⚡ | Simulation | compare[46] | 302.5 µs | 218.5 µs | +38.46% |
| ⚡ | Simulation | compare[47] | 309.4 µs | 223.5 µs | +38.4% |
| ⚡ | Simulation | compare[40] | 263.5 µs | 190.7 µs | +38.18% |
| ⚡ | Simulation | compare[44] | 292.4 µs | 212.1 µs | +37.82% |
| ⚡ | Simulation | compare[45] | 301 µs | 218.9 µs | +37.49% |
| ⚡ | Simulation | compare[43] | 287.6 µs | 209.2 µs | +37.47% |
| ⚡ | Simulation | compare[42] | 281.1 µs | 204.5 µs | +37.42% |
| ⚡ | Simulation | compare[40] | 268.3 µs | 195.6 µs | +37.17% |
| ⚡ | Simulation | compare[43] | 292.5 µs | 214.2 µs | +36.57% |
| ⚡ | Simulation | compare[42] | 286 µs | 209.4 µs | +36.56% |
| ⚡ | Simulation | compare[41] | 279.2 µs | 204.5 µs | +36.49% |
| ... | ... | ... | ... | ... | ... |
ℹ️ Only the first 20 benchmarks are displayed. Go to the app to view all benchmarks.
Tip: Investigate this regression by commenting `@codspeedbot fix this regression` on this PR, or directly use the CodSpeed MCP with your agent.
Comparing ad/arrow-device-release-ordering (b1e780d) with develop (e74aac3)
Footnotes
1. 10 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, archive them to remove them from the performance reports.
Queue release-time CUDA frees after the recorded export event without synchronizing the host thread.

Signed-off-by: "Alexander Droste" <alexander.droste@protonmail.com>
Signed-off-by: Alexander Droste <alexander.droste@protonmail.com>
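The ordering the commit message describes can be illustrated with a toy model. This is a sketch in plain Python, not the PR's code and not the CUDA API: `Stream`, `Event`, `release_non_blocking`, and `drain` are hypothetical names, and the use of a separate release stream is an assumption made for illustration. In real CUDA the corresponding primitives would presumably be an event wait enqueued on a stream (`cudaStreamWaitEvent`) followed by a stream-ordered free (such as `cudaFreeAsync`).

```python
from collections import deque


class Event:
    """Toy stand-in for a CUDA event: complete once its stream reaches it."""

    def __init__(self):
        self.complete = False


class Stream:
    """Toy stand-in for a CUDA stream: a FIFO of operations run in order."""

    def __init__(self, name, log):
        self.name = name
        self.log = log
        self.ops = deque()

    def launch(self, label):
        self.ops.append(("work", label))

    def record(self, event):
        self.ops.append(("record", event))

    def wait_event(self, event):
        self.ops.append(("wait", event))

    def free(self, buffer):
        self.ops.append(("free", buffer))

    def step(self):
        """Run one queued op; return False if empty or stalled on an event."""
        if not self.ops:
            return False
        kind, arg = self.ops[0]
        if kind == "wait" and not arg.complete:
            return False  # the stream stalls here; the host thread does not
        self.ops.popleft()
        if kind == "record":
            arg.complete = True
            self.log.append(f"{self.name}: event recorded")
        elif kind != "wait":
            self.log.append(f"{self.name}: {kind} {arg}")
        return True


def release_non_blocking(release_stream, export_event, buffer):
    """Queue the free behind the export event and return immediately."""
    release_stream.wait_event(export_event)
    release_stream.free(buffer)


def drain(streams):
    """Simulate the device making progress on all streams until idle."""
    progressed = True
    while progressed:
        progressed = any([s.step() for s in streams])


log = []
export = Stream("export", log)
release = Stream("release", log)
export_event = Event()

export.launch("copy buf0")    # work that still reads the exported buffer
export.record(export_event)   # the recorded export event

release_non_blocking(release, export_event, "buf0")
log.append("host: release returned")  # host continues before any device work ran

drain([release, export])
print("\n".join(log))
```

In this model the release call returns before the export work has run, and the free is only executed once the export event has completed. A blocking variant would instead wait on the host for the event before freeing, which stalls the calling thread.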
No description provided.