Improved backend functionality by inner-daemons · Pull Request #9570 · gfx-rs/wgpu

inner-daemons · 2026-05-20T07:37:20Z

Connections
Related to gfx-rs/wgpu-native#594

Description
This improves the custom backend functionality so that a proper custom backend can be fully implemented.

The controversial change here is the introduction of global state to override instance creation.
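
As a rough illustration of what global state for overriding instance creation can look like, here is a minimal sketch of the `OnceLock<fn(InstanceDescriptor) -> Result<Instance, InstanceDescriptor>>` variant described in one of the commit messages in this PR. The `InstanceDescriptor` and `Instance` types below are hypothetical stand-ins, `register_instance_factory`, `create_instance`, and `example_factory` are illustrative names, and the "return the descriptor on `Err` and fall back to the default path" behaviour is an assumption inferred from the signature. A later commit in this PR switches to atomics to avoid requiring std; that variant is not shown.

```rust
use std::sync::OnceLock;

// Hypothetical stand-ins for wgpu's real types; only the shape matters here.
#[derive(Debug, PartialEq)]
pub struct InstanceDescriptor {
    pub label: &'static str,
}

#[derive(Debug, PartialEq)]
pub struct Instance {
    pub backend: &'static str,
}

/// Factory signature as given in the commit message. Returning the descriptor
/// in `Err` is assumed here to mean "decline, use the default path".
pub type InstanceFactory = fn(InstanceDescriptor) -> Result<Instance, InstanceDescriptor>;

// Process-wide override slot. OnceLock allows exactly one successful `set`.
static INSTANCE_FACTORY: OnceLock<InstanceFactory> = OnceLock::new();

/// Registers the override. Returns `false` if a factory was already registered.
pub fn register_instance_factory(factory: InstanceFactory) -> bool {
    INSTANCE_FACTORY.set(factory).is_ok()
}

/// Creates an instance, consulting the registered factory first.
pub fn create_instance(desc: InstanceDescriptor) -> Instance {
    let _desc = match INSTANCE_FACTORY.get() {
        Some(factory) => match factory(desc) {
            Ok(instance) => return instance,
            // The factory declined and handed the descriptor back.
            Err(desc) => desc,
        },
        None => desc,
    };
    // Assumed default path: a real implementation would use `_desc` here.
    Instance { backend: "default" }
}

/// Illustrative factory: declines descriptors labelled "decline".
pub fn example_factory(desc: InstanceDescriptor) -> Result<Instance, InstanceDescriptor> {
    if desc.label == "decline" {
        Err(desc)
    } else {
        Ok(Instance { backend: "custom" })
    }
}
```

Because the slot is process-wide, every caller of `create_instance` sees the override once it is registered, which is the property that makes this design contentious.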

Testing
Existing testing + tested in gfx-rs/wgpu-native#594

Squash or Rebase?
Squash

Checklist

I self-reviewed and fully understand this PR.
WebGPU implementations built with wgpu may be affected behaviorally.
Validation and feature gates are in place to confine behavioral changes.
Tests demonstrate the validation and altered logic works.
CHANGELOG.md entries for the user-facing effects of this change are present.
The PR is minimal, and doesn't make sense to land as multiple PRs.
Commits are logically scoped and individually reviewable.
The PR description has enough context to understand the motivation and solution implemented.

Add [patch] for the inner-daemons/wgpu-native git URL so Cargo uses the local path dep instead. This makes [patch.crates-io] propagate to wgpu-native's transitive deps (naga, wgpu-core, etc.), eliminating the naga 29.0.3/29.0.0 version split that caused MSL generation differences in subgroup_operations tests. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…broken

- get_internal_counters: map WGPUHalCounters → InternalCounters
- create_blas/create_tlas: build C descriptors and call wgpuDeviceCreate*
- compact_blas: call wgpuQueueCompactBlas, return new handle + DispatchBlas
- mark_acceleration_structures_built: collect ptrs, call C function
- build_acceleration_structures: full BLAS + TLAS support
- Add Tlas::lowest_unmodified() and TlasInstance::blas_as_custom() cfg(custom) accessors to wgpu to allow custom backends to read TLAS build data
- create_bind_group: add AccelerationStructure/BufferArray/SamplerArray/TextureViewArray support via WGPUBindGroupEntryExtras chain
- Remove stale println! from instance_create

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

… bind group entries

- Wire WGPUQuerySetDescriptorExtras for PipelineStatistics query type
- Add pipeline_statistics_to_native conv fn mapping wgpu flags to C enum values
- Handle AccelerationStructure (TLAS) and array binding resources (BufferArray, SamplerArray, TextureViewArray) in create_bind_group via WGPUBindGroupEntryExtras

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Staging buffers:
- Implement CQueueWriteBuffer (CPU Vec flushed via wgpuQueueWriteBuffer)
- create_staging_buffer now returns Some, enabling Queue::write_buffer_with
- validate_write_buffer returns Some(()); validation happens in wgpu-native
- write_staging_buffer flushes the staged data via wgpuQueueWriteBuffer

Passthrough shaders:
- Extend create_shader_module_passthrough to handle DXIL, HLSL, MetalLib, MSL in addition to the existing WGSL and SPIR-V paths
- Only panic with unimplemented! when no format wgpu-native can handle is present (i.e. GLSL-only descriptor)

Remaining known stubs with no wgpu-native API: downlevel_capabilities, wgsl_language_features, generate_allocator_report, poll_all_devices return value, AccelerationStructureArray bind group entries.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
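
The staging-buffer approach in this commit message (a CPU-side Vec that is flushed through a queue write) can be sketched roughly as follows. This is an illustration under assumptions, not the PR's code: `CpuStagingBuffer` is a hypothetical name, and the `write_buffer` closure stands in for the real `wgpuQueueWriteBuffer` call so the example stays self-contained.

```rust
/// Hypothetical CPU-side staging buffer: callers fill a Vec, and `flush`
/// hands the bytes to a write callback standing in for wgpuQueueWriteBuffer.
pub struct CpuStagingBuffer {
    /// Destination offset in the target GPU buffer.
    offset: u64,
    /// Staged bytes, owned on the CPU until flushed.
    data: Vec<u8>,
}

impl CpuStagingBuffer {
    /// Allocates a zero-filled staging area of `size` bytes.
    pub fn new(offset: u64, size: usize) -> Self {
        Self { offset, data: vec![0u8; size] }
    }

    /// Gives the caller mutable access to the staged bytes.
    pub fn as_mut_slice(&mut self) -> &mut [u8] {
        &mut self.data
    }

    /// Consumes the staging buffer and forwards its contents in one write.
    pub fn flush(self, mut write_buffer: impl FnMut(u64, &[u8])) {
        write_buffer(self.offset, &self.data);
    }
}
```

Consuming `self` in `flush` means a staged write cannot be submitted twice by accident.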

…ureArray in c-backend

- lib.rs: wgsl_language_features now calls wgpuGetWgslLanguageFeatures() and maps the returned bitmask to wgpu::WgslLanguageFeatures flags
- surface.rs: texture_discard now calls wgpuSurfaceDiscardTexture() instead of no-op
- device.rs: AccelerationStructureArray binding resource is now fully handled via the new tlases/tlasCount fields in WGPUBindGroupEntryExtras; all existing ExtrasStorage instantiations updated for the new struct shape

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…cator_report

- lib.rs: poll_all_devices now calls wgpuInstancePollAllDevices(ptr, force_wait) and returns the real bool result instead of always returning true
- adapter.rs: downlevel_capabilities now calls wgpuAdapterGetDownlevelCapabilities and maps all 14 flag bits and ShaderModel (Sm2/Sm4/Sm5) back to wgpu types
- device.rs: generate_allocator_report now calls wgpuDeviceGetAllocatorReport, converts the C allocations/blocks arrays into Rust vecs, then frees the C memory

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Ensures the c-backend factory is linked and registered in all crates that run wgpu code: examples/features, benches, bug-repro examples, and standalone examples. Skipped: custom_backend (is its own backend demo), player (uses wgpu-core directly), cts_runner (uses deno_webgpu). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Panic-in-Drop now logged instead of silently discarded

wgpuDeviceRelease can panic via handle_error_fatal when the device is in an error state. The catch_unwind was correct (re-panicking in Drop aborts), but the error was swallowed completely. Now logs via log::error! with the panic message so failures are visible. Requires adding log to Cargo.toml.

GPU-initiated device loss limitation surfaced at runtime

set_device_lost_callback now emits log::warn! explaining that only explicit Device::destroy() fires the callback: wgpu-native does not wire WGPUDeviceLostCallbackInfo to wgpu-core's spontaneous loss path, so driver crashes and GPU timeouts are silently dropped. The existing code and comment in adapter.rs were correct; this makes the limitation visible to callers at the point they register the callback.

Replace AtomicUsize + transmute with OnceLock<fn> in instance.rs

INSTANCE_FACTORY was stored as a usize and recovered via core::mem::transmute, relying on fn pointers fitting in usize (true in practice but not guaranteed by the spec). Replaced with std::sync::OnceLock<fn(InstanceDescriptor) -> Result<Instance, InstanceDescriptor>>, which is type-safe, requires no unsafe, and naturally enforces single-registration semantics. Drops the AtomicUsize and Ordering imports.

Remove #[allow(dead_code)] from conv.rs and delete unused function

The blanket allow masked one genuinely unused function, map_texture_dimension (native→wgpu TextureDimension, never called). The other apparent dead-code entries (map_feature, origin3d_to_native) are internal helpers called within conv.rs itself and are not dead. Removing the allow lets the compiler enforce this going forward.

Document why finish_boxed cannot have a default impl

finish_boxed exists to call finish through a Box<dyn Trait> vtable. A default impl that delegates to finish would require Self: Sized to move out of the box, which removes the method from the vtable and breaks object safety, a contradictory requirement. The trait method now carries a doc comment explaining this constraint and shows the required one-liner that every concrete backend must write.

Explain DynRenderBundleEncoder pointer-based Ord/Hash

The Eq/Ord/Hash impls compare heap addresses rather than values. Added a block comment before the impls explaining why this is sound (Box<T> guarantees a stable allocation address for the encoder's lifetime) and why it is intentional (these impls satisfy dispatch-enum bounds, not semantic ordering; encoders are never sorted or deduplicated by value).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
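
The finish_boxed constraint described in this commit message can be illustrated with a small sketch. The trait and type names below (`EncoderInterface`, `MyEncoder`, `finish_dyn`) are hypothetical and the `String` return type is a placeholder; only the relationship between `finish` and `finish_boxed` is meant to mirror the description.

```rust
/// Hypothetical encoder trait; the names are illustrative, not wgpu's API.
pub trait EncoderInterface {
    /// By-value finish. The `Self: Sized` bound keeps the trait object-safe
    /// but also makes this method uncallable through `dyn EncoderInterface`.
    fn finish(self) -> String
    where
        Self: Sized;

    /// Object-safe entry point. A default body cannot be provided: moving the
    /// value out of `Box<Self>` to call `finish` needs `Self: Sized`, and
    /// adding that bound here would drop the method from the vtable.
    fn finish_boxed(self: Box<Self>) -> String;
}

pub struct MyEncoder {
    pub commands: Vec<&'static str>,
}

impl EncoderInterface for MyEncoder {
    fn finish(self) -> String {
        self.commands.join(";")
    }

    // The one-liner each concrete backend writes; `Self` is known to be
    // `MyEncoder` here, so unboxing and calling `finish` is allowed.
    fn finish_boxed(self: Box<Self>) -> String {
        (*self).finish()
    }
}

/// Finishes an encoder whose concrete type has been erased.
pub fn finish_dyn(encoder: Box<dyn EncoderInterface>) -> String {
    encoder.finish_boxed()
}
```

Inside the concrete impl the compiler knows the size of `MyEncoder`, which is why the delegation compiles there but not as a trait-level default.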

**Subgroup operations (BACKEND failure)**

Callbacks from WGPUCallbackMode_AllowSpontaneous fire on background threads. Panic payloads were stored in thread-local storage, so the test thread never saw them. Switch CALLBACK_PANIC to a global Mutex so cross-thread panics are captured. Add a bounded spin in map_async (up to 50 ms, polling the device) so the background callback completes and the panic is visible before resume_callback_panic() is called on the test thread.

**Passthrough layout validation (ALWAYS failure)**

wgpu-core panics when create_render_pipeline receives layout: None with a passthrough shader module (it cannot reflect the layout). wgpu-native accepts this, so the C backend had to replicate the validation explicitly. Track is_passthrough on CShaderModule and panic early with a clear message.

**Timestamps encoder hang (ALWAYS timeout)**

CDevice::poll() ignored PollType::Wait { timeout: Some(_) } and called wgpuDevicePoll(wait=true) which maps to wait_indefinitely() in wgpu-native, hanging the process. The new wgpuDevicePollWithTimeout function (added in the wgpu-native fork) threads a nanosecond timeout through to wgpu-core's device_poll which already supports bounded waits.

Update Cargo.lock to pull in the new commit.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
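
The cross-thread panic capture described for the subgroup operations fix can be sketched as below. This is a minimal illustration, not the PR's code: `CALLBACK_PANIC` and `resume_callback_panic` follow the names in the commit message, while `run_callback`, `take_callback_panic`, and the "keep the first panic" policy are assumptions made for the example.

```rust
use std::any::Any;
use std::panic::{self, AssertUnwindSafe};
use std::sync::Mutex;

type Payload = Box<dyn Any + Send + 'static>;

// Global slot rather than thread-local storage, so a panic caught on a
// background callback thread is visible from the test thread.
static CALLBACK_PANIC: Mutex<Option<Payload>> = Mutex::new(None);

/// Runs a callback and stores any panic payload instead of letting it unwind
/// out of the callback boundary.
pub fn run_callback(f: impl FnOnce()) {
    if let Err(payload) = panic::catch_unwind(AssertUnwindSafe(f)) {
        // Tolerate a poisoned lock; the stored Option is still usable.
        let mut slot = CALLBACK_PANIC.lock().unwrap_or_else(|e| e.into_inner());
        // Assumed policy: keep the first captured panic and drop later ones.
        if slot.is_none() {
            *slot = Some(payload);
        }
    }
}

/// Removes and returns the captured payload, if any.
pub fn take_callback_panic() -> Option<Payload> {
    CALLBACK_PANIC
        .lock()
        .unwrap_or_else(|e| e.into_inner())
        .take()
}

/// Re-raises a captured panic on the calling thread, if one was stored.
pub fn resume_callback_panic() {
    if let Some(payload) = take_callback_panic() {
        panic::resume_unwind(payload);
    }
}
```

With a thread-local slot, `take_callback_panic` called from the test thread would always see `None` for panics raised on another thread, which matches the failure mode the commit message describes.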

…eader Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

inner-daemons and others added 11 commits May 16, 2026 14:12

It looks ok

a6990f4

Most tests pass??

6cc5e16

Apparently passes more tests

c6c9ba7

Logs

90833d7

Fixed cargo stuff

8c9c7e9

Fixed it up again

18f1c3a

Uggh finally

9e9fa8b

Claude claims it passes tests, I haven't verified if the harness was …

d373dbc

…broken

Removed fail_logs

cf7a1aa

Removed local patch

a6016c9

inner-daemons mentioned this pull request May 26, 2026

Complete parity with wgpu gfx-rs/wgpu-native#594

Draft

6 tasks

Add env var to disable custom backend

0e87ca6

inner-daemons mentioned this pull request May 26, 2026

Ability to override instance factory #9595

Open

inner-daemons and others added 16 commits May 26, 2026 22:34

Thing

36ddf3c

Fixed more stuff

d2d3ed8

FIxed test

c689cc3

Update Cargo.lock

e7c30c7

Make more examples use the c backend

a4a8cef

Fied a bunch of random issues

5c247d4

Fixed clippy things

52db16d

Fixed a windowing issue

4f29382

Have we done it?

d668911

Audit again

275f9bc

inner-daemons and others added 25 commits May 27, 2026 21:36

Update cargo.lock

a37ffdb

Fixed compile errors

80da4ca

Prettier TM readme

52bfd38

Change ctor dep

3e6e7e4

Reset cargo.lock

943c558

Update Cargo.lock: wgpu-native exposes wgpuDevicePollWithTimeout in h…

841688c

…eader Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Update wgpuDevicePoll call sites for new timeout_ns parameter

ab26c29

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Updated to latest wgpu-native

ce5cb31

Small build fix

0dfb91b

Dynamically link wgpu-core and wgpu

bea4061

Updated how the test works

f15e414

Moved wgpu-c-backend out

b6fb18d

Remove unnecessary table

683e98c

Reverted some stuff

c9c6232

Fixed doc issue

41dacc3

Fixed another warning

3be1a53

Merge branch 'trunk' into c-backend

6992775

Fix deny issue

84b3fc7

Fixed some things

89c6006

Merge branch 'trunk' into c-backend

26744d4

Cleaned some stuff up

67606ed

Some human cleanup

2ede015

Undo cargo.lock changes

f7106cf

inner-daemons changed the title ~~C backend~~ Improved backend functionality May 28, 2026

Updated to use atomics to avoid requirement of std

9cb0588

This was referenced May 28, 2026

Improve custom backend capabilities #9605

Open

Allow overriding Instance::new #9606

Open

inner-daemons wants to merge 54 commits into gfx-rs:trunk from inner-daemons:c-backend