Improved backend functionality #9570
Draft
inner-daemons wants to merge 54 commits into
Add [patch] for the inner-daemons/wgpu-native git URL so Cargo uses the local path dep instead. This makes [patch.crates-io] propagate to wgpu-native's transitive deps (naga, wgpu-core, etc.), eliminating the naga 29.0.3/29.0.0 version split that caused MSL generation differences in subgroup_operations tests. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- get_internal_counters: map WGPUHalCounters → InternalCounters
- create_blas/create_tlas: build C descriptors and call wgpuDeviceCreate*
- compact_blas: call wgpuQueueCompactBlas, return new handle + DispatchBlas
- mark_acceleration_structures_built: collect ptrs, call C function
- build_acceleration_structures: full BLAS + TLAS support
- Add Tlas::lowest_unmodified() and TlasInstance::blas_as_custom() cfg(custom)
accessors to wgpu to allow custom backends to read TLAS build data
- create_bind_group: add AccelerationStructure/BufferArray/SamplerArray/
TextureViewArray support via WGPUBindGroupEntryExtras chain
- Remove stale println! from instance_create
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
… bind group entries
- Wire WGPUQuerySetDescriptorExtras for PipelineStatistics query type
- Add pipeline_statistics_to_native conv fn mapping wgpu flags to C enum values
- Handle AccelerationStructure (TLAS) and array binding resources (BufferArray,
SamplerArray, TextureViewArray) in create_bind_group via WGPUBindGroupEntryExtras
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Staging buffers:
- Implement CQueueWriteBuffer (CPU Vec flushed via wgpuQueueWriteBuffer)
- create_staging_buffer now returns Some, enabling Queue::write_buffer_with
- validate_write_buffer returns Some(()); validation happens in wgpu-native
- write_staging_buffer flushes the staged data via wgpuQueueWriteBuffer
Passthrough shaders:
- Extend create_shader_module_passthrough to handle DXIL, HLSL, MetalLib, MSL
in addition to the existing WGSL and SPIR-V paths
- Only panic with unimplemented! when no format wgpu-native can handle is
present (i.e. GLSL-only descriptor)
Remaining known stubs with no wgpu-native API: downlevel_capabilities,
wgsl_language_features, generate_allocator_report, poll_all_devices return
value, AccelerationStructureArray bind group entries.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
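The staging-buffer flow in this commit (a CPU-side Vec that is filled by the caller and then flushed with a single queue write) can be sketched roughly as below. This is a minimal illustration, not the code from the PR: the `QueueSink` trait and `RecordingQueue` are hypothetical stand-ins for the real `wgpuQueueWriteBuffer` FFI call, and the shape of `CQueueWriteBuffer` is assumed.

```rust
/// CPU-side staging memory. The name comes from the commit message; the
/// fields and methods here are assumptions made for illustration.
pub struct CQueueWriteBuffer {
    data: Vec<u8>,
}

impl CQueueWriteBuffer {
    /// Allocates zeroed staging memory of the requested size.
    pub fn new(size: usize) -> Self {
        Self { data: vec![0; size] }
    }

    /// Gives the caller a mutable view to fill before the flush.
    pub fn as_mut_slice(&mut self) -> &mut [u8] {
        &mut self.data
    }
}

/// Hypothetical abstraction over the native queue write, standing in for
/// the wgpuQueueWriteBuffer call so the sketch needs no FFI.
pub trait QueueSink {
    fn write_buffer(&mut self, offset: u64, data: &[u8]);
}

/// Flushes the staged bytes with one queue write, then drops the staging Vec.
pub fn write_staging_buffer<Q: QueueSink>(queue: &mut Q, offset: u64, staging: CQueueWriteBuffer) {
    queue.write_buffer(offset, &staging.data);
}

/// Test double that records every write it receives.
#[derive(Default)]
pub struct RecordingQueue {
    pub writes: Vec<(u64, Vec<u8>)>,
}

impl QueueSink for RecordingQueue {
    fn write_buffer(&mut self, offset: u64, data: &[u8]) {
        self.writes.push((offset, data.to_vec()));
    }
}
```

Taking the staging buffer by value in `write_staging_buffer` means it cannot be reused after the flush, which matches the one-shot use the commit describes.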
…ureArray in c-backend
- lib.rs: wgsl_language_features now calls wgpuGetWgslLanguageFeatures() and
maps the returned bitmask to wgpu::WgslLanguageFeatures flags
- surface.rs: texture_discard now calls wgpuSurfaceDiscardTexture() instead of
no-op
- device.rs: AccelerationStructureArray binding resource is now fully handled
via the new tlases/tlasCount fields in WGPUBindGroupEntryExtras; all existing
ExtrasStorage instantiations updated for the new struct shape
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…cator_report
- lib.rs: poll_all_devices now calls wgpuInstancePollAllDevices(ptr, force_wait)
and returns the real bool result instead of always returning true
- adapter.rs: downlevel_capabilities now calls wgpuAdapterGetDownlevelCapabilities
and maps all 14 flag bits and ShaderModel (Sm2/Sm4/Sm5) back to wgpu types
- device.rs: generate_allocator_report now calls wgpuDeviceGetAllocatorReport,
converts the C allocations/blocks arrays into Rust vecs, then frees the C memory
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Ensures the c-backend factory is linked and registered in all crates that run wgpu code: examples/features, benches, bug-repro examples, and standalone examples. Skipped: custom_backend (is its own backend demo), player (uses wgpu-core directly), cts_runner (uses deno_webgpu). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
**Panic-in-Drop now logged instead of silently discarded**
wgpuDeviceRelease can panic via handle_error_fatal when the device is in an
error state. The catch_unwind was correct (re-panicking in Drop aborts), but
the error was swallowed completely. Now logs via log::error! with the panic
message so failures are visible. Requires adding log to Cargo.toml.
**GPU-initiated device loss limitation surfaced at runtime**
set_device_lost_callback now emits log::warn! explaining that only explicit
Device::destroy() fires the callback: wgpu-native does not wire
WGPUDeviceLostCallbackInfo to wgpu-core's spontaneous loss path, so driver
crashes and GPU timeouts are silently dropped. The existing code and comment in
adapter.rs were correct; this makes the limitation visible to callers at the
point they register the callback.
**Replace AtomicUsize + transmute with OnceLock<fn> in instance.rs**
INSTANCE_FACTORY was stored as a usize and recovered via core::mem::transmute,
relying on fn pointers fitting in usize (true in practice but not guaranteed by
the spec). Replaced with
std::sync::OnceLock<fn(InstanceDescriptor) -> Result<Instance, InstanceDescriptor>>,
which is type-safe, requires no unsafe, and naturally enforces
single-registration semantics. Drops the AtomicUsize and Ordering imports.
**Remove #[allow(dead_code)] from conv.rs and delete unused function**
The blanket allow masked one genuinely unused function, map_texture_dimension
(native→wgpu TextureDimension, never called). The other apparent dead-code
entries (map_feature, origin3d_to_native) are internal helpers called within
conv.rs itself and are not dead. Removing the allow lets the compiler enforce
this going forward.
**Document why finish_boxed cannot have a default impl**
finish_boxed exists to call finish through a Box<dyn Trait> vtable. A default
impl that delegates to finish would require Self: Sized to move out of the box,
which removes the method from the vtable and breaks object safety, a
contradictory requirement. The trait method now carries a doc comment
explaining this constraint and shows the required one-liner that every concrete
backend must write.
**Explain DynRenderBundleEncoder pointer-based Ord/Hash**
The Eq/Ord/Hash impls compare heap addresses rather than values. Added a block
comment before the impls explaining why this is sound (Box<T> guarantees a
stable allocation address for the encoder's lifetime) and why it is intentional
(these impls satisfy dispatch-enum bounds, not semantic ordering; encoders are
never sorted or deduplicated by value).
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
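The OnceLock-based factory registration described in this commit might look roughly like the sketch below. `InstanceDescriptor` and `Instance` here are hypothetical placeholder structs rather than the real wgpu types, and `register_instance_factory`, `create_instance`, and `c_backend_factory` are illustrative names that do not come from the PR. Only the static's name and the factory signature are taken from the commit message.

```rust
use std::sync::OnceLock;

// Placeholder stand-ins for the real wgpu types (assumptions for this sketch).
#[derive(Debug, PartialEq)]
pub struct InstanceDescriptor {
    pub label: &'static str,
}

#[derive(Debug, PartialEq)]
pub struct Instance {
    pub backend: &'static str,
}

/// A factory either builds an instance or hands the descriptor back so the
/// caller can fall through to the default path.
pub type InstanceFactory = fn(InstanceDescriptor) -> Result<Instance, InstanceDescriptor>;

// Typed storage for the fn pointer: no usize round-trip, no transmute, no unsafe.
static INSTANCE_FACTORY: OnceLock<InstanceFactory> = OnceLock::new();

/// Registers the factory. Returns false if one was already registered,
/// since OnceLock::set only succeeds on the first call.
pub fn register_instance_factory(factory: InstanceFactory) -> bool {
    INSTANCE_FACTORY.set(factory).is_ok()
}

/// Uses the registered factory if there is one, otherwise the default path.
pub fn create_instance(desc: InstanceDescriptor) -> Instance {
    match INSTANCE_FACTORY.get() {
        Some(factory) => match factory(desc) {
            Ok(instance) => instance,
            // The factory declined; fall back to the default backend.
            Err(_desc) => Instance { backend: "default" },
        },
        None => Instance { backend: "default" },
    }
}

/// Hypothetical factory that a custom backend crate would register.
pub fn c_backend_factory(_desc: InstanceDescriptor) -> Result<Instance, InstanceDescriptor> {
    Ok(Instance { backend: "c-backend" })
}
```

A second call to `register_instance_factory` returns false instead of silently overwriting the first registration, which is the single-registration behavior the commit relies on.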
**Subgroup operations (BACKEND failure)**
Callbacks from WGPUCallbackMode_AllowSpontaneous fire on background threads.
Panic payloads were stored in thread-local storage, so the test thread never
saw them. Switch CALLBACK_PANIC to a global Mutex so cross-thread panics are
captured. Add a bounded spin in map_async (up to 50 ms, polling the device)
so the background callback completes and the panic is visible before
resume_callback_panic() is called on the test thread.
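The move from thread-local to global panic storage can be illustrated with the sketch below. It assumes the slot holds the boxed payload returned by `catch_unwind`; `run_callback_guarded`, `take_callback_panic`, and `simulate_background_callback` are hypothetical helper names, and the background thread is joined here rather than waited on with the bounded spin the commit describes.

```rust
use std::any::Any;
use std::panic::{self, AssertUnwindSafe};
use std::sync::Mutex;
use std::thread;

type PanicPayload = Box<dyn Any + Send + 'static>;

// A process-wide slot: a payload stored by any thread is visible to all others.
static CALLBACK_PANIC: Mutex<Option<PanicPayload>> = Mutex::new(None);

/// Runs a callback and stores any panic payload in the global slot instead of
/// letting it unwind through the caller (for example across an FFI boundary).
pub fn run_callback_guarded<F: FnOnce()>(callback: F) {
    if let Err(payload) = panic::catch_unwind(AssertUnwindSafe(callback)) {
        let mut slot = CALLBACK_PANIC.lock().unwrap_or_else(|e| e.into_inner());
        // Keep the first captured panic if several callbacks fail.
        slot.get_or_insert(payload);
    }
}

/// Removes and returns the captured payload, if any.
pub fn take_callback_panic() -> Option<PanicPayload> {
    CALLBACK_PANIC.lock().unwrap_or_else(|e| e.into_inner()).take()
}

/// Re-raises a captured panic on the calling thread.
pub fn resume_callback_panic() {
    if let Some(payload) = take_callback_panic() {
        panic::resume_unwind(payload);
    }
}

/// Simulates a spontaneous callback firing on a background thread.
pub fn simulate_background_callback() {
    thread::spawn(|| run_callback_guarded(|| panic!("callback failed")))
        .join()
        .expect("the guarded thread itself should not panic");
}
```

With a thread-local slot the payload stored by the spawned thread would be invisible to the thread calling `resume_callback_panic`; the global Mutex removes that gap.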
**Passthrough layout validation (ALWAYS failure)**
wgpu-core panics when create_render_pipeline receives layout: None with a
passthrough shader module (it cannot reflect the layout). wgpu-native accepts
this, so the C backend had to replicate the validation explicitly. Track
is_passthrough on CShaderModule and panic early with a clear message.
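The early check described here could be sketched as follows. The `is_passthrough` flag on `CShaderModule` comes from the commit message; `PipelineLayout`, `check_layout`, and the simplified `create_render_pipeline` signature are hypothetical and only serve to show where the validation sits.

```rust
/// Simplified shader-module wrapper; only the flag named in the commit is modeled.
pub struct CShaderModule {
    pub is_passthrough: bool,
}

/// Placeholder for an explicit pipeline layout (assumption for this sketch).
pub struct PipelineLayout;

/// Rejects the combination the C library would otherwise accept: no explicit
/// layout together with a passthrough module, which cannot be reflected.
pub fn check_layout(
    layout: Option<&PipelineLayout>,
    stages: &[&CShaderModule],
) -> Result<(), &'static str> {
    if layout.is_none() && stages.iter().any(|module| module.is_passthrough) {
        return Err("passthrough shader modules require an explicit pipeline layout");
    }
    Ok(())
}

/// Panics early with a clear message, before any native call would be made.
pub fn create_render_pipeline(layout: Option<&PipelineLayout>, stages: &[&CShaderModule]) {
    if let Err(message) = check_layout(layout, stages) {
        panic!("{}", message);
    }
    // A real backend would build the C descriptor and call into wgpu-native here.
}
```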
**Timestamps encoder hang (ALWAYS timeout)**
CDevice::poll() ignored PollType::Wait { timeout: Some(_) } and called
wgpuDevicePoll(wait=true) which maps to wait_indefinitely() in wgpu-native,
hanging the process. The new wgpuDevicePollWithTimeout function (added in the
wgpu-native fork) threads a nanosecond timeout through to wgpu-core's
device_poll which already supports bounded waits. Update Cargo.lock to pull
in the new commit.
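The fix amounts to distinguishing a bounded wait from an unbounded one before calling into the native library. A rough sketch of that decision follows, using a simplified, hypothetical `PollType` (the real wgpu type has more to it) and a hypothetical `NativePoll` enum in place of the actual FFI calls; the nanosecond conversion saturates rather than overflowing.

```rust
use std::convert::TryFrom;
use std::time::Duration;

/// Simplified stand-in for wgpu's poll type (assumption for this sketch).
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum PollType {
    Poll,
    Wait { timeout: Option<Duration> },
}

/// Which native entry point the backend would pick (hypothetical enum).
#[derive(Debug, PartialEq)]
pub enum NativePoll {
    /// Non-blocking poll.
    NonBlocking,
    /// Unbounded wait; only correct when no timeout was requested.
    WaitIndefinitely,
    /// Bounded wait with a timeout in nanoseconds.
    WaitNanos(u64),
}

/// Maps the requested poll type to a native call, keeping the timeout intact.
pub fn plan_poll(poll_type: PollType) -> NativePoll {
    match poll_type {
        PollType::Poll => NativePoll::NonBlocking,
        PollType::Wait { timeout: None } => NativePoll::WaitIndefinitely,
        PollType::Wait { timeout: Some(timeout) } => {
            // as_nanos returns u128; clamp to u64::MAX instead of truncating.
            let nanos = u64::try_from(timeout.as_nanos()).unwrap_or(u64::MAX);
            NativePoll::WaitNanos(nanos)
        }
    }
}
```

The bug described above corresponds to sending the `Some(timeout)` arm down the `WaitIndefinitely` path.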
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…eader Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
This was referenced May 28, 2026
Connections
Related to gfx-rs/wgpu-native#594
Description
This improves the custom backend functionality so that a proper custom backend can be fully implemented.
The controversial change here is the introduction of global state to override instance creation.
Testing
Covered by existing tests, plus testing in gfx-rs/wgpu-native#594
Squash or Rebase?
Squash
Checklist
- wgpu may be affected behaviorally.
- CHANGELOG.md entries for the user-facing effects of this change are present.