
Improved backend functionality #9570

Draft
inner-daemons wants to merge 54 commits into gfx-rs:trunk from inner-daemons:c-backend

Collaborator

inner-daemons commented May 20, 2026

Connections
Related to gfx-rs/wgpu-native#594

Description
This extends wgpu's custom backend support so that a complete custom backend can be implemented on top of it.

The controversial change here is the introduction of global state to override instance creation.

Testing
Existing tests, plus testing in gfx-rs/wgpu-native#594

Squash or Rebase?
Squash

Checklist

  • I self-reviewed and fully understand this PR.
  • WebGPU implementations built with wgpu may be affected behaviorally.
  • Validation and feature gates are in place to confine behavioral changes.
  • Tests demonstrate the validation and altered logic works.
  • CHANGELOG.md entries for the user-facing effects of this change are present.
  • The PR is minimal, and doesn't make sense to land as multiple PRs.
  • Commits are logically scoped and individually reviewable.
  • The PR description has enough context to understand the motivation and solution implemented.

inner-daemons and others added 11 commits May 16, 2026 14:12
Add [patch] for the inner-daemons/wgpu-native git URL so Cargo uses
the local path dep instead. This makes [patch.crates-io] propagate
to wgpu-native's transitive deps (naga, wgpu-core, etc.), eliminating
the naga 29.0.3/29.0.0 version split that caused MSL generation
differences in subgroup_operations tests.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
inner-daemons and others added 16 commits May 26, 2026 22:34
- get_internal_counters: map WGPUHalCounters → InternalCounters
- create_blas/create_tlas: build C descriptors and call wgpuDeviceCreate*
- compact_blas: call wgpuQueueCompactBlas, return new handle + DispatchBlas
- mark_acceleration_structures_built: collect ptrs, call C function
- build_acceleration_structures: full BLAS + TLAS support
  - Add Tlas::lowest_unmodified() and TlasInstance::blas_as_custom() cfg(custom)
    accessors to wgpu to allow custom backends to read TLAS build data
- create_bind_group: add AccelerationStructure/BufferArray/SamplerArray/
  TextureViewArray support via WGPUBindGroupEntryExtras chain
- Remove stale println! from instance_create

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
… bind group entries

- Wire WGPUQuerySetDescriptorExtras for PipelineStatistics query type
- Add pipeline_statistics_to_native conv fn mapping wgpu flags to C enum values
- Handle AccelerationStructure (TLAS) and array binding resources (BufferArray,
  SamplerArray, TextureViewArray) in create_bind_group via WGPUBindGroupEntryExtras

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Staging buffers:
- Implement CQueueWriteBuffer (CPU Vec flushed via wgpuQueueWriteBuffer)
- create_staging_buffer now returns Some, enabling Queue::write_buffer_with
- validate_write_buffer returns Some(()) — validation happens in wgpu-native
- write_staging_buffer flushes the staged data via wgpuQueueWriteBuffer
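
The staging path described above can be sketched as follows. This is a minimal illustration, not the PR's code: `StagingBuffer` and its methods are hypothetical names, and the `queue_write` closure stands in for the real `wgpuQueueWriteBuffer` call, whose arguments are not shown in this PR text.

```rust
/// Hypothetical CPU-side staging buffer: bytes accumulate in a Vec
/// and are handed to the queue in one call when flushed.
pub struct StagingBuffer {
    data: Vec<u8>,
}

impl StagingBuffer {
    /// Allocates a zero-filled staging area of `size` bytes.
    pub fn new(size: usize) -> Self {
        Self { data: vec![0u8; size] }
    }

    /// Gives the caller a mutable view to fill in before flushing.
    pub fn as_mut_slice(&mut self) -> &mut [u8] {
        &mut self.data
    }

    /// Consumes the buffer and passes the staged bytes to `queue_write`,
    /// an assumed stand-in for the FFI write call.
    pub fn flush(self, offset: u64, queue_write: impl FnOnce(u64, &[u8])) {
        queue_write(offset, &self.data);
    }
}
```

Consuming `self` in `flush` means a staging buffer cannot be written twice by accident, which is one plausible reason to keep the data in an owned `Vec`.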

Passthrough shaders:
- Extend create_shader_module_passthrough to handle DXIL, HLSL, MetalLib, MSL
  in addition to the existing WGSL and SPIR-V paths
- Only panic with unimplemented! when no format wgpu-native can handle is present
  (i.e. GLSL-only descriptor)

Remaining known stubs with no wgpu-native API: downlevel_capabilities,
wgsl_language_features, generate_allocator_report, poll_all_devices return value,
AccelerationStructureArray bind group entries.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ureArray in c-backend

- lib.rs: wgsl_language_features now calls wgpuGetWgslLanguageFeatures() and maps
  the returned bitmask to wgpu::WgslLanguageFeatures flags
- surface.rs: texture_discard now calls wgpuSurfaceDiscardTexture() instead of no-op
- device.rs: AccelerationStructureArray binding resource is now fully handled via
  the new tlases/tlasCount fields in WGPUBindGroupEntryExtras; all existing
  ExtrasStorage instantiations updated for the new struct shape

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…cator_report

- lib.rs: poll_all_devices now calls wgpuInstancePollAllDevices(ptr, force_wait)
  and returns the real bool result instead of always returning true
- adapter.rs: downlevel_capabilities now calls wgpuAdapterGetDownlevelCapabilities
  and maps all 14 flag bits and ShaderModel (Sm2/Sm4/Sm5) back to wgpu types
- device.rs: generate_allocator_report now calls wgpuDeviceGetAllocatorReport,
  converts the C allocations/blocks arrays into Rust vecs, then frees the C memory

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Ensures the c-backend factory is linked and registered in all crates
that run wgpu code: examples/features, benches, bug-repro examples,
and standalone examples. Skipped: custom_backend (is its own backend
demo), player (uses wgpu-core directly), cts_runner (uses deno_webgpu).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
inner-daemons and others added 25 commits May 27, 2026 21:36
Panic-in-Drop now logged instead of silently discarded
wgpuDeviceRelease can panic via handle_error_fatal when the device is in
an error state. The existing catch_unwind was correct, since a panic that
escapes Drop while the thread is already unwinding aborts the process, but
the error was swallowed completely. It is now logged via log::error! with
the panic message so failures are visible. This requires adding log to
Cargo.toml.
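
A rough sketch of the catch-and-log pattern described in this commit, under stated assumptions: `DeviceHandle` and `release_device` are hypothetical stand-ins for the backend's device wrapper and the `wgpuDeviceRelease` call, and `eprintln!` replaces `log::error!` so the example has no dependencies.

```rust
use std::any::Any;
use std::panic::{catch_unwind, AssertUnwindSafe};

/// Hypothetical stand-in for the FFI release call, which may panic.
fn release_device(in_error_state: bool) {
    if in_error_state {
        panic!("device is in an error state");
    }
}

/// Extracts a readable message from a panic payload.
fn panic_message(payload: &(dyn Any + Send)) -> String {
    if let Some(s) = payload.downcast_ref::<&str>() {
        s.to_string()
    } else if let Some(s) = payload.downcast_ref::<String>() {
        s.clone()
    } else {
        "non-string panic payload".to_string()
    }
}

/// Hypothetical device wrapper whose release may panic.
pub struct DeviceHandle {
    pub in_error_state: bool,
}

impl Drop for DeviceHandle {
    fn drop(&mut self) {
        let in_error_state = self.in_error_state;
        // Contain the panic so it never leaves Drop, then report it.
        if let Err(payload) = catch_unwind(AssertUnwindSafe(|| release_device(in_error_state))) {
            // The commit uses log::error!; eprintln! is a substitute here.
            eprintln!("device release panicked in Drop: {}", panic_message(&*payload));
        }
    }
}
```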

GPU-initiated device loss limitation surfaced at runtime
set_device_lost_callback now emits log::warn! explaining that only
explicit Device::destroy() fires the callback — wgpu-native does not wire
WGPUDeviceLostCallbackInfo to wgpu-core's spontaneous loss path, so driver
crashes and GPU timeouts are silently dropped. The existing code and comment
in adapter.rs were correct; this makes the limitation visible to callers at
the point they register the callback.

Replace AtomicUsize + transmute with OnceLock<fn> in instance.rs
INSTANCE_FACTORY was stored as a usize and recovered via
core::mem::transmute, relying on fn pointers fitting in usize (true in
practice but not guaranteed by the spec). Replaced with
std::sync::OnceLock<fn(InstanceDescriptor) -> Result<Instance,
InstanceDescriptor>>, which is type-safe, requires no unsafe, and naturally
enforces single-registration semantics. Drops the AtomicUsize and Ordering
imports.
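
For illustration, the OnceLock-based registration might look roughly like the sketch below. The `InstanceDescriptor` and `Instance` types here are simplified stand-ins for wgpu's, the function names `register_instance_factory` and `create_instance` are hypothetical, and reading `Err(desc)` as "the factory declined, fall back to the default" is an assumption about the signature rather than something the commit states.

```rust
use std::sync::OnceLock;

// Simplified stand-ins for wgpu's types (assumptions for this sketch).
#[derive(Debug, Clone, PartialEq)]
pub struct InstanceDescriptor {
    pub label: String,
}

#[derive(Debug, PartialEq)]
pub struct Instance {
    pub backend: &'static str,
}

/// The factory signature quoted in the commit message.
pub type InstanceFactory = fn(InstanceDescriptor) -> Result<Instance, InstanceDescriptor>;

/// Typed storage for the fn pointer: no usize round-trip, no transmute.
static INSTANCE_FACTORY: OnceLock<InstanceFactory> = OnceLock::new();

/// Registers the factory. Returns false if one was already registered,
/// which is how OnceLock enforces single registration.
pub fn register_instance_factory(factory: InstanceFactory) -> bool {
    INSTANCE_FACTORY.set(factory).is_ok()
}

/// Uses the registered factory if present; otherwise, or if the factory
/// hands the descriptor back, falls back to a default instance.
pub fn create_instance(desc: InstanceDescriptor) -> Instance {
    match INSTANCE_FACTORY.get() {
        Some(factory) => factory(desc).unwrap_or(Instance { backend: "default" }),
        None => Instance { backend: "default" },
    }
}

/// Example factory a custom backend might register (hypothetical).
pub fn custom_factory(_desc: InstanceDescriptor) -> Result<Instance, InstanceDescriptor> {
    Ok(Instance { backend: "custom" })
}
```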

Remove #[allow(dead_code)] from conv.rs and delete unused function
The blanket allow masked one genuinely unused function,
map_texture_dimension (native→wgpu TextureDimension, never called). The
other apparent dead-code entries (map_feature, origin3d_to_native) are
internal helpers called within conv.rs itself and are not dead. Removing
the allow lets the compiler enforce this going forward.

Document why finish_boxed cannot have a default impl
finish_boxed exists to call finish through a Box<dyn Trait> vtable.
A default impl that delegates to finish would require Self: Sized to move
out of the box, which removes the method from the vtable and breaks object
safety — a contradictory requirement. The trait method now carries a doc
comment explaining this constraint and shows the required one-liner that
every concrete backend must write.
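
The constraint can be seen in a reduced form. The trait and type names below (`Encoder`, `MyEncoder`) and the `String` return type are hypothetical; only the shape of `finish` and `finish_boxed` follows the commit message.

```rust
pub trait Encoder {
    /// Consumes the encoder by value. `Self: Sized` keeps this method
    /// out of the vtable, so it cannot be called on `dyn Encoder`.
    fn finish(self) -> String
    where
        Self: Sized;

    /// Callable through `Box<dyn Encoder>`. It has no default body:
    /// a default would need `(*self).finish()`, which requires
    /// `Self: Sized`, and adding that bound here would remove this
    /// method from the vtable too.
    fn finish_boxed(self: Box<Self>) -> String;
}

pub struct MyEncoder {
    pub label: String,
}

impl Encoder for MyEncoder {
    fn finish(self) -> String {
        format!("finished {}", self.label)
    }

    // The one-liner each concrete type writes: here Self is known to be
    // Sized, so moving out of the box is allowed.
    fn finish_boxed(self: Box<Self>) -> String {
        (*self).finish()
    }
}
```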

Explain DynRenderBundleEncoder pointer-based Ord/Hash
The Eq/Ord/Hash impls compare heap addresses rather than values. Added a
block comment before the impls explaining why this is sound (Box<T>
guarantees a stable allocation address for the encoder's lifetime) and why
it is intentional (these impls satisfy dispatch-enum bounds, not semantic
ordering — encoders are never sorted or deduplicated by value).
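
A minimal sketch of address-based comparison for a boxed trait object, assuming hypothetical names (`BundleEncoder`, `DynEncoder`, `Recorder`) rather than the PR's actual types. It relies on the boxed value being non-zero-sized, since zero-sized boxes do not get distinct allocations.

```rust
use std::cmp::Ordering;
use std::hash::{Hash, Hasher};

/// Hypothetical encoder trait; the real trait has more methods.
pub trait BundleEncoder {
    fn draw_count(&self) -> u32;
}

/// Hypothetical non-zero-sized encoder, so each Box has its own address.
pub struct Recorder {
    pub draws: u32,
}

impl BundleEncoder for Recorder {
    fn draw_count(&self) -> u32 {
        self.draws
    }
}

pub struct DynEncoder {
    inner: Box<dyn BundleEncoder>,
}

impl DynEncoder {
    pub fn new(inner: Box<dyn BundleEncoder>) -> Self {
        Self { inner }
    }

    /// Heap address of the boxed encoder, with the vtable part discarded.
    /// It stays stable for as long as the Box is alive.
    fn addr(&self) -> *const () {
        &*self.inner as *const dyn BundleEncoder as *const ()
    }
}

// Identity comparison: two encoders are equal only if they are the same
// allocation, regardless of their contents.
impl PartialEq for DynEncoder {
    fn eq(&self, other: &Self) -> bool {
        self.addr() == other.addr()
    }
}
impl Eq for DynEncoder {}

impl PartialOrd for DynEncoder {
    fn partial_cmp(&self, other: &Self) -> Option<Ordering> {
        Some(self.cmp(other))
    }
}
impl Ord for DynEncoder {
    fn cmp(&self, other: &Self) -> Ordering {
        self.addr().cmp(&other.addr())
    }
}

impl Hash for DynEncoder {
    fn hash<H: Hasher>(&self, state: &mut H) {
        self.addr().hash(state);
    }
}
```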

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
**Subgroup operations (BACKEND failure)**
Callbacks from WGPUCallbackMode_AllowSpontaneous fire on background threads.
Panic payloads were stored in thread-local storage, so the test thread never
saw them.  Switch CALLBACK_PANIC to a global Mutex so cross-thread panics are
captured.  Add a bounded spin in map_async (up to 50 ms, polling the device)
so the background callback completes and the panic is visible before
resume_callback_panic() is called on the test thread.
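
The switch from thread-local storage to a global Mutex could look roughly like this. `CALLBACK_PANIC` and `resume_callback_panic` are named in the commit; `run_callback` and `take_callback_panic` are hypothetical helpers added for the sketch, and the bounded spin in map_async is not shown.

```rust
use std::any::Any;
use std::panic::{catch_unwind, resume_unwind, AssertUnwindSafe};
use std::sync::Mutex;

type PanicPayload = Box<dyn Any + Send + 'static>;

/// Process-wide slot, visible from any thread, unlike a thread_local!.
static CALLBACK_PANIC: Mutex<Option<PanicPayload>> = Mutex::new(None);

/// Runs a callback (possibly on a background thread) and stores the
/// first panic payload instead of letting it unwind further.
pub fn run_callback<F: FnOnce()>(f: F) {
    if let Err(payload) = catch_unwind(AssertUnwindSafe(f)) {
        let mut slot = CALLBACK_PANIC.lock().unwrap_or_else(|e| e.into_inner());
        if slot.is_none() {
            *slot = Some(payload);
        }
    }
}

/// Removes and returns the stored payload, if any.
pub fn take_callback_panic() -> Option<PanicPayload> {
    CALLBACK_PANIC
        .lock()
        .unwrap_or_else(|e| e.into_inner())
        .take()
}

/// Re-raises a captured callback panic on the calling thread.
pub fn resume_callback_panic() {
    if let Some(payload) = take_callback_panic() {
        resume_unwind(payload);
    }
}
```

With this shape, a test thread that calls `resume_callback_panic()` after the background callback has finished sees the panic as if it had happened locally.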

**Passthrough layout validation (ALWAYS failure)**
wgpu-core panics when create_render_pipeline receives layout: None with a
passthrough shader module (it cannot reflect the layout).  wgpu-native accepts
this, so the C backend had to replicate the validation explicitly.  Track
is_passthrough on CShaderModule and panic early with a clear message.

**Timestamps encoder hang (ALWAYS timeout)**
CDevice::poll() ignored PollType::Wait { timeout: Some(_) } and called
wgpuDevicePoll(wait=true) which maps to wait_indefinitely() in wgpu-native,
hanging the process.  The new wgpuDevicePollWithTimeout function (added in the
wgpu-native fork) threads a nanosecond timeout through to wgpu-core's
device_poll which already supports bounded waits.  Update Cargo.lock to pull
in the new commit.
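
The poll-mode decision behind this fix can be sketched as below. `PollType` is a simplified stand-in for wgpu's type, `NativePoll` and `select_native_poll` are hypothetical, and the sketch only picks which kind of native call to make; the actual signature of `wgpuDevicePollWithTimeout` is not shown in this PR text and is not assumed here.

```rust
use std::time::Duration;

/// Simplified stand-in for wgpu's PollType (assumption for this sketch).
pub enum PollType {
    Poll,
    Wait { timeout: Option<Duration> },
}

/// Hypothetical description of which native call the backend should make.
#[derive(Debug, PartialEq)]
pub enum NativePoll {
    /// Non-blocking poll.
    NoWait,
    /// Unbounded wait: the path that previously hung with a timeout set.
    WaitIndefinitely,
    /// Bounded wait, timeout in nanoseconds.
    WaitWithTimeoutNs(u64),
}

pub fn select_native_poll(poll: &PollType) -> NativePoll {
    match poll {
        PollType::Poll => NativePoll::NoWait,
        PollType::Wait { timeout: None } => NativePoll::WaitIndefinitely,
        // Honor the timeout instead of ignoring it; saturate on overflow.
        PollType::Wait { timeout: Some(t) } => {
            NativePoll::WaitWithTimeoutNs(u64::try_from(t.as_nanos()).unwrap_or(u64::MAX))
        }
    }
}
```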

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…eader

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
inner-daemons changed the title from "C backend" to "Improved backend functionality" on May 28, 2026