Commit 51444cf

Converted class-level documentation back to Doxygen
Signed-off-by: Scott Thornton <wsttiger@gmail.com>
1 parent 112e9da commit 51444cf

6 files changed: +198 -253 lines changed

docs/sphinx/api/qec/realtime_pipeline_api.rst

Lines changed: 25 additions & 206 deletions
@@ -3,10 +3,9 @@
 Realtime Pipeline API
 =====================

-The realtime pipeline API provides a framework for building low-latency QEC
-decoding pipelines that combine GPU inference (e.g. TensorRT) with CPU
-post-processing (e.g. PyMatching MWPM). All types live in the
-``cudaq::qec::realtime::experimental`` namespace and are declared in
+The realtime pipeline API provides the reusable host-side runtime for
+low-latency QEC pipelines that combine GPU inference with optional CPU
+post-processing. The published reference is generated from
 ``cudaq/qec/realtime/pipeline.h``.

 .. note::
@@ -17,237 +16,57 @@ post-processing (e.g. PyMatching MWPM). All types live in the
 Configuration
 -------------

-.. class:: core_pinning
+.. doxygenstruct:: cudaq::qec::realtime::experimental::core_pinning
+:members:

-CPU core affinity settings for pipeline threads.
-
-:param dispatcher: Core for the host dispatcher thread (-1 to disable pinning).
-:param consumer: Core for the consumer (completion) thread (-1 to disable pinning).
-:param worker_base: Base core for worker threads. Workers pin to
-base, base+1, etc. (-1 to disable pinning).
-
-
-.. class:: pipeline_stage_config
-
-Configuration for a single pipeline stage.
-
-:param num_workers: Number of GPU worker threads (max 64). Default: 8.
-:param num_slots: Number of ring buffer slots. Default: 32.
-:param slot_size: Size of each ring buffer slot in bytes. Default: 16384.
-:param cores: CPU core affinity settings (``core_pinning``).
-:param external_ringbuffer: When non-null, the pipeline uses this
-caller-owned ring buffer (``cudaq_ringbuffer_t*``) instead of
-allocating its own. The caller is responsible for lifetime.
-``ring_buffer_injector`` is unavailable in this mode.
+.. doxygenstruct:: cudaq::qec::realtime::experimental::pipeline_stage_config
+:members:


 GPU Stage
 ---------

-.. class:: gpu_worker_resources
-
-Per-worker GPU resources returned by the ``gpu_stage_factory``.
-
-Each worker owns a captured CUDA graph, a dedicated stream, and optional
-pre/post launch callbacks for DMA staging or result extraction.
-
-:param graph_exec: Instantiated CUDA graph (``cudaGraphExec_t``).
-:param stream: Dedicated CUDA stream (``cudaStream_t``).
-:param pre_launch_fn: Optional callback invoked before graph launch.
-:param pre_launch_data: Opaque user data for ``pre_launch_fn``.
-:param post_launch_fn: Optional callback invoked after graph launch.
-:param post_launch_data: Opaque user data for ``post_launch_fn``.
-:param function_id: RPC function ID that this worker handles.
-:param user_context: Opaque user context passed to the CPU stage callback.
-
+.. doxygenstruct:: cudaq::qec::realtime::experimental::gpu_worker_resources
+:members:

-.. type:: gpu_stage_factory
-
-``std::function<gpu_worker_resources(int worker_id)>``
-
-Factory called once per worker during ``start()``. Returns the GPU
-resources for the given worker index.
+.. doxygentypedef:: cudaq::qec::realtime::experimental::gpu_stage_factory


 CPU Stage
 ---------

-.. class:: cpu_stage_context
-
-Context passed to the CPU stage callback for each completed GPU workload.
-
-:param worker_id: Index of the worker thread.
-:param origin_slot: Ring buffer slot that originated this request.
-:param gpu_output: Pointer to GPU inference output (nullptr in poll mode).
-:param gpu_output_size: Size of GPU output in bytes.
-:param response_buffer: Destination buffer for the RPC response.
-:param max_response_size: Maximum bytes writable to ``response_buffer``.
-:param user_context: Opaque context from ``gpu_worker_resources``.
+.. doxygenstruct:: cudaq::qec::realtime::experimental::cpu_stage_context
+:members:

+.. doxygentypedef:: cudaq::qec::realtime::experimental::cpu_stage_callback

-.. type:: cpu_stage_callback
-
-``std::function<size_t(const cpu_stage_context &ctx)>``
-
-Returns the number of bytes written into ``response_buffer``. Special
-return values:
-
-- **0**: No GPU result ready yet; the pipeline will poll again.
-- **DEFERRED_COMPLETION** (``SIZE_MAX``): Release the worker immediately
-but defer slot completion. The caller must call
-``realtime_pipeline::complete_deferred(slot)`` once the deferred work
-finishes.
+.. doxygenvariable:: cudaq::qec::realtime::experimental::DEFERRED_COMPLETION


 Completion
 ----------

-.. class:: completion
-
-Metadata for a completed (or errored) pipeline request.
+.. doxygenstruct:: cudaq::qec::realtime::experimental::completion
+:members:

-:param request_id: Original request ID from the RPC header.
-:param slot: Ring buffer slot that held this request.
-:param success: True if the request completed without CUDA errors.
-:param cuda_error: CUDA error code (0 on success).
-
-
-.. type:: completion_callback
-
-``std::function<void(const completion &c)>``
-
-Invoked by the consumer thread for each completed or errored request.
+.. doxygentypedef:: cudaq::qec::realtime::experimental::completion_callback


 Ring Buffer Injector
 --------------------

-.. class:: ring_buffer_injector
-
-Writes RPC-framed requests into the pipeline's ring buffer, simulating
-FPGA DMA deposits. Created via ``realtime_pipeline::create_injector()``.
-The parent ``realtime_pipeline`` must outlive the injector.
-
-Not available when the pipeline is configured with an external ring buffer
-(``pipeline_stage_config::external_ringbuffer != nullptr``).
-
-.. method:: bool try_submit(uint32_t function_id, const void *payload, size_t payload_size, uint64_t request_id)
-
-Try to submit a request without blocking.
-
-:param function_id: RPC function identifier.
-:param payload: Pointer to payload data.
-:param payload_size: Payload size in bytes.
-:param request_id: Caller-assigned request identifier.
-:return: True if accepted, false if all slots are busy.
-
-.. method:: void submit(uint32_t function_id, const void *payload, size_t payload_size, uint64_t request_id)
-
-Submit a request, spinning until a slot becomes available.
-
-:param function_id: RPC function identifier.
-:param payload: Pointer to payload data.
-:param payload_size: Payload size in bytes.
-:param request_id: Caller-assigned request identifier.
-
-.. method:: uint64_t backpressure_stalls() const
-
-:return: Cumulative number of times ``submit()`` had to spin-wait.
+.. doxygenclass:: cudaq::qec::realtime::experimental::ring_buffer_injector
+:members:


 Pipeline
 --------

-.. class:: realtime_pipeline
-
-Orchestrates GPU inference and CPU post-processing for low-latency
-realtime QEC decoding.
-
-The pipeline manages a ring buffer, a host dispatcher thread, per-worker
-GPU streams with captured CUDA graphs, optional CPU worker threads, and a
-consumer thread for completion signaling. It supports both an internal
-ring buffer (for software testing via ``ring_buffer_injector``) and an
-external ring buffer (for FPGA RDMA).
-
-**Lifecycle:**
-
-1. Construct with ``pipeline_stage_config``
-2. Register callbacks: ``set_gpu_stage()``, ``set_cpu_stage()`` (optional),
-``set_completion_handler()`` (optional)
-3. Call ``start()`` to spawn threads
-4. Submit requests via ``ring_buffer_injector`` or external FPGA DMA
-5. Call ``stop()`` to shut down
-
-.. method:: realtime_pipeline(const pipeline_stage_config &config)
-
-Construct a pipeline and allocate ring buffer resources.
-
-:param config: Stage configuration.
-
-.. method:: void set_gpu_stage(gpu_stage_factory factory)
-
-Register the GPU stage factory. Must be called before ``start()``.
-
-:param factory: Callback returning ``gpu_worker_resources`` per worker.
-
-.. method:: void set_cpu_stage(cpu_stage_callback callback)
-
-Register the CPU worker callback. Must be called before ``start()``.
-If not set, the pipeline operates in GPU-only mode with completion
-signaled via ``cudaLaunchHostFunc``.
-
-:param callback: CPU stage processing function.
-
-.. method:: void set_completion_handler(completion_callback handler)
-
-Register the completion callback. Must be called before ``start()``.
-
-:param handler: Function called for each completed request.
-
-.. method:: void start()
-
-Allocate resources, build dispatcher config, and spawn all threads.
-
-.. method:: void stop()
-
-Signal shutdown, join all threads, and free resources.
-
-.. method:: ring_buffer_injector create_injector()
-
-Create a software injector for testing without FPGA hardware.
-
-:return: A ``ring_buffer_injector`` bound to this pipeline.
-:raises std::logic_error: If the pipeline uses an external ring buffer.
-
-.. method:: Stats stats() const
-
-Thread-safe, lock-free statistics snapshot.
-
-:return: Current ``Stats`` struct.
-
-.. method:: void complete_deferred(int slot)
-
-Signal that deferred processing for a slot is complete. Call from any
-thread after the CPU stage callback returned ``DEFERRED_COMPLETION``.
-
-:param slot: Ring buffer slot index to complete.
-
-.. method:: ring_buffer_bases ringbuffer_bases() const
-
-:return: Host and device base addresses of the RX data ring.
-
-.. class:: Stats
-
-Pipeline throughput and backpressure statistics.
-
-:param submitted: Total requests submitted to the ring buffer.
-:param completed: Total requests that completed (success or error).
-:param dispatched: Total packets dispatched by the host dispatcher.
-:param backpressure_stalls: Cumulative producer backpressure stalls.
-
-.. class:: ring_buffer_bases
+.. doxygenclass:: cudaq::qec::realtime::experimental::realtime_pipeline
+:members:

-Host and device base addresses of the RX data ring.
+.. doxygenstruct:: cudaq::qec::realtime::experimental::realtime_pipeline::Stats
+:members:

-:param rx_data_host: Host-mapped base pointer.
-:param rx_data_dev: Device-mapped base pointer.
+.. doxygenstruct:: cudaq::qec::realtime::experimental::realtime_pipeline::ring_buffer_bases
+:members:
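
Note on the ``cpu_stage_callback`` return convention: the hand-written text removed in this file described three cases for the callback's return value (0 means poll again, ``DEFERRED_COMPLETION`` / ``SIZE_MAX`` means completion is deferred, anything else is a byte count). The sketch below restates that convention as a standalone classifier. It does not include the real header; ``kDeferredCompletion``, ``stage_outcome``, and ``classify`` are hypothetical names introduced only for illustration, and the sentinel value is assumed from the removed documentation rather than checked against ``pipeline.h``.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>

// Hypothetical stand-in for the library's DEFERRED_COMPLETION sentinel.
// The removed docs describe it as SIZE_MAX; this is an assumption here.
constexpr std::size_t kDeferredCompletion =
    std::numeric_limits<std::size_t>::max();

// Hypothetical labels for the three cases described in the removed docs.
enum class stage_outcome { poll_again, deferred, responded };

// Classify the value returned by a CPU stage callback.
constexpr stage_outcome classify(std::size_t bytes_written) {
  if (bytes_written == 0)
    return stage_outcome::poll_again; // no GPU result ready yet
  if (bytes_written == kDeferredCompletion)
    return stage_outcome::deferred; // a later complete_deferred(slot) is expected
  return stage_outcome::responded; // bytes written into response_buffer
}
```

With the real API, a callback that returns the deferred sentinel is expected to be followed by a ``complete_deferred(slot)`` call once the deferred work finishes, per the removed text.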

docs/sphinx/examples_rst/qec/realtime_predecoder_pymatching.rst

Lines changed: 1 addition & 1 deletion
@@ -209,7 +209,7 @@ Changing the Predecoder Model
 The ONNX model file for each configuration is set in the ``PipelineConfig``
 factory methods in
 ``libs/qec/unittests/realtime/predecoder_pipeline_common.h``. To use a
-different model, edit the ``onnx_filename`` field and rebuild:
+different model, edit the ``onnx_filename`` field and rebuild.

 .. code-block:: cpp

libs/qec/include/cudaq/qec/realtime/pipeline.h

Lines changed: 21 additions & 4 deletions
@@ -87,7 +87,9 @@ struct gpu_worker_resources {
 };

 /// @brief Factory called once per worker during start().
-/// @return GPU resources for the given worker.
+/// @param worker_id Zero-based worker index assigned by the pipeline.
+/// @return GPU resources for the given worker. Any handles, callbacks, and
+/// user data returned here must remain valid until the pipeline stops.
 using gpu_stage_factory = std::function<gpu_worker_resources(int worker_id)>;

 // ---------------------------------------------------------------------------
@@ -117,8 +119,8 @@ struct cpu_stage_context {
 };

 /// @brief CPU stage callback type.
-///
-/// @return Number of bytes written into response_buffer.
+/// @param ctx Poll-mode view of the current worker state and response buffer.
+/// @return Number of bytes written into @p ctx.response_buffer.
 /// Return 0 if no GPU result is ready yet (poll again).
 /// Return DEFERRED_COMPLETION to release the worker immediately while
 /// deferring slot completion to a later complete_deferred() call.
@@ -147,6 +149,7 @@ struct completion {
 };

 /// @brief Callback invoked by the consumer thread for each completed request.
+/// @param c Metadata for the completed or errored request.
 using completion_callback = std::function<void(const completion &c)>;

 // ---------------------------------------------------------------------------
@@ -161,8 +164,11 @@ using completion_callback = std::function<void(const completion &c)>;
 /// pipeline is configured with an external ring buffer.
 class ring_buffer_injector {
 public:
+/// @brief Destroy the injector state.
 ~ring_buffer_injector();
+/// @brief Move-construct an injector.
 ring_buffer_injector(ring_buffer_injector &&) noexcept;
+/// @brief Move-assign an injector.
 ring_buffer_injector &operator=(ring_buffer_injector &&) noexcept;

 ring_buffer_injector(const ring_buffer_injector &) = delete;
@@ -212,7 +218,11 @@ class realtime_pipeline {
 public:
 /// @brief Construct a pipeline and allocate ring buffer resources.
 /// @param config Stage configuration (slots, slot size, workers, etc.).
+/// @note Construction allocates the backing ring buffer or binds the
+/// caller-provided external ring so @ref ringbuffer_bases can be queried
+/// before @ref start.
 explicit realtime_pipeline(const pipeline_stage_config &config);
+/// @brief Stop the pipeline if needed and release owned resources.
 ~realtime_pipeline();

 realtime_pipeline(const realtime_pipeline &) = delete;
@@ -233,10 +243,15 @@ class realtime_pipeline {
 /// completed or errored request.
 void set_completion_handler(completion_callback handler);

-/// @brief Allocate resources, build dispatcher config, spawn all threads.
+/// @brief Allocate resources, build dispatcher config, and spawn all threads.
+/// @throws std::logic_error If the GPU stage factory was not registered.
+/// @throws std::logic_error If GPU-only mode is requested with an external
+/// ring buffer.
 void start();

 /// @brief Signal shutdown, join all threads, free resources.
+/// @note Safe to call multiple times. Subsequent calls are no-ops once the
+/// pipeline has fully stopped.
 void stop();

 /// @brief Create a software injector for testing without FPGA hardware.
@@ -278,6 +293,8 @@ class realtime_pipeline {

 /// @brief Return the host and device base addresses of the RX data ring.
 /// @return Struct containing both base pointers.
+/// @note In external-ring mode these pointers are the caller-provided ring
+/// addresses. In internal mode they refer to the owned mapped ring buffer.
 ring_buffer_bases ringbuffer_bases() const;

 private:
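
Note on the new start()/stop() contract: the Doxygen comments added in this file state that start() throws std::logic_error when no GPU stage factory was registered, and that stop() is safe to call repeatedly. The mock below is a minimal sketch of that contract only. `mock_pipeline`, its `int(int)` factory signature, and the `stop_count()` counter are hypothetical and exist purely so the behaviour can be observed; the real `realtime_pipeline` also spawns threads and owns CUDA resources, none of which is modelled here.

```cpp
#include <cassert>
#include <functional>
#include <stdexcept>
#include <utility>

// Hypothetical mock; NOT the real realtime_pipeline class.
class mock_pipeline {
public:
  // Register a factory (stand-in for gpu_stage_factory; signature assumed).
  void set_gpu_stage(std::function<int(int)> factory) {
    factory_ = std::move(factory);
  }

  // Mirrors the documented precondition: a factory must be registered.
  void start() {
    if (!factory_)
      throw std::logic_error("GPU stage factory was not registered");
    running_ = true;
  }

  // Mirrors the documented idempotence: repeated calls are no-ops.
  void stop() {
    if (!running_)
      return;
    running_ = false;
    ++stop_count_; // counts only the calls that actually shut something down
  }

  bool running() const { return running_; }
  int stop_count() const { return stop_count_; }

private:
  std::function<int(int)> factory_;
  bool running_ = false;
  int stop_count_ = 0;
};
```

A caller following this contract registers the factory first, then calls start(), and may call stop() from both an explicit shutdown path and a cleanup path without tracking which ran first.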
