
Opschema metadata#6280

Open
mzient wants to merge 25 commits into NVIDIA:main from mzient:opschema-metadata

Conversation

@mzient
Contributor

@mzient mzient commented Apr 3, 2026

Co-authored-by: Rostan Tabet rtabet@nvidia.com

Category:

Bug fix (non-breaking change which fixes an issue)
New feature (non-breaking change which adds functionality)

Description:

This change adds static metadata inference (ndim, layout, dtype) to OpSchema. Most operators can infer it from OpSpec.
OpSpec now carries the statically inferred metadata.
Actual inputs and outputs, as seen in the workspace, are now automatically validated against OpSpec in OperatorBase.

There is a default policy for handling metadata. It is opt-in, but it can be enabled for all schemas declared with DALI_SCHEMA by defining DALI_SCHEMA_DEFAULT_METADATA_POLICY as nonzero. This is done in the DALI project itself, so all internal operators implement the default policy and must either opt out or override it if they do not conform.
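To illustrate the opt-in / opt-out / override idea, here is a minimal sketch. This is not DALI's actual OpSchema API; the class and method names (`Schema`, `no_metadata_inference`, `output_ndim`) are hypothetical stand-ins, and the build flag is modeled as a class attribute:

```python
from typing import Callable, Optional

class Schema:
    """Toy model of an opt-in default metadata policy (not DALI's real API)."""

    # Stand-in for the DALI_SCHEMA_DEFAULT_METADATA_POLICY build flag.
    DEFAULT_METADATA_POLICY = True

    def __init__(self, name: str):
        self.name = name
        # Under the default policy, outputs inherit the input's ndim unchanged.
        self.infer_ndim: Optional[Callable[[Optional[int]], Optional[int]]] = (
            (lambda in_ndim: in_ndim) if Schema.DEFAULT_METADATA_POLICY else None
        )

    def no_metadata_inference(self):
        """Opt out: this operator does not conform to the default policy."""
        self.infer_ndim = None
        return self

    def output_ndim(self, fn):
        """Override the default with operator-specific inference."""
        self.infer_ndim = fn
        return self

# An expand_dims-like schema adds one dimension to the input:
expand = Schema("ExpandDims").output_ndim(
    lambda in_ndim: None if in_ndim is None else in_ndim + 1
)
print(expand.infer_ndim(3))  # 4
```

The key design point mirrored here is that unknown metadata stays `None` rather than raising, so inference can run at graph-build time even when some producers cannot provide metadata.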

Additional information:

Affected modules and functionalities:

Key points relevant for the review:

Tests:

  • Existing tests apply
  • New tests added
    • Python tests
    • GTests
    • Benchmark
    • Other
  • N/A

Checklist

Documentation

  • Existing documentation applies
  • Documentation updated
    • Docstring
    • Doxygen
    • RST
    • Jupyter
    • Other
  • N/A

DALI team only

Requirements

  • Implements new requirements
  • Affects existing requirements
  • N/A

REQ IDs: N/A

JIRA TASK: N/A

@mzient mzient force-pushed the opschema-metadata branch from b0330dc to 156fd04 Compare April 3, 2026 18:08
@greptile-apps
Contributor

greptile-apps bot commented Apr 3, 2026

Greptile Summary

This PR adds static metadata inference (ndim, dtype, layout) to OpSchema and OpSpec, propagates it through the op graph during Pipeline::Build(), and automatically validates actual inputs/outputs against the inferred descriptors in OperatorBase::Setup and Run. The scope is large (91 files) but the architectural approach — lazy-cached per-output callbacks with schema inheritance, DFS propagation in node_meta.cc, and a __debug escape-hatch for eager operators — is sound.

  • P1 (expand_dims.cc): The new OutputLayout schema inference lambda does not guard against negative axis values. ComputeDataNodeMetadata runs before the operator constructor, so DALI_ENFORCE(0 <= axis) never fires first; a negative axis loops past all input dims and hits assert(src_axis < ndim) or UB. one_hot.cc in the same PR handles this correctly with a return nullopt guard.
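The guard pattern Greptile credits to one_hot.cc can be sketched as follows. This is an illustrative Python model, not the actual C++ lambda; the function name and the `"?"` placeholder dimension name are assumptions:

```python
def output_layout(input_layout, input_ndim, axes):
    """Sketch of the one_hot-style guard: normalize negative axes and
    return None ("unknown") instead of asserting on bad input."""
    if input_layout is None or input_ndim is None:
        return None  # metadata unknown at build time; defer to runtime checks
    out_ndim = input_ndim + len(axes)
    norm = []
    for a in axes:
        if a < 0:
            a += out_ndim  # normalize a negative axis against the output rank
        if not 0 <= a < out_ndim:
            return None    # out of range: let the operator's own DALI_ENFORCE fire later
        norm.append(a)
    layout = list(input_layout)
    for a in sorted(norm):
        layout.insert(a, "?")  # placeholder name for the newly inserted axis
    return "".join(layout)

print(output_layout("HWC", 3, [-1]))  # "HWC?"
```

Returning `None` on out-of-range axes is what keeps build-time inference from crashing before the operator constructor gets a chance to report a proper error.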

Confidence Score: 4/5

The PR is mostly safe but has one P1 defect in expand_dims schema inference that should be fixed before merging.

All prior review concerns (prev_c_idx, dead output_dtype_fn, label typos, join axis sign) are tracked in existing threads. One new P1 issue was found: negative axes in the expand_dims OutputLayout lambda trigger an assertion / undefined behaviour during Pipeline::Build() before the operator's own validation can fire. The remaining finding (dead variable input_idx) is P2. Score 4 reflects the single outstanding P1.

dali/operators/generic/expand_dims.cc (OutputLayout lambda, negative axis handling)

Important Files Changed

Filename Overview
dali/pipeline/operator/op_schema.cc Core metadata inference logic (CalculateOutputDType/NDim/Layout, GetCorrespondingExpandedOutputLayout); contains dead variable input_idx and the prev_c_idx-after-insert ordering bug noted in prior threads
dali/pipeline/operator/op_schema.h Introduces OutputDTypeFunc/OutputNDimFunc/OutputLayoutFunc aliases, lazy-cached flattened function vectors, and the UseDefaultMetadataPolicy / AutoExpandDims builder API
dali/pipeline/operator/op_spec.cc AddInput/AddArgumentInput extended with optional metadata; InferOutputMetadata() delegates to schema Calculate* methods
dali/pipeline/operator/op_spec.h InOutDesc gains ndim/dtype/layout fields; output_name_idx_ map key is InOutDesc but keys are always inserted with nullopt metadata so heterogeneous (name,device) lookup remains correct
dali/pipeline/operator/operator.cc ValidateInputMetadata/ValidateOutputMetadata added; skips empty batches, honours __debug flag for eager / debug-mode operators
dali/pipeline/operator/operator.h Setup and Run gain an optional validate_metadata parameter (default true); SequenceOperator overrides pass it down correctly, skipping validation for the inner expanded workspace
dali/pipeline/graph/node_meta.cc New file: DFS propagation of producer output metadata into consumer input descriptors, then InferOutputMetadata per node
dali/pipeline/pipeline.cc ComputeDataNodeMetadata inserted before executor build — correct placement in the build sequence
dali/pipeline/operator/eager_operator.h Adds __debug=true to bypass validation; intentional since eager ops skip the full graph-build / metadata-inference pass
dali/operators/generic/expand_dims.cc OutputLayout lambda does not normalise negative axes; negative values cause assert/UB in schema inference before the operator's own DALI_ENFORCE fires
dali/operators/generic/one_hot.cc OutputNDim/OutputLayout lambdas correctly normalise negative axis (axis += ndim+1) and guard unknown ndim/layout with nullopt
dali/python/nvidia/dali/data_node.py DataNode gains ndim/dtype/layout sourced from OpSpec.OutputDesc at construction; fields are None until InferOutputMetadata runs, which is fine for the graph-mode path
dali/python/nvidia/dali/experimental/dynamic/_invocation.py ndim/dtype/layout lazily inferred via _init_spec before falling back to full deferred evaluation

Sequence Diagram

sequenceDiagram
    participant Py as Python (pipeline build)
    participant PL as Pipeline::Build()
    participant NM as node_meta::ComputeDataNodeMetadata
    participant OS as OpSchema::Calculate*
    participant EX as Executor::Build (operator ctor)
    participant OP as OperatorBase::Setup/Run

    Py->>PL: Build(output_descs)
    PL->>NM: ComputeDataNodeMetadata(graph)
    loop DFS over OpNodes
        NM->>NM: propagate producer OutputDesc → consumer InputDesc
        NM->>OS: InferOutputMetadata() → CalculateOutputDType/NDim/Layout
        OS-->>NM: optional<dtype/ndim/layout> stored in OpSpec::outputs_
    end
    NM-->>PL: metadata populated
    PL->>EX: executor_->Build(graph)
    EX->>EX: instantiate operators (DALI_ENFORCE axis validations run here)
    EX-->>PL: built
    PL-->>Py: pipeline ready

    Py->>OP: Setup(output_desc, ws)
    OP->>OP: ValidateInputMetadata(ws, spec)
    OP->>OP: SetupImpl()
    Py->>OP: Run(ws)
    OP->>OP: RunImpl()
    OP->>OP: ValidateOutputMetadata(ws, spec)

Reviews (4): Last reviewed commit: "Make default metadata policy opt-in. Mak..."

Contributor

@jantonguirao jantonguirao left a comment


LGTM, except for the bugs raised by greptile and some minor comments

Comment on lines +35 to +36
if (window_size == 0)
return {};
Contributor Author


Without it, a debug build fired an assertion. This is unrelated to this change, but it prevented running all the tests in a debug build.

@dali-automaton
Collaborator

CI MESSAGE: [47951113]: BUILD STARTED

@dali-automaton
Collaborator

CI MESSAGE: [47951113]: BUILD FAILED

@dali-automaton
Collaborator

CI MESSAGE: [47993982]: BUILD STARTED


def __init__(self, name, device="cpu", source=None):
def __init__(
self, name, device="cpu", source=None, index=0, ndim=None, dtype=None, layout=None
Collaborator


Nitpick: missing type hints

Contributor Author


We pass some wild stuff there at times, so I wouldn't risk putting type hints here.

@dali-automaton
Collaborator

CI MESSAGE: [47993982]: BUILD FAILED

@dali-automaton
Collaborator

CI MESSAGE: [48010759]: BUILD STARTED

@dali-automaton
Collaborator

CI MESSAGE: [48023970]: BUILD STARTED

@mzient mzient force-pushed the opschema-metadata branch 2 times, most recently from 5f38835 to 20b9194 Compare April 8, 2026 14:33
@dali-automaton
Collaborator

CI MESSAGE: [48024121]: BUILD STARTED

@dali-automaton
Collaborator

CI MESSAGE: [48010759]: BUILD FAILED

@dali-automaton
Collaborator

CI MESSAGE: [48024121]: BUILD FAILED

@dali-automaton
Collaborator

CI MESSAGE: [48125071]: BUILD STARTED

@mzient mzient force-pushed the opschema-metadata branch from b4b3b73 to 63b8346 Compare April 9, 2026 17:24
@dali-automaton
Collaborator

CI MESSAGE: [48133713]: BUILD STARTED

@mzient
Contributor Author

mzient commented Apr 9, 2026

@greptileai Please look at expand_dims.cc:71 - out-of-range axes are handled here.

@greptile-apps
Contributor

greptile-apps bot commented Apr 9, 2026

You're right - I see the check at line 85 in GenerateSrcDims():

DALI_ENFORCE(d < ndim,
  make_string("Data has not enough dimensions to add new axes at specified indices."));

This correctly validates that axes aren't out of range. When an axis index is too large, d will exceed ndim, triggering this enforcement. Thanks for the clarification!


Contributor

@jantonguirao jantonguirao left a comment


Only minor comments left

dtype = self._operator._op_spec.OutputDesc(result_index)[3]
if dtype is not None:
return dtype
self.run(self._eval_context)
Contributor


[Minor] DRY — the ndim, dtype, and layout methods now share the same pattern:

if self._results is None:
    if init_spec := getattr(self._operator, "_init_spec", None):
        init_spec(self._inputs, self._args)
        val = self._operator._op_spec.OutputDesc(result_index)[N]
        if val is not None:
            return val
    self.run(self._eval_context)

Consider extracting a helper like _get_static_meta(self, result_index, field_index) to avoid repeating the init_spec / OutputDesc / fallback logic three times.

@dali-automaton
Collaborator

CI MESSAGE: [48411019]: BUILD FAILED

rostan-t and others added 25 commits April 14, 2026 11:06
Signed-off-by: Rostan Tabet <rtabet@nvidia.com>
Signed-off-by: Rostan Tabet <rtabet@nvidia.com>
Signed-off-by: Rostan Tabet <rtabet@nvidia.com>
Signed-off-by: Michał Zientkiewicz <mzient@gmail.com>
Signed-off-by: Michał Zientkiewicz <mzient@gmail.com>
Signed-off-by: Michał Zientkiewicz <mzient@gmail.com>
Signed-off-by: Michał Zientkiewicz <mzient@gmail.com>
Signed-off-by: Michał Zientkiewicz <mzient@gmail.com>
Signed-off-by: Michał Zientkiewicz <mzient@gmail.com>
Signed-off-by: Michał Zientkiewicz <mzient@gmail.com>
Signed-off-by: Michał Zientkiewicz <mzient@gmail.com>
…ALI.

Signed-off-by: Michał Zientkiewicz <mzient@gmail.com>
Signed-off-by: Michał Zientkiewicz <mzient@gmail.com>
Signed-off-by: Michał Zientkiewicz <mzient@gmail.com>
Signed-off-by: Michał Zientkiewicz <mzient@gmail.com>
Signed-off-by: Michał Zientkiewicz <mzient@gmail.com>
…y input layout.

Signed-off-by: Michał Zientkiewicz <mzient@gmail.com>
Signed-off-by: Michał Zientkiewicz <mzient@gmail.com>
Signed-off-by: Michał Zientkiewicz <mzient@gmail.com>
Signed-off-by: Michał Zientkiewicz <mzient@gmail.com>
Signed-off-by: Michal Zientkiewicz <michalz@nvidia.com>
Signed-off-by: Michal Zientkiewicz <michalz@nvidia.com>
Signed-off-by: Michal Zientkiewicz <michalz@nvidia.com>
Signed-off-by: Michal Zientkiewicz <michalz@nvidia.com>
Signed-off-by: Michal Zientkiewicz <michalz@nvidia.com>
@mzient mzient force-pushed the opschema-metadata branch from 3a571e7 to d876708 Compare April 14, 2026 09:06
@dali-automaton
Collaborator

CI MESSAGE: [48484956]: BUILD STARTED

layout = self._operator._op_spec.OutputDesc(result_index)[4]
if layout is not None:
layout = str(layout)
return None if layout == "" else layout
Contributor


[Minor] The layout method's fast path returns None for empty layout (return None if layout == "" else layout), but the fallback path returns self._results[result_index].layout() which returns an empty string for no-layout tensors. This means the two code paths return different values for the same semantic state.

Also, the type hint says -> str but the fast path can return None.
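One way to make the two paths agree is to normalize both through a single helper. This is a sketch of the reviewer's point, not code from the PR; treating `None` as the canonical "no layout" value is an assumption here (the fix could equally canonicalize on `""`):

```python
from typing import Optional

def normalize_layout(layout) -> Optional[str]:
    """Map both 'no layout' spellings (None and the empty string returned by
    TensorList.layout()) to a single sentinel, so the static fast path and
    the evaluated fallback path return the same value."""
    if layout is None:
        return None
    layout = str(layout)
    return layout if layout else None

print(normalize_layout(""))     # None
print(normalize_layout("HWC"))  # "HWC"
```

With this applied to both return points, the `-> Optional[str]` type hint also becomes accurate for every path.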

auto input_layout = input_desc.layout.value_or("");

if (input_layout.empty()) {
// If the layout was empty, we need the number of dimesnions, as "" is legal for any ndim.
Copy link
Copy Markdown
Contributor


[Nit] Typo: dimesnionsdimensions

