Video Encode/Decode #226

Open
Daedie-git wants to merge 79 commits into NVIDIA-RTX:main from Daedie-git:video-queues

@Daedie-git

The open encode/decode enticed me to try to get VK hardware decoding working in my software (until now, I only had it on the D3D12 side). I expect this will need further changes; just let me know.
I wasn't sure what the right way to handle the D3D12 CommandBuffer was, so I settled on the variant for now.

Summary

Adds hardware video queue support and a minimal native-backed video encode/decode extension API.

This branch makes video-capable queues visible through NRI, allows D3D12/Vulkan video command buffers to be created and submitted, adds basic multi-planar video format support, and exposes NRIVideo entry points for issuing native encode/decode commands.

Changes

Public API

  • Adds QueueType::VIDEO_DECODE
  • Adds QueueType::VIDEO_ENCODE
  • Adds multi-planar 4:2:0 formats:
    • G8_B8R8_2PLANE_420_UNORM
    • G10X6_B10X6R10X6_2PLANE_420_UNORM_3PACK16
    • G16_B16R16_2PLANE_420_UNORM
  • Adds explicit image plane bits for multi-planar images:
    • PLANE_0
    • PLANE_1
    • PLANE_2
  • Adds Include/Extensions/NRIVideo.h
    • VideoInterface
    • CmdDecodeVideo
    • CmdEncodeVideo
    • native D3D12/Vulkan encode/decode descriptor structs

Adapter and Queue Discovery

  • Reports D3D12 video decode/encode queue availability when ID3D12VideoDevice is available
  • Adds Vulkan queue family selection support for video queues
  • Uses VkQueueFamilyVideoPropertiesKHR to ensure Vulkan video queues also expose compatible codec operations
  • Reuses queue family create infos when multiple NRI queue types map to the same Vulkan queue family
  • Tracks queue storage for the new video queue types in D3D12 and Vulkan devices
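
As a rough illustration of the Vulkan selection rule above (a family must advertise the video queue bit and also a compatible codec operation), here is a minimal sketch. `QueueFamilyInfo`, `FindVideoQueueFamily` and the constants are hypothetical stand-ins so the snippet compiles without the Vulkan headers; real code would read `VkQueueFamilyProperties2` chained with `VkQueueFamilyVideoPropertiesKHR`, and the actual selection logic in this branch may differ.

```cpp
#include <cstdint>
#include <optional>
#include <vector>

// Illustrative bit values only; real code would use the constants from vulkan_core.h.
constexpr uint32_t QUEUE_VIDEO_DECODE_BIT = 0x20;
constexpr uint32_t QUEUE_VIDEO_ENCODE_BIT = 0x40;
constexpr uint32_t CODEC_DECODE_H264 = 0x1;
constexpr uint32_t CODEC_DECODE_H265 = 0x2;

// Hypothetical, flattened view of one queue family:
// "queueFlags" stands in for VkQueueFamilyProperties::queueFlags,
// "codecOperations" for VkQueueFamilyVideoPropertiesKHR::videoCodecOperations.
struct QueueFamilyInfo {
    uint32_t queueFlags;
    uint32_t codecOperations;
};

// Returns the index of the first family that exposes the requested queue bit
// AND at least one of the wanted codec operations, or nullopt if none does.
std::optional<uint32_t> FindVideoQueueFamily(const std::vector<QueueFamilyInfo>& families, uint32_t requiredQueueBit, uint32_t wantedCodecOps) {
    for (uint32_t i = 0; i < families.size(); i++) {
        const QueueFamilyInfo& family = families[i];
        const bool hasQueueBit = (family.queueFlags & requiredQueueBit) != 0;
        const bool hasCodec = (family.codecOperations & wantedCodecOps) != 0;
        if (hasQueueBit && hasCodec)
            return i;
    }
    return std::nullopt;
}
```

A family that sets the decode queue bit but reports no codec operations is skipped, which is the point of the extra `VkQueueFamilyVideoPropertiesKHR` check.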

D3D12 Backend

  • Maps NRI video queues to:
    • D3D12_COMMAND_LIST_TYPE_VIDEO_DECODE
    • D3D12_COMMAND_LIST_TYPE_VIDEO_ENCODE
  • Refactors CommandBufferD3D12 to hold graphics, video decode, or video encode command lists
  • Supports begin/end/reset/close for D3D12 video command lists
  • Submits video command lists through the existing queue submission path
  • Adds CmdDecodeVideo via ID3D12VideoDecodeCommandList::DecodeFrame
  • Adds CmdEncodeVideo via ID3D12VideoEncodeCommandList2::EncodeFrame
  • Updates native command buffer documentation to return ID3D12CommandList*, since video command lists are not graphics command lists
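
To make the "graphics, video decode, or video encode" refactor easier to picture, here is a minimal sketch that assumes "the variant" means `std::variant`. The three list structs are hypothetical stand-ins for the D3D12 command list interfaces, and `CommandBufferSketch` is not the actual `CommandBufferD3D12` from this branch.

```cpp
#include <cstdint>
#include <variant>

// Hypothetical stand-ins for ID3D12GraphicsCommandList, ID3D12VideoDecodeCommandList
// and ID3D12VideoEncodeCommandList2 (the real interfaces live in d3d12.h / d3d12video.h).
struct GraphicsList { bool closed = false; };
struct VideoDecodeList { bool closed = false; };
struct VideoEncodeList { bool closed = false; };

// Order must match the order of alternatives in the variant below.
enum class ListKind : uint8_t { GRAPHICS, VIDEO_DECODE, VIDEO_ENCODE };

class CommandBufferSketch {
public:
    explicit CommandBufferSketch(ListKind kind) {
        switch (kind) {
            case ListKind::GRAPHICS: m_List = GraphicsList{}; break;
            case ListKind::VIDEO_DECODE: m_List = VideoDecodeList{}; break;
            case ListKind::VIDEO_ENCODE: m_List = VideoEncodeList{}; break;
        }
    }

    // The active alternative's index doubles as the list kind.
    ListKind GetKind() const { return static_cast<ListKind>(m_List.index()); }

    // "End" is the same operation for every list type, so one visitor covers all three.
    void End() { std::visit([](auto& list) { list.closed = true; }, m_List); }

    bool IsClosed() const { return std::visit([](const auto& list) { return list.closed; }, m_List); }

    // Type-specific commands first check that the right list is held; nullptr otherwise.
    VideoDecodeList* GetVideoDecodeList() { return std::get_if<VideoDecodeList>(&m_List); }

private:
    std::variant<GraphicsList, VideoDecodeList, VideoEncodeList> m_List;
};
```

The appeal of this shape is that the shared begin/end/reset/close path stays generic, while a decode-only call such as `CmdDecodeVideo` can bail out early when `GetVideoDecodeList()` returns `nullptr`.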

Vulkan Backend

  • Adds video-related desired device extensions, including:
    • VK_KHR_video_queue
    • VK_KHR_video_decode_queue
    • H.264/H.265/AV1 decode extensions
    • VK_KHR_video_encode_queue
    • video maintenance extensions
    • sampler YCbCr conversion
  • Adds Vulkan format mappings for the new multi-planar formats
  • Adds Vulkan plane aspect mapping for explicit plane views
  • Allows texture views to select explicit multi-planar aspects
  • Resolves vkCmdDecodeVideoKHR and vkCmdEncodeVideoKHR
  • Adds CmdDecodeVideo / CmdEncodeVideo forwarding to Vulkan command buffers
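
The plane aspect mapping can be pictured with the sketch below. The NRI-side bit values in `PlaneBitsSketch` are assumptions (the values used in this branch may differ), and the `ASPECT_PLANE_*` constants duplicate the values of `VK_IMAGE_ASPECT_PLANE_0_BIT`, `VK_IMAGE_ASPECT_PLANE_1_BIT` and `VK_IMAGE_ASPECT_PLANE_2_BIT` so the snippet builds without the Vulkan headers.

```cpp
#include <cstdint>

// Hypothetical NRI-side plane bits; only the names come from this PR.
enum PlaneBitsSketch : uint32_t {
    PLANE_0 = 1u << 0,
    PLANE_1 = 1u << 1,
    PLANE_2 = 1u << 2,
};

// Local copies of the Vulkan aspect bits for multi-planar images.
constexpr uint32_t ASPECT_PLANE_0 = 0x10;
constexpr uint32_t ASPECT_PLANE_1 = 0x20;
constexpr uint32_t ASPECT_PLANE_2 = 0x40;

// Translates a mask of explicit planes into a Vulkan-style aspect mask.
constexpr uint32_t GetAspectMask(uint32_t planes) {
    uint32_t mask = 0;
    if (planes & PLANE_0)
        mask |= ASPECT_PLANE_0;
    if (planes & PLANE_1)
        mask |= ASPECT_PLANE_1;
    if (planes & PLANE_2)
        mask |= ASPECT_PLANE_2;
    return mask;
}
```

For an NV12-style image, a luma view would pass `PLANE_0` and a chroma view `PLANE_1`.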

Validation / Interface Wiring

  • Wires VideoInterface through nriGetInterface
  • Adds DeviceBase::FillFunctionTable(VideoInterface&)
  • Adds backend FillFunctionTable(VideoInterface&) implementations
  • Adds validation-layer storage and forwarding for the video interface
  • Reports unsupported when the backend/device does not expose video decode or encode queues
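
A minimal sketch of the "reports unsupported" behavior follows. It assumes the table is rejected only when neither a decode nor an encode queue exists. All types here (`ResultSketch`, `VideoInterfaceSketch`, `DeviceSketch`) and the free-function form of `FillFunctionTable` are hypothetical simplifications, not the signatures added in this branch.

```cpp
#include <cstdint>

enum class ResultSketch : uint8_t { SUCCESS, UNSUPPORTED };

// Hypothetical, trimmed-down function table with opaque arguments.
struct VideoInterfaceSketch {
    void (*CmdDecodeVideo)(void* commandBuffer, const void* desc) = nullptr;
    void (*CmdEncodeVideo)(void* commandBuffer, const void* desc) = nullptr;
};

// Hypothetical device state: which video queues were discovered.
struct DeviceSketch {
    bool hasVideoDecodeQueue = false;
    bool hasVideoEncodeQueue = false;
};

// Placeholder implementations; a real backend would forward to native commands.
static void CmdDecodeVideoStub(void*, const void*) {}
static void CmdEncodeVideoStub(void*, const void*) {}

// Clears the table, then fills it only if the device exposes a video queue.
ResultSketch FillFunctionTable(const DeviceSketch& device, VideoInterfaceSketch& table) {
    table = {};
    if (!device.hasVideoDecodeQueue && !device.hasVideoEncodeQueue)
        return ResultSketch::UNSUPPORTED;

    table.CmdDecodeVideo = CmdDecodeVideoStub;
    table.CmdEncodeVideo = CmdEncodeVideoStub;
    return ResultSketch::SUCCESS;
}
```

Leaving the table zeroed on failure means a caller that ignores the result crashes on a null function pointer immediately, rather than silently recording nothing.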

API Shape

The new video API is intentionally small and native-backed.

NRI handles queue discovery, command buffer ownership, function table exposure, and command dispatch. Codec/session setup remains backend-native (initially):

  • D3D12 callers pass native decoder/encoder objects and D3D12 argument structs
  • Vulkan callers pass native VkVideoDecodeInfoKHR / VkVideoEncodeInfoKHR payloads

Validation

Built successfully with:

  • Vulkan/validation-focused configuration
  • D3D12-focused configuration
  • small encode -> decode unit test in my own repo

@dzhdanNV
Collaborator

Wow! Magnificent! Thanks a ton for bringing this work to life! I will try to review later today.

@dzhdanNV
Collaborator

Question 1

Should VideoDecodeD3D12Desc, VideoEncodeD3D12Desc, VideoDecodeVKDesc and VideoEncodeVKDesc be moved into corresponding wrappers and the underlying objects created via NRI? (not a pro in video APIs yet)

Question 2

Should we add more formats? We should invent better names :) DXGI has short names, but they are unclear (seems to be a good start):

    G8_B8R8_2PLANE_420_UNORM,
    G10X6_B10X6R10X6_2PLANE_420_UNORM_3PACK16,
    G16_B16R16_2PLANE_420_UNORM,

Question 3

No D3D11 support? Probably not needed, but, IMO, nice to have :)

(...to be continued)

@Daedie-git
Author

Q1
I'll get back to you on this.

Q2:
How about this

NV12_UNORM
P010_UNORM
P016_UNORM
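
For reference, here is how the proposed short names could line up with the existing long names. This is a sketch: `VideoFormatSketch`, `FormatNames` and `GetFormatNames` are hypothetical helpers, and the Vulkan/DXGI pairings are the usual NV12/P010/P016 correspondences rather than anything taken from this branch.

```cpp
#include <cstdint>
#include <string_view>

// Hypothetical enum using the short names proposed above.
enum class VideoFormatSketch : uint8_t { NV12_UNORM, P010_UNORM, P016_UNORM };

struct FormatNames {
    std::string_view vk;      // matching Vulkan format name
    std::string_view dxgi;    // matching DXGI format name
    uint32_t significantBits; // meaningful bits per sample (P010 stores 10 bits in a 16-bit word)
};

constexpr FormatNames GetFormatNames(VideoFormatSketch format) {
    switch (format) {
        case VideoFormatSketch::NV12_UNORM:
            return {"VK_FORMAT_G8_B8R8_2PLANE_420_UNORM", "DXGI_FORMAT_NV12", 8};
        case VideoFormatSketch::P010_UNORM:
            return {"VK_FORMAT_G10X6_B10X6R10X6_2PLANE_420_UNORM_3PACK16", "DXGI_FORMAT_P010", 10};
        case VideoFormatSketch::P016_UNORM:
            return {"VK_FORMAT_G16_B16R16_2PLANE_420_UNORM", "DXGI_FORMAT_P016", 16};
    }
    return {};
}
```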

Q3
D3D11 video semantics work very differently. So regardless of whether it is added or not, I would do that in a separate PR.

@dzhdanNV
Collaborator

Q2

+1, clarifying comments may be added if needed. They are pretty standard.

Q3

No objections here (assuming the interface is implementable in D3D11)

@dzhdanNV
Collaborator

Do we need other formats, like DXGI_FORMAT_Y410 (10:10:10)?

@dzhdanNV
Collaborator

Please, add this link somewhere in the beginning of NRIVideo.h: https://learn.microsoft.com/en-us/windows/win32/medfound/recommended-8-bit-yuv-formats-for-video-rendering

@dzhdanNV
Collaborator

dzhdanNV commented Apr 28, 2026

I'm ready to merge it after you are done with polishing. Just let me know. Further improvements can be made in main. Can we create a minimalistic sample in https://github.com/NVIDIA-RTX/NRISamples for this?

I haven't deeply analyzed the implementation, but the API additions are nice and clear.

Also I think that these formats:

    // Depth-stencil (as a shader resource view)
    R24_UNORM_X8,       // .x - depth   // + . . . . . . . . . . . . . . .
    X24_G8_UINT,        // .y - stencil // + . . . . . . . . . . . . . . .
    R32_SFLOAT_X8_X24,  // .x - depth   // + . . . . . . . . . . . . . . .
    X32_G8_UINT_X24     // .y - stencil // + . . . . . . . . . . . . . . .

can be removed and readonlyPlanes can be used instead explicitly. readonlyPlanes are needed for RTV and actually for SRV (you already use it for video). It's a bit of a breaking change, but, IMO, a step towards better API. What do you think? Just asking for your opinion, I will implement it myself.

@Daedie-git
Author

> I'm ready to merge it after you are done with polishing. Just let me know. Further improvements can be made in main. Can we create a minimalistic sample in https://github.com/NVIDIA-RTX/NRISamples for this?

Definitely. I'll look into it as soon as I can.

> I haven't deeply analyzed the implementation, but the API additions are nice and clear.
>
> Also I think that these formats:
>
>     // Depth-stencil (as a shader resource view)
>     R24_UNORM_X8,       // .x - depth   // + . . . . . . . . . . . . . . .
>     X24_G8_UINT,        // .y - stencil // + . . . . . . . . . . . . . . .
>     R32_SFLOAT_X8_X24,  // .x - depth   // + . . . . . . . . . . . . . . .
>     X32_G8_UINT_X24     // .y - stencil // + . . . . . . . . . . . . . . .
>
> can be removed and readonlyPlanes can be used instead explicitly. readonlyPlanes are needed for RTV and actually for SRV (you already use it for video). It's a bit of a breaking change, but, IMO, a step towards better API. What do you think? Just asking for your opinion, I will implement it myself.

I like that.

@Daedie-git
Author

> Do we need other formats, like DXGI_FORMAT_Y410 (10:10:10)?

The current set targets the common 4:2:0 format family needed by the decode/encode path: NV12, P010, P016. Y410 is packed 4:4:4; I would add support for it in a follow-up if the need arises.
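
To make the 4:2:0 versus packed 4:4:4 difference concrete, here is a small size calculation. It is a sketch that assumes tightly packed rows (real allocations add pitch and alignment padding) and even dimensions; both helper names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// 2-plane 4:2:0 (NV12 / P010 / P016):
// plane 0 is full-resolution luma, plane 1 is half-resolution interleaved CbCr.
inline uint64_t GetTightSize420(uint32_t width, uint32_t height, uint32_t bytesPerSample) {
    assert(width % 2 == 0 && height % 2 == 0); // chroma is subsampled by 2 in both directions
    const uint64_t luma = uint64_t(width) * height * bytesPerSample;
    const uint64_t chroma = uint64_t(width / 2) * (height / 2) * 2 * bytesPerSample; // 2 samples (Cb, Cr) per texel
    return luma + chroma;
}

// Packed 4:4:4 (e.g. Y410): a single plane with one multi-component texel per pixel.
inline uint64_t GetTightSizePacked444(uint32_t width, uint32_t height, uint32_t bytesPerTexel) {
    return uint64_t(width) * height * bytesPerTexel;
}
```

At 1920x1080 this gives 3,110,400 bytes for NV12 (1 byte per sample), 6,220,800 bytes for P010/P016 (2 bytes per sample) and 8,294,400 bytes for a 32-bit packed format such as Y410.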

@vertver
Contributor

vertver commented Apr 29, 2026

I don't like that you're using native API structures in the NRI public API. Wouldn't it be more logical to implement it as a translation layer, rather than in the NRIWrapper* way?

dzhdanNV added a commit that referenced this pull request Apr 30, 2026

- "TextureViewDesc::readonlyPlanes" renamed to "planes". Logic switched to "direct" from "inverse"
- removed special depth-stencil formats for shader resource views ("R24_UNORM_X8", "X24_G8_UINT", "R32_SFLOAT_X8_X24" and "X32_G8_UINT_X24"). Use "planes" instead
- explained valid usage

OTHER:
- VK: untangled "planes" (aspect mask) logic to be friendly for subpass inputs
- prerequisite for video support (PR #226)

@dzhdanNV
Collaborator

> I like that.

@Daedie-git I have implemented it. Please, update your code to the latest since it's a prerequisite for the video support. Please, note that I have removed PR #222 since the code is completely different for these lines. Feel free to adjust, re-add or discuss. Thanks in advance! We are moving forward!

> I don't like that you're using native API structures in the NRI public API. Wouldn't it be more logical to implement it as a translation layer, rather than in the NRIWrapper* way?

Just echoing what was already said: I absolutely agree with this. GAPI-specific structs must be moved to the corresponding NRIWrapperX extension, and NRIVideo must get an abstraction. This is what we should focus on.

@Daedie-git
Author

> > I like that.
>
> @Daedie-git I have implemented it. Please, update your code to the latest since it's a prerequisite for the video support. Please, note that I have removed PR #222 since the code is completely different for these lines. Feel free to adjust, re-add or discuss. Thanks in advance! We are moving forward!
>
> > I don't like that you're using native API structures in the NRI public API. Wouldn't it be more logical to implement it as a translation layer, rather than in the NRIWrapper* way?
>
> Just echoing what was already said: I absolutely agree with this. GAPI-specific structs must be moved to the corresponding NRIWrapperX extension, and NRIVideo must get an abstraction. This is what we should focus on.

Indeed. I have addressed this locally already. I'll rebase it on your change and push it soon.

@Daedie-git
Author

I wrapped up my changes.

@Daedie-git
Author

I found some issues, so I will follow up with another commit.

@Daedie-git
Author

Additional question: For the sake of testing this, I've been building up quite the test suite in my own repo. Do you want to have it in some form?
Apart from the samples you already asked for.

@dzhdanNV
Collaborator

dzhdanNV commented Jun 2, 2026

May I continue "massaging" the code or better wait for fixes from your side, if there is something in the works?

@Daedie-git
Author

> May I continue "massaging" the code or better wait for fixes from your side, if there is something in the works?

Feel free.
I have been very busy with my son's birthday these last few weeks. I hope to be able to pick this up again this week. As soon as I do, I will leave a message here to let you know, so you can edit freely in the meantime.

@dzhdanNV
Collaborator

dzhdanNV commented Jun 4, 2026

Do we really need Cmd[Decode/Encode]Video[D3D12/VK] in wrapper interfaces? I think we need only Create functions for new entities (except VideoPicture, I think). The rest can be done using the video interface. Do you agree?

@Daedie-git
Author

> Do we really need Cmd[Decode/Encode]Video[D3D12/VK] in wrapper interfaces? I think we need only Create functions for new entities (except VideoPicture, I think). The rest can be done using the video interface. Do you agree?

Not strictly necessary.
The primary path is VideoInterface::CmdDecodeVideo / CmdEncodeVideo, with NRI-owned VideoSession, VideoSessionParameters, VideoPicture, buffers, etc.

VideoPicture is the exception: it adapts an NRI texture/subresource into backend-specific video picture state, so keeping CreateVideoPicture in the video interface makes sense, I think.

The main reason for having them was to provide a native passthrough option for things that weren't modelled yet in the abstraction. That would also be the main argument for keeping them: currently unsupported features/options.

@dzhdanNV
Collaborator

dzhdanNV commented Jun 4, 2026

Please, allow me to do this:

  • remove Cmd[Decode/Encode]VideoD3D12 with all dependencies from WrapperD3D12
  • remove Cmd[Decode/Encode]VideoVK with all dependencies from WrapperVK
    • both are outliers in terms of wrapper functionality
    • GetNativeX can be used to reach the same functionality
  • add CreateVideoSessionD3D12 and CreateVideoSessionVK
    • but D3D12 requires optimization of ComPtrs (only session and heap are needed; the session may be an encoder or a decoder)
    • but for VK I need to understand all the parameters in the struct and what needs to be actually wrapped
    • CreateVideoPicture stays untouched in NRIVideo

What do you think?

@Daedie-git
Author

> Please, allow me to do this:
>
>   • remove Cmd[Decode/Encode]VideoD3D12 with all dependencies from WrapperD3D12
>   • remove Cmd[Decode/Encode]VideoVK with all dependencies from WrapperVK
>     • both are outliers in terms of wrapper functionality
>     • GetNativeX can be used to reach the same functionality
>   • add CreateVideoSessionD3D12 and CreateVideoSessionVK
>     • but D3D12 requires optimization of ComPtrs (only session and heap are needed; the session may be an encoder or a decoder)
>     • but for VK I need to understand all the parameters in the struct and what needs to be actually wrapped
>     • CreateVideoPicture stays untouched in NRIVideo
>
> What do you think?

Sounds good to me.

dzhdanNV added 7 commits June 5, 2026 10:04
…K_SUPPORT" (it's all or nothing, NRI requires latest known AgilitySDK if it's enabled)
- TODO: allow wrapping of video command lists
- TODO: AV1 is incomplete, "1" interface usage is incomplete
- 4 ComPtrs reduced to 2
- added TODOs since the usage is still incomplete
- honored "private"

@dzhdanNV
Collaborator

dzhdanNV commented Jun 5, 2026

I spent many days working on video support. Notes:

  • AV1 support looks incomplete
  • enhanced barriers are missing for hidden barriers
  • Agility or non-Agility code path involving manipulations with "numbered interfaces" is missing
  • search for TODO-VIDEO

It looks like we need an "AI skill" to properly define:

  • coding standards
  • AgilitySDK usage idioms
  • best practices
  • what to do and what to avoid

Yeah, it sounds vague, but it's the way to go. I have already made tons of changes :( In any case, the AI assistant must review the recent changes.

@Daedie-git
Author

> I spent many days working on video support. Notes:
>
>   • AV1 support looks incomplete
>   • enhanced barriers are missing for hidden barriers
>   • Agility or non-Agility code path involving manipulations with "numbered interfaces" is missing
>   • search for TODO-VIDEO
>
> It looks like we need an "AI skill" to properly define:
>
>   • coding standards
>   • AgilitySDK usage idioms
>   • best practices
>   • what to do and what to avoid
>
> Yeah, it sounds vague, but it's the way to go. I have already made tons of changes :( In any case, the AI assistant must review the recent changes.

AV1 is indeed incomplete, somewhat deliberately, because the PR has already scope-crept quite a bit beyond what I initially intended to do. To be clear, that is not a problem for me, but I felt compelled not to drag this on indefinitely, which has left things in a bit of an "in between" state. In hindsight, I probably should've pulled the original PR and reopened it in a more complete state.

That being said, my schedule will be freeing up quite a bit soon, and I can pick up where I left off and iterate this towards a sufficiently complete state.

To reduce AI agent entropy, my main recommendations are:

  • AGENTS.md file in root. Should be brief, but contain non-negotiables, so those are always "visible". These things should go in it basically:
    • coding standards
    • AgilitySDK usage idioms
    • best practices
    • what to do and what to avoid
  • a review skill, specifically tailored to the repo. This is probably the most useful. At least for how I use it.
  • (Temporary) artifacts of what needs to be addressed. To anchor between sessions.

Sorry if I have left you with a frustrating experience 🙂

@dzhdanNV
Collaborator

dzhdanNV commented Jun 5, 2026

I don't use AI coding assistants yet. But I'm planning to set up Claude, and after getting familiar with it, yeah, AI-related md files must appear in one form or another.

@dzhdanNV
Collaborator

dzhdanNV commented Jun 5, 2026

#if NRI_ENABLE_AGILITY_SDK_SUPPORT
    commandList->EncodeFrame1(session.GetEncoder(), session.GetEncoderHeap(), &input, &output);
#else
    commandList->EncodeFrame(session.GetEncoder(), session.GetEncoderHeap(), &input, &output); // FAILS to compile
#endif

...

#if NRI_ENABLE_AGILITY_SDK_SUPPORT
        commandList->ResolveEncoderOutputMetadata1(&resolveInput, &resolveOutput);
#else
        commandList->ResolveEncoderOutputMetadata(&resolveInput, &resolveOutput); // FAILS to compile
#endif

As it turned out, real video encoding (i.e. the EncodeFrame API) requires at least the ID3D12VideoEncodeCommandList2 interface, which is part of the AgilitySDK. I.e. NRI's baseline, which is D3D12 Ultimate (pre-Agility), doesn't support encoding at all. Should I wrap all encoding-related functions into a single #if NRI_ENABLE_AGILITY_SDK_SUPPORT at the beginning and #endif at the end? Previously there was an attempt to separate AV1 support, but NRI can be compiled in only 2 ways:

  • either using the latest (known) Agility, which is 1.619.3 now
  • or using D3D12 Ultimate pre-Agility (Windows SDK 10.0.20348)

so, it's "all or nothing".
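
The "wrap once" option could look roughly like the sketch below. `ResultSketch`, `CmdEncodeVideoSketch`, `CmdResolveEncodeMetadataSketch` and `IsEncodeCompiledIn` are hypothetical names for illustration; only the `NRI_ENABLE_AGILITY_SDK_SUPPORT` macro comes from the snippets above, and the D3D12 calls are left as comments so the snippet builds without the SDK.

```cpp
#include <cstdint>

// Default to the pre-Agility baseline if the build system did not define the macro.
#ifndef NRI_ENABLE_AGILITY_SDK_SUPPORT
#    define NRI_ENABLE_AGILITY_SDK_SUPPORT 0
#endif

enum class ResultSketch : uint8_t { SUCCESS, UNSUPPORTED };

#if NRI_ENABLE_AGILITY_SDK_SUPPORT

// All encode entry points live inside one guarded region.
ResultSketch CmdEncodeVideoSketch() {
    // commandList->EncodeFrame1(...) would be recorded here
    return ResultSketch::SUCCESS;
}

ResultSketch CmdResolveEncodeMetadataSketch() {
    // commandList->ResolveEncoderOutputMetadata1(...) would be recorded here
    return ResultSketch::SUCCESS;
}

#else

// Pre-Agility builds keep the symbols but report that encoding is unavailable.
ResultSketch CmdEncodeVideoSketch() {
    return ResultSketch::UNSUPPORTED;
}

ResultSketch CmdResolveEncodeMetadataSketch() {
    return ResultSketch::UNSUPPORTED;
}

#endif

// Lets callers (or queue discovery) check the build configuration in one place.
constexpr bool IsEncodeCompiledIn() {
    return NRI_ENABLE_AGILITY_SDK_SUPPORT != 0;
}
```

With a single guard, the non-Agility build never mentions the numbered interfaces at all, which avoids the "FAILS to compile" branches, at the cost of stub implementations for every encode entry point.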


4 participants