Torchvision API to tensor/PIL image conversion operators #6282
Open
mdabek-nvidia wants to merge 80 commits into NVIDIA:main from mdabek-nvidia:torchvision_totensor
Changes from 78 commits
Commits
e09d04b
Center crop operator
mdabek-nvidia 1b043c0
Review fixes
mdabek-nvidia f30e282
Review fixes
mdabek-nvidia 3337aa9
Apply suggestion from @stiepan
mdabek-nvidia ebbf863
Torchvision ColorJitter and Grayscale implementations
mdabek-nvidia d8c5cd4
Review fixes
mdabek-nvidia 4dc37b5
Review fixes
mdabek-nvidia e146f9f
Review fixes
mdabek-nvidia 6a0038d
Review fixes
mdabek-nvidia f53d4ec
Review fixes
mdabek-nvidia dd7ff14
Review fixes
mdabek-nvidia 0eb7bd7
Review fixes and validation renaming
mdabek-nvidia 7fd89d9
Torchvision API - center crop operator (#6266)
mdabek-nvidia 453d18b
Torchvision ColorJitter and Grayscale implementations
mdabek-nvidia 2bf954c
Review fixes
mdabek-nvidia 842dbea
Gaussian blur operator
mdabek-nvidia b940dc6
Review fixes
mdabek-nvidia 7c19646
Fix call stack depth handling for error tracebacks in dynamic mode (#…
rostan-t 21c5424
Add uniform_sample option to VideoReaderDecoder (#6258)
jantonguirao 8f48e49
Defer DLTensor deletion when CUDA graph capture is active. (#6259)
JanuszL 5652054
Torchvision API - ColorJitter and Grayscale operators (#6272)
mdabek-nvidia feb2508
Gaussian blur - review fixes
mdabek-nvidia 52c9006
Improve type hinting in functional API
mdabek-nvidia bc192ab
Review fixes
mdabek-nvidia e5fb444
Review fixes
mdabek-nvidia 68cfc26
Gaussian blur operator
mdabek-nvidia 0cd9a05
Torchvision ColorJitter and Grayscale implementations
mdabek-nvidia 6413943
Review fixes
mdabek-nvidia 65bda1b
Gaussian blur operator
mdabek-nvidia 68f2b3f
Torchvision Pad operators
mdabek-nvidia 22daf70
Review fixes
mdabek-nvidia 39895d8
Fixing annotations in functional API
mdabek-nvidia 2f452ae
Review fixes
mdabek-nvidia fa9360d
Merge branch 'main' into torchvision_pad
mdabek-nvidia ce6f14b
Review fixes
mdabek-nvidia e17dcc9
Review fixes
mdabek-nvidia 66208e4
Torchvision ColorJitter and Grayscale implementations
mdabek-nvidia baa817f
Review fixes
mdabek-nvidia 8946174
Gaussian blur operator
mdabek-nvidia 996c38a
Torchvision ColorJitter and Grayscale implementations
mdabek-nvidia cd76b76
Review fixes
mdabek-nvidia 96aa07d
Gaussian blur operator
mdabek-nvidia 89037e2
Torchvision Pad operators
mdabek-nvidia 066f8f0
Torchvision normalize operators implementation
mdabek-nvidia 159af28
Review fixes
mdabek-nvidia e86e551
Review fixes
mdabek-nvidia ca8bf9e
Improving type hints
mdabek-nvidia d08ca9b
Correct std and mean for functional API
mdabek-nvidia cb5848f
Merge branch 'main' into torchvision_normalize
mdabek-nvidia ed8d250
Typo fix
mdabek-nvidia 45e4b5c
Review fixes
mdabek-nvidia f56cf44
Torchvision ColorJitter and Grayscale implementations
mdabek-nvidia f32200e
Review fixes
mdabek-nvidia 70e66eb
Torchvision ColorJitter and Grayscale implementations
mdabek-nvidia 724b2f9
Review fixes
mdabek-nvidia 48e6ab6
Gaussian blur operator
mdabek-nvidia 1dfa682
Torchvision ColorJitter and Grayscale implementations
mdabek-nvidia ea1dd01
Review fixes
mdabek-nvidia a32e31c
Gaussian blur operator
mdabek-nvidia fa48ebb
Torchvision normalize operators implementation
mdabek-nvidia 122c676
Torchvision - user documentation
mdabek-nvidia 3050c07
Post rebase fixes
mdabek-nvidia 27b515b
Post rebase update
mdabek-nvidia 593489c
Torchvision API documentation in Getting started
mdabek-nvidia 0bbac6a
Rebase fixes
mdabek-nvidia 6f21c9d
Review fixes
mdabek-nvidia db22042
Removed WAR for ndd.Batch creation
mdabek-nvidia 4fb48b4
Moving Torchvision API test to L1
mdabek-nvidia f076eb9
DLPack capsule fix for PyTorch 2.7.1
mdabek-nvidia b9371c7
Merge branch 'main' into torchvision_documentation
mdabek-nvidia cc44d31
Revert "DLPack capsule fix for PyTorch 2.7.1"
mdabek-nvidia 6566746
Review fixes
mdabek-nvidia bc7ef81
Torchvision ColorJitter and Grayscale implementations
mdabek-nvidia 66613ab
Torchvision implementation of tensor and PIL conversions
mdabek-nvidia 34f32de
Review fixes
mdabek-nvidia 9790262
Review fixes
mdabek-nvidia c0c260f
Merge branch 'main' into torchvision_totensor
mdabek-nvidia cdc9d3e
Refactor of PipelineWithLayout to use PIL conversion functions
mdabek-nvidia a807873
Generalized tensor conversion
mdabek-nvidia 4315106
Review fixes
mdabek-nvidia
116 changes: 116 additions & 0 deletions
dali/python/nvidia/dali/experimental/torchvision/v2/functional/totensor.py
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,116 @@ | ||
| # Copyright (c) 2026, NVIDIA CORPORATION & AFFILIATES. All rights reserved. | ||
| # | ||
| # Licensed under the Apache License, Version 2.0 (the "License"); | ||
| # you may not use this file except in compliance with the License. | ||
| # You may obtain a copy of the License at | ||
| # | ||
| # http://www.apache.org/licenses/LICENSE-2.0 | ||
| # | ||
| # Unless required by applicable law or agreed to in writing, software | ||
| # distributed under the License is distributed on an "AS IS" BASIS, | ||
| # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. | ||
| # See the License for the specific language governing permissions and | ||
| # limitations under the License. | ||
| | ||
| import numpy as np | ||
| from PIL import Image | ||
| import torch | ||
| | ||
| | ||
| def pil_to_tensor(inpt: Image.Image | np.ndarray) -> torch.Tensor: | ||
|     """ | ||
|     Convert a ``PIL.Image`` or HWC ``numpy.ndarray`` to a CHW ``torch.Tensor``. | ||
| | ||
|     No scaling is applied. For modes ``L``, ``RGB``, and ``RGBA`` the values are in | ||
|     [0, 255] and the dtype is ``torch.uint8``. | ||
|     Mirrors ``torchvision.transforms.v2.functional.pil_to_tensor``. | ||
| | ||
|     Parameters | ||
|     ---------- | ||
|     inpt : PIL.Image or numpy.ndarray | ||
|         Input image. PIL modes ``L``, ``RGB``, and ``RGBA`` are supported. A NumPy array | ||
|         must be laid out as (H, W) or (H, W, C). | ||
| | ||
|     Returns | ||
|     ------- | ||
|     torch.Tensor | ||
|         CHW tensor of dtype matching PIL mode (uint8 for L/RGB/RGBA, int32 for I, float32 for F). | ||
|     """ | ||
|     if not isinstance(inpt, (Image.Image, np.ndarray)): | ||
|         raise TypeError(f"Expected PIL.Image or numpy array, got {type(inpt)}") | ||
| | ||
|     if isinstance(inpt, Image.Image): | ||
|         arr = np.array(inpt, copy=True)  # (H, W) for L, (H, W, C) for RGB/RGBA | ||
|     else: | ||
|         # Note: the NumPy array is used directly, without copying; this matches Torchvision. | ||
|         arr = inpt | ||
| | ||
|     if arr.ndim == 2: | ||
|         arr = np.expand_dims(arr, axis=-1)  # (H, W) → (H, W, 1) | ||
| | ||
|     return torch.from_numpy(arr).permute(2, 0, 1)  # (H, W, C) → (C, H, W) | ||
| | ||
| | ||
| def to_tensor(inpt: Image.Image | np.ndarray) -> torch.Tensor: | ||
|     """ | ||
|     Convert a ``PIL.Image`` or HWC ``numpy.ndarray`` to a float32 CHW ``torch.Tensor``. | ||
| | ||
|     Every value is divided by 255 regardless of the input dtype, so the result is in | ||
|     [0, 1] only for uint8 input. | ||
|     Mirrors ``torchvision.transforms.v2.functional.to_tensor`` (deprecated in TV v2, | ||
|     but kept here for compatibility). | ||
| | ||
|     Parameters | ||
|     ---------- | ||
|     inpt : PIL.Image or numpy.ndarray | ||
|         Input image. PIL modes ``L``, ``RGB``, and ``RGBA`` are supported. | ||
| | ||
|     Returns | ||
|     ------- | ||
|     torch.Tensor | ||
|         CHW tensor of dtype ``torch.float32`` with values in [0.0, 1.0]. | ||
|     """ | ||
|     return pil_to_tensor(inpt).float() / 255.0 | ||
| | ||
| | ||
| def to_pil_image(inpt: torch.Tensor, mode: str | None = None) -> Image.Image: | ||
|     """ | ||
|     Convert a CHW ``torch.Tensor`` to a ``PIL.Image``. | ||
| | ||
|     Mirrors ``torchvision.transforms.v2.functional.to_pil_image``. | ||
| | ||
|     Parameters | ||
|     ---------- | ||
|     inpt : torch.Tensor | ||
|         CHW tensor. Supported channel counts: 1 (``L``), 3 (``RGB``), 4 (``RGBA``). | ||
|     mode : str or None, optional | ||
|         PIL image mode. If ``None``, the mode is inferred from the channel count. | ||
| | ||
|     Returns | ||
|     ------- | ||
|     PIL.Image | ||
|     """ | ||
|     if not isinstance(inpt, torch.Tensor): | ||
|         raise TypeError(f"Expected torch.Tensor, got {type(inpt)}") | ||
|     if inpt.ndim != 3: | ||
|         raise ValueError(f"Expected 3-D CHW tensor, got shape {tuple(inpt.shape)}") | ||
| | ||
|     hwc = inpt.permute(1, 2, 0).cpu()  # (C, H, W) → (H, W, C) | ||
|     channels = hwc.shape[-1] | ||
| | ||
|     if mode is None: | ||
|         if channels == 1: | ||
|             mode = "L" | ||
|         elif channels == 3: | ||
|             mode = "RGB" | ||
|         elif channels == 4: | ||
|             mode = "RGBA" | ||
|         else: | ||
|             raise ValueError( | ||
|                 f"Cannot infer PIL mode from {channels} channels. Pass mode explicitly." | ||
|             ) | ||
| | ||
|     arr = hwc.numpy() | ||
|     if np.issubdtype(arr.dtype, np.floating) and mode != "F": | ||
|         # Float input is scaled by 255 and cast to uint8 (the cast truncates). | ||
|         arr = (arr * 255).astype(np.uint8) | ||
| | ||
|     if mode == "L": | ||
|         arr = arr.squeeze(-1) | ||
| | ||
|     return Image.fromarray(arr, mode=mode) | ||
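
The layout and scaling steps in the three functions above can be sketched with NumPy alone. This is a minimal illustration, not the code from this PR: `hwc_to_chw` and `chw_to_hwc_uint8` are hypothetical helper names, and PIL and PyTorch are left out so that only the shape and dtype handling remains.

```python
import numpy as np


def hwc_to_chw(arr: np.ndarray) -> np.ndarray:
    """Hypothetical helper: the layout step of pil_to_tensor, (H, W[, C]) -> (C, H, W)."""
    if arr.ndim == 2:
        arr = np.expand_dims(arr, axis=-1)  # grayscale: (H, W) -> (H, W, 1)
    return np.transpose(arr, (2, 0, 1))


def chw_to_hwc_uint8(chw: np.ndarray) -> np.ndarray:
    """Hypothetical helper: the array preparation to_pil_image does before Image.fromarray."""
    hwc = np.transpose(chw, (1, 2, 0))  # (C, H, W) -> (H, W, C)
    if np.issubdtype(hwc.dtype, np.floating):
        hwc = (hwc * 255).astype(np.uint8)  # the cast truncates, it does not round
    return hwc


rgb = np.zeros((2, 3, 3), dtype=np.uint8)  # H=2, W=3, C=3
rgb[0, 0] = (255, 0, 255)

chw = hwc_to_chw(rgb)                    # pil_to_tensor-style: layout change only
scaled = chw.astype(np.float32) / 255.0  # to_tensor-style: float32, divided by 255
restored = chw_to_hwc_uint8(scaled)      # to_pil_image-style: back to HWC uint8

gray = hwc_to_chw(np.zeros((2, 3), dtype=np.uint8))  # 2-D input gains a channel axis
```

The example image holds only 0 and 255 because those values scale exactly; for other values the truncating cast means a result that falls slightly below an integer after scaling would come out one step lower.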