Fix `SmolVLM` video processor `resize` using wrong interpolation after backend refactor by ydshieh · Pull Request #45258 · huggingface/transformers

ydshieh · 2026-04-06T05:09:09Z

Summary

PR #43514 refactored _preprocess to pass resample=resample to resize, but the resize method in SmolVLMVideoProcessor still had interpolation as its parameter name. The resample kwarg was silently swallowed by **kwargs, causing interpolation to always default to BILINEAR instead of the intended LANCZOS→BICUBIC path.

Without this fix, inputs["pixel_values"] in SmolVLMForConditionalGenerationIntegrationTest::test_integration_test_video differs by about 0.36 between before and after #43514, which also changes the model output values.

Fix

Rename the interpolation parameter to resample in SmolVLMVideoProcessor.resize
Convert PIL resample integers to torchvision InterpolationMode via pil_torch_interpolation_mapping, matching the pattern used in TorchvisionBackend.resize in image_processing_backends.py
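
The conversion described in the two fix items can be sketched roughly as follows. This is a minimal, dependency-free illustration and not the code in the PR: `InterpolationMode` here is a stand-in enum rather than torchvision's class, `PIL_TO_TORCH` is a hypothetical substitute for `pil_torch_interpolation_mapping`, and `resolve_interpolation` is a made-up helper name. The integer keys follow PIL's resampling constants (0 = NEAREST, 1 = LANCZOS, 2 = BILINEAR, 3 = BICUBIC).

```python
from enum import Enum


class InterpolationMode(Enum):
    """Stand-in for torchvision's InterpolationMode (illustration only)."""

    NEAREST = "nearest"
    BILINEAR = "bilinear"
    BICUBIC = "bicubic"
    LANCZOS = "lanczos"


# Hypothetical substitute for pil_torch_interpolation_mapping,
# keyed by PIL resample integers.
PIL_TO_TORCH = {
    0: InterpolationMode.NEAREST,
    1: InterpolationMode.LANCZOS,
    2: InterpolationMode.BILINEAR,
    3: InterpolationMode.BICUBIC,
}


def resolve_interpolation(resample=None):
    """Turn a PIL integer or an InterpolationMode into the mode actually used."""
    if resample is None:
        # Nothing requested: fall back to the BILINEAR default.
        interpolation = InterpolationMode.BILINEAR
    elif isinstance(resample, int):
        # PIL-style integer: translate through the mapping.
        interpolation = PIL_TO_TORCH[resample]
    else:
        # Already an InterpolationMode: use it unchanged.
        interpolation = resample

    # LANCZOS is not available for tensors, so BICUBIC is used instead.
    if interpolation == InterpolationMode.LANCZOS:
        interpolation = InterpolationMode.BICUBIC
    return interpolation
```

With this sketch, `resolve_interpolation(1)` yields the BICUBIC fallback, while calling it without arguments yields BILINEAR.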

Before #43514 (correct path)

resize(interpolation=InterpolationMode.LANCZOS) → interpolation == tvF.InterpolationMode.LANCZOS → BICUBIC fallback

After #43514 without this fix (broken path)

resize(resample=1) → resample swallowed by **kwargs, interpolation=None → defaults to BILINEAR

After #43514 with this fix (correct path restored)

resize(resample=1) → pil_torch_interpolation_mapping[1] → InterpolationMode.LANCZOS → BICUBIC fallback
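
The broken and restored paths above come down to how Python treats an unexpected keyword when a function also accepts `**kwargs`. The toy functions below are assumptions made for illustration (the names `resize_before_fix` and `resize_after_fix`, the string mode names, and the `(384, 384)` size are not from the PR); they only mimic the parameter handling, not the real resizing.

```python
def resize_before_fix(size, interpolation=None, **kwargs):
    # `resample` is not a named parameter here, so a caller's resample=1
    # lands in kwargs and is never read: interpolation stays None.
    return interpolation if interpolation is not None else "BILINEAR"


def resize_after_fix(size, resample=None, **kwargs):
    # Simplified mapping of PIL integers to mode names (1 = LANCZOS).
    names = {1: "LANCZOS", 2: "BILINEAR", 3: "BICUBIC"}
    mode = names[resample] if resample is not None else "BILINEAR"
    # LANCZOS falls back to BICUBIC, as in the paths described above.
    return "BICUBIC" if mode == "LANCZOS" else mode


broken = resize_before_fix((384, 384), resample=1)
fixed = resize_after_fix((384, 384), resample=1)
print(broken, fixed)  # prints "BILINEAR BICUBIC"
```

No exception is raised in the broken case, which is why the mismatch only showed up as a numerical difference in the integration test.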


github-actions · 2026-04-06T05:10:11Z

[For maintainers] Suggested jobs to run (before merge)

run-slow: smolvlm

HuggingFaceDocBuilderDev · 2026-04-06T05:18:20Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

ydshieh · 2026-04-06T05:22:49Z

run-slow: smolvlm

github-actions · 2026-04-06T05:24:01Z

Workflow Run ⚙️

This comment contains run-slow, running the specified jobs:

models: ["models/smolvlm"]
quantizations: []

ydshieh · 2026-04-06T05:30:10Z

tests/models/smolvlm/test_modeling_smolvlm.py

                (None, None): 'User: You are provided the following series of nine frames from a 0:00:09 [H:MM:SS] video.\n\nFrame from 00:00:\nFrame from 00:01:\nFrame from 00:02:\nFrame from 00:03:\nFrame from 00:04:\nFrame from 00:05:\nFrame from 00:06:\nFrame from 00:08:\nFrame from 00:09:\n\nDescribe this video in detail\nAssistant: The video depicts a large language model architecture, specifically a language model with a "quick brown" feature',
                ("cuda", (8, 0)): 'User: You are provided the following series of nine frames from a 0:00:09 [H:MM:SS] video.\n\nFrame from 00:00:\nFrame from 00:01:\nFrame from 00:02:\nFrame from 00:03:\nFrame from 00:04:\nFrame from 00:05:\nFrame from 00:06:\nFrame from 00:08:\nFrame from 00:09:\n\nDescribe this video in detail\nAssistant: The video showcases a large language model architecture, specifically a "Quick Brown" model, which is designed',
-                ("cuda", (8, 6)): 'User: You are provided the following series of nine frames from a 0:00:09 [H:MM:SS] video.\n\nFrame from 00:00:\nFrame from 00:01:\nFrame from 00:02:\nFrame from 00:03:\nFrame from 00:04:\nFrame from 00:05:\nFrame from 00:06:\nFrame from 00:08:\nFrame from 00:09:\n\nDescribe this video in detail\nAssistant: The video showcases a large language model, specifically a neural network model, which is designed to learn and',
+                ("cuda", (8, 6)): 'User: You are provided the following series of nine frames from a 0:00:09 [H:MM:SS] video.\n\nFrame from 00:00:\nFrame from 00:01:\nFrame from 00:02:\nFrame from 00:03:\nFrame from 00:04:\nFrame from 00:05:\nFrame from 00:06:\nFrame from 00:08:\nFrame from 00:09:\n\nDescribe this video in detail\nAssistant: The video depicts a large language model architecture, specifically a language model with a "quick brown" feature',


This value should have been updated a long time ago. PR #43514 changed the actual outputs further, but with the fix in this PR the actual output returns to the one that had remained the same for several months, which is the new value I provide here.

github-actions · 2026-04-06T05:34:31Z

CI Results

Workflow Run ⚙️

Commit Info

Context	Commit	Description
RUN	a1cab4b5	workflow commit (merge commit)
PR	908e786e	branch commit (from PR)
main	374d44d5	base commit (on `main`)

✅ No failing test specific to this PR 🎉 👏 !

github-actions · 2026-04-06T05:34:50Z

View the CircleCI Test Summary for this PR:

https://huggingface.co/spaces/transformers-community/circle-ci-viz?pr=45258&sha=908e78

yonigozlan

Thanks for catching this @ydshieh ! I'll check if this is present in any other places. We can simplify this a bit as explained below.

yonigozlan · 2026-04-06T15:11:29Z

src/transformers/models/smolvlm/video_processing_smolvlm.py

+        if resample is not None:
+            if isinstance(resample, (PILImageResampling, int)):
+                interpolation = pil_torch_interpolation_mapping[resample]
+            else:
+                interpolation = resample
+        else:
+            interpolation = tvF.InterpolationMode.BILINEAR
        if interpolation == tvF.InterpolationMode.LANCZOS:
            logger.warning_once(
                "You have used fast image processor with LANCZOS resample which not yet supported for torch.Tensor. "
                "BICUBIC resample will be used as an alternative. Please fall back to image processor if you "
                "want full consistency with the original model."
            )
            interpolation = tvF.InterpolationMode.BICUBIC


As long as we use super().resize with the resample arg where tvF.resize is used below, we shouldn't need this logic (it's already in TorchvisionBackend).
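
A rough sketch of the delegation idea in this comment, under stated assumptions: `BaseBackend` and `VideoProcessor` are hypothetical stand-ins, not the real `TorchvisionBackend` or `SmolVLMVideoProcessor`, and the size rounding is a placeholder for whatever model-specific logic the subclass keeps. The point is only that the subclass forwards `resample` unchanged and leaves the conversion and fallback to the parent.

```python
class BaseBackend:
    """Hypothetical parent that owns the resample handling."""

    def resize(self, frames, size, resample=None, **kwargs):
        # Simplified PIL-integer mapping with the LANCZOS -> BICUBIC fallback.
        mode = {None: "BILINEAR", 1: "LANCZOS", 2: "BILINEAR", 3: "BICUBIC"}[resample]
        if mode == "LANCZOS":
            mode = "BICUBIC"
        # A real backend would resize here; the sketch just reports its decision.
        return {"frames": frames, "size": size, "mode": mode}


class VideoProcessor(BaseBackend):
    """Hypothetical subclass that keeps only its model-specific size logic."""

    def resize(self, frames, size, resample=None, **kwargs):
        # Placeholder adjustment: round each side up to a multiple of 4.
        adjusted = tuple(-(-side // 4) * 4 for side in size)
        # Forward `resample` by name so it cannot get lost in **kwargs.
        return super().resize(frames, adjusted, resample=resample, **kwargs)


result = VideoProcessor().resize(["frame0"], (10, 13), resample=1)
```

Whether the real parent method can be reused this directly is the open question in this thread.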

OK, thanks! That doesn't seem to be the case for now. I can add a comment like "TODO (yoni): try to use super().resize".

I will wait a bit to see if you are ok with such a comment, then merge.

I'll just make the modifications in this PR if that's ok

I really prefer to have a clean CI that everyone can trust and build on. Leaving tests failing for longer means that if other PRs introduce new breaks (of different types) in the already failing tests, we won't detect them. The accumulation of such breaks makes debugging and fixing more difficult and time consuming.

(You can see several examples like #45268 or #45252)

(And I prefer to let you find the time to work on this, since you are much better at it than me. I don't want to ask you to do it today or this week.)

Going to merge! Hope my arguments above convince you 🙏

No problem! I just opened a PR for this if you could approve :) #45272


fix

908e786

ydshieh requested a review from yonigozlan April 6, 2026 05:22

ydshieh mentioned this pull request Apr 6, 2026

Fix more integration tests for important models #45254

Open

ydshieh changed the title ~~Fix SmolVLM video processor resize using wrong interpolation after backend refactor~~ Fix SmolVLM video processor resize using wrong interpolation after backend refactor Apr 6, 2026


yonigozlan approved these changes Apr 6, 2026


ydshieh merged commit 182f20c into main Apr 6, 2026
23 of 25 checks passed

ydshieh deleted the fix_smolvlm branch April 6, 2026 19:41
