
vLLM: image EXIF Rotation & PNG tRNS Transparency Not Normalized, Causing Mismatch Between Model Input and Expectations

Start: 2026-06-17T14:02:42Z
Location: Global (internet)

Category: cyber_advisory · pip

## Summary

* **Issue 1**: EXIF orientation is not normalized, so the model processes the image in a different orientation from the one a human viewer sees, which can bias its interpretation.
* **Issue 2**: PNG `tRNS` transparency is not explicitly flattened before conversion to RGB. After conversion, transparent or semi-transparent pixels are rendered unexpectedly, making otherwise subtle overlay elements visible and distorting the input content. (This attack is similar to AlphaDog: RGBA handling is already correct in vLLM, but because `tRNS` lets non-RGBA images such as RGB carry transparency, the correct processing path is not taken.)
* **Issue 3**: Pillow loads only the first frame of APNG and GIF files.

---

## Root Cause

* **Rotation**: After opening an image, `ImageOps.exif_transpose` is not called to normalize EXIF orientation.
* **Transparency**: Only **RGBA→RGB** is flattened against a background. PNGs carrying **`tRNS`** in **`P`/`L`/`RGB + tRNS`** and other non-RGBA modes take the `image.convert("RGB")` path, which implicitly discards or remaps the transparency semantics.

---

## Affected Code

* https://github.com/vllm-project/vllm/blob/16b37f3119918c1e5a39f303e0d0892c65c07a90/vllm/multimodal/image.py#L77-L84
* https://github.com/vllm-project/vllm/blob/16b37f3119918c1e5a39f303e0d0892c65c07a90/vllm/multimodal/image.py#L37-L43
* https://github.com/vllm-project/vllm/blob/16b37f3119918c1e5a39f303e0d0892c65c07a90/vllm/multimodal/image.py#L26-L34

> Current state: `ImageOps.exif_transpose` is not used. (The `rescale_image_size` function ([https://github.com/vllm-project/vllm/blob/main/vllm/multimodal/image.py#L14](https://github.com/vllm-project/vllm/blob/main/vllm/multimodal/image.py#L14)) exists and includes a `transpose` parameter, but it does not appear to be called anywhere outside the `test` directory.)
>
> **Call order**: `_convert_image_mode` runs first; if the conditions are met, `convert_image_mode` is called.
>
> **Issue**: Only the “RGBA → RGB” path is explicitly flattened. `P`, `L`, or `RGB` with `tRNS` all fall back to `image.convert("RGB")`. For PNGs that include `tRNS`, `convert("RGB")` directly produces 24-bit RGB, leading to:
>
> * **`P` mode**: The transparent index becomes an actual RGB color (often black, white, or an undefined background), so transparency is lost.
> * **`L/LA` and `RGB + tRNS`**: `convert("RGB")` doesn’t composite against a chosen background first, so elements that relied on transparency to be hidden or softened become solid.

## Impact & Scope

* **Impact**: The pixels the model sees can diverge from what the operator expects (because of orientation or transparency handling), potentially altering downstream reasoning.
* **Scope**: The image I/O and mode-conversion paths in `vllm/multimodal/image.py`. The existing **RGBA→RGB** flattening is correct; the issues center on **missing EXIF normalization** and **non-RGBA `tRNS` not being explicitly composited**.

## Case

* EXIF: http://qiniu.funxingzuo.top/exif_orient_180.jpg
* tRNS: http://qiniu.funxingzuo.top/hello.png

## Fix

A fix for this vulnerability was merged here: https://github.com/vllm-project/vllm/pull/44974
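
To make the two missing normalization steps concrete, here is a minimal Pillow sketch. It is not the merged patch and does not reproduce vLLM's code. The helper name `normalize_image` and the white default background are assumptions made for illustration. It applies `ImageOps.exif_transpose` first, then promotes any `tRNS` information to a real alpha channel with `convert("RGBA")` before compositing onto an opaque background.

```python
from PIL import Image, ImageOps


def normalize_image(
    image: Image.Image,
    background: tuple[int, int, int] = (255, 255, 255),  # assumed default
) -> Image.Image:
    """Hypothetical helper: apply EXIF orientation, then flatten transparency."""
    # Rotate/flip the pixel data according to the EXIF Orientation tag.
    # Without such a tag this simply returns a copy of the image.
    image = ImageOps.exif_transpose(image)

    # Transparency can live in an alpha band (RGBA/LA/PA) or in a PNG tRNS
    # chunk, which Pillow exposes as image.info["transparency"] for
    # P, L and RGB images.
    has_transparency = (
        image.mode in ("RGBA", "LA", "PA") or "transparency" in image.info
    )
    if not has_transparency:
        return image.convert("RGB")

    # convert("RGBA") turns tRNS information into a real alpha channel,
    # so every transparent mode goes through the same compositing path.
    rgba = image.convert("RGBA")
    canvas = Image.new("RGBA", rgba.size, background + (255,))
    return Image.alpha_composite(canvas, rgba).convert("RGB")
```

With a helper like this, a caller would run `normalize_image(Image.open(path))` before handing pixels to the model. Choosing a fixed background is a design decision: it makes the result deterministic, but content hidden by transparency is rendered as that background color rather than as whatever a given viewer would have shown.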

Sources

GitHub Advisory Database · first seen 2026-06-17 14:02 UTC
