Technology · Global · ✓ verified · 90%

vLLM: OOM Denial of Service via Audio Decompression Bomb

Start: 2026-06-17T14:06:22Z
Location: Global (internet)

Category: cyber_advisory · pip

### Summary

vLLM's `/v1/audio/transcriptions` endpoint limits compressed upload size but not decoded PCM output. A 25MB OPUS file expands to ~14.9GB of float32 PCM at decode time. Tested on vLLM v0.19.0.

### Details

`SpeechToTextProcessor` rejects uploads over `VLLM_MAX_AUDIO_CLIP_FILESIZE_MB` (default 25MB) based on compressed byte length, but the audio decoder in `audio.py` accumulates all decoded frames into memory with no size limit before returning:

```python
# speech_to_text.py L184-189
if len(audio_data) / 1024 ** 2 > self.max_audio_filesize_mb:
    raise VLLMValidationError(...)
y, sr = load_audio(buf, sr=self.asr_config.sample_rate)  # decoded size unchecked

# audio.py L77-107
chunks: list[npt.NDArray] = []
for frame in container.decode(stream):
    chunks.append(frame.to_ndarray())
audio = np.concatenate(chunks, axis=-1).astype(np.float32)  # single contiguous allocation
```

A 25MB OPUS file at 6kbps encodes ~8.7 hours of audio. Decoding produces ~5.7GB of float32 PCM (232x amplification), and `np.concatenate` then allocates a second contiguous array, bringing peak RSS to ~14.9GB from a single request.

`SpeechToTextConfig.max_audio_clip_s` (default 30s) applies only after the full decode and does not prevent the allocation.

### Impact

An unauthenticated attacker can exhaust server memory with a small number of concurrent requests, each a valid upload within the documented size limit. Severity was assessed with reference to prior OOM vulnerability reports in vLLM.

### Fix

A fix for this vulnerability was merged here: https://github.com/vllm-project/vllm/pull/44970
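
The merged change is not reproduced here. As a minimal sketch of one way to bound decoded size, assuming the decoder yields frames that expose an `nbytes` attribute (as NumPy arrays do), the decode loop can be wrapped so it aborts as soon as the cumulative PCM size passes a budget. The names `decoded_budget_bytes`, `bounded_frames`, and `DecodedAudioTooLargeError` are hypothetical and are not part of vLLM; the choice of budget is also an assumption.

```python
from collections.abc import Iterable, Iterator
from typing import Protocol, TypeVar


class HasNBytes(Protocol):
    """Anything exposing a byte size, e.g. a NumPy array or a memoryview."""

    @property
    def nbytes(self) -> int: ...


F = TypeVar("F", bound=HasNBytes)


class DecodedAudioTooLargeError(ValueError):
    """Hypothetical error type for decoded audio that exceeds the budget."""


def decoded_budget_bytes(max_seconds: float, sample_rate: int,
                         channels: int = 1, bytes_per_sample: int = 4) -> int:
    """Bytes of PCM needed for max_seconds of audio (float32 by default)."""
    return int(max_seconds * sample_rate * channels * bytes_per_sample)


def bounded_frames(frames: Iterable[F], max_decoded_bytes: int) -> Iterator[F]:
    """Yield decoded frames, raising once their total size passes the budget."""
    total = 0
    for frame in frames:
        total += frame.nbytes
        if total > max_decoded_bytes:
            # Stop before the caller accumulates (and later concatenates)
            # an unbounded amount of decoded audio.
            raise DecodedAudioTooLargeError(
                f"decoded audio exceeds {max_decoded_bytes} bytes"
            )
        yield frame
```

Applied to the loop in the excerpt above, the frames passed to `chunks.append(...)` could come from `bounded_frames((f.to_ndarray() for f in container.decode(stream)), budget)`, so an oversized request fails during decoding instead of after the full allocation. How `budget` should relate to `max_audio_clip_s` is a design decision the advisory does not specify.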

Sources

GitHub Advisory Database · first seen 2026-06-17 14:06 UTC
