
vLLM: temperature=NaN and temperature=Infinity bypass validation and propagate to GPU kernels

Start: 2026-06-17T14:02:22Z
Location: Global (internet)

Category: cyber_advisory · pip

## Summary

All temperature validation gates rely on ordering comparisons (`<`, `>`). Under the IEEE 754 float semantics that Python follows, every ordering comparison involving `NaN` evaluates to `False`, and the specific checks used here also evaluate to `False` for positive `Infinity`. Both values therefore pass every guard silently and propagate to the GPU sampling kernels, where they produce undefined behavior or CUDA errors that can crash the inference worker. Note: `-Infinity` is correctly caught.

## Root Cause

`sampling_params.py:384`:

```python
if 0 < self.temperature < _MAX_TEMP:  # NaN → False; +Inf → False
```

`sampling_params.py:462`:

```python
if self.temperature < 0.0:  # NaN → False; +Inf → False
    raise VLLMValidationError(...)
```

For `+Infinity`, the chained comparison at line 384 is `False` because the upper-bound test `self.temperature < _MAX_TEMP` fails, and the check at line 462 is `False` because `+Infinity` is not negative, so no error is raised on either path.

No `math.isnan()` or `math.isinf()` check exists anywhere in `sampling_params.py`. Python semantics (verified): `float('nan') < 0.0` → `False`, `float('inf') < 0.0` → `False`.

## Impact

The inference worker can crash when a GPU kernel executes with NaN/Inf softmax input, degrading service for all concurrent users.

## Remediation

Add a `math.isfinite(self.temperature)` check in `_verify_args()` and reject non-finite float values with a 400 error.

## Fix

A fix for this vulnerability was merged in https://github.com/vllm-project/vllm/pull/45116
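The bypass and the suggested remediation can be illustrated with a minimal, standalone sketch. The function names `verify_temperature_unguarded` and `verify_temperature` and the exception class `TemperatureValidationError` are hypothetical stand-ins introduced for illustration. This is not vLLM's code, and the merged fix may be implemented differently.

```python
import math


class TemperatureValidationError(ValueError):
    """Hypothetical stand-in for the validation error named in the advisory."""


def verify_temperature_unguarded(temperature: float) -> None:
    """Mirrors the comparison-only guard quoted in the advisory."""
    # NaN < 0.0 and +Inf < 0.0 are both False, so neither value is rejected.
    if temperature < 0.0:
        raise TemperatureValidationError(
            f"temperature must be non-negative, got {temperature}"
        )


def verify_temperature(temperature: float) -> None:
    """Sketch of the remediation: reject non-finite values first."""
    # math.isfinite() returns False for NaN, +Inf, and -Inf.
    if not math.isfinite(temperature):
        raise TemperatureValidationError(
            f"temperature must be a finite number, got {temperature}"
        )
    if temperature < 0.0:
        raise TemperatureValidationError(
            f"temperature must be non-negative, got {temperature}"
        )


if __name__ == "__main__":
    for value in (float("nan"), float("inf"), float("-inf"), 0.7):
        for check in (verify_temperature_unguarded, verify_temperature):
            try:
                check(value)
                outcome = "accepted"
            except TemperatureValidationError:
                outcome = "rejected"
            print(f"{check.__name__}({value}): {outcome}")
```

Placing the finiteness test ahead of the range comparisons means later guards never see `NaN` or `Infinity`, so they can keep using plain `<` and `>`. In a serving context, the raised error would then need to be mapped to the 400 response that the remediation calls for.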

Sources

GitHub Advisory Database · first seen 2026-06-17 14:02 UTC

