09:11 CET · Wednesday · May 13, 2026

shipfeed

§ local-llm · cluster

vLLM v0.14.0

Jan 20 · primary fetch · 1 source · cluster 608149b7 · updated Jan 20

Highlights

This release features approximately 660 commits from 251 contributors (86 new contributors).

Breaking Changes:

- Async scheduling is now enabled by default. Users who experience issues can disable it with `--no-async-scheduling`. The default excludes some not-yet-supported configurations: pipeline parallel, CPU backend, and non-MTP/Eagle spec decoding.
- PyTorch 2.9.1 is now required, and the default wheel is compiled against cu129.
- Deprecated quantization schemes have been removed (#31688, #31285).
- When using speculative decoding, unsupported sampling parameters will fail rather than being silently ignored (#31982).
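For deployments that need to opt out of the new default, a launch wrapper can add the flag conditionally. The following is a minimal sketch, not part of vLLM: `build_serve_command` is a hypothetical helper and `my-org/my-model` is a placeholder model name. Only the `vllm serve` entrypoint and the `--no-async-scheduling` flag come from the release notes.

```python
import shlex


def build_serve_command(model: str, async_scheduling: bool = True) -> list[str]:
    """Build an argv list for `vllm serve` (hypothetical helper, not a vLLM API).

    With async_scheduling=True nothing is added, because v0.14.0 enables
    async scheduling by default. With False, the opt-out flag named in the
    release notes is appended.
    """
    argv = ["vllm", "serve", model]
    if not async_scheduling:
        argv.append("--no-async-scheduling")
    return argv


# "my-org/my-model" is a placeholder, not a real checkpoint.
cmd = build_serve_command("my-org/my-model", async_scheduling=False)
print(shlex.join(cmd))
```

Returning an argv list rather than a single string keeps the command safe to pass to `subprocess.run` without shell quoting concerns.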

Key Improvements:

- Async scheduling enabled by default (#27614): Overlaps engine core scheduling with GPU execution, improving throughput without user configuration. Now also works with speculative decoding (#31998) and structured outputs (#29821).
- gRPC server entrypoint (#30190): Alternative to REST API with binary protocol, HTTP/2 multiplexing.
- `--max-model-len auto` (#29431): Automatically fits context length to available GPU memory, eliminating OOM startup failures.
- Model inspection view (#29450): View the modules, attention backends, and quantization of your model in vLLM by…
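To see why overlapping scheduling with GPU execution can improve throughput, a toy timing model may help. This is a sketch under simplifying assumptions, not vLLM's implementation: every step is assumed to have a fixed scheduling cost and a fixed execution cost, and the millisecond values below are made up for illustration.

```python
def serial_time(steps: int, schedule_ms: float, execute_ms: float) -> float:
    """Total time when each step is scheduled, then executed, one after another."""
    return steps * (schedule_ms + execute_ms)


def overlapped_time(steps: int, schedule_ms: float, execute_ms: float) -> float:
    """Total time when scheduling step k+1 runs while step k executes.

    Modeled as a two-stage pipeline: the first schedule and the last
    execution cannot be hidden, and every step in between costs whichever
    stage is slower.
    """
    if steps == 0:
        return 0.0
    return schedule_ms + (steps - 1) * max(schedule_ms, execute_ms) + execute_ms


# Illustrative numbers only: 100 steps, 2 ms to schedule, 10 ms to execute.
print(serial_time(100, 2, 10))      # 1200
print(overlapped_time(100, 2, 10))  # 1002
```

In this model the saving approaches the full scheduling cost per step when execution dominates, and shrinks as the two costs become unbalanced in the other direction.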

read full article on github.com
§ sources · 1 publication · timeline below
  1. github.com · vllm v0.14.0 · primary