08:25 CET · Wednesday, May 13, 2026

shipfeed

on the wire
§ topic · local-llm

local-llm

18 this week · 34 this month · 76 all-time

Open-weight model releases and local inference tooling


clusters this week · 30 active

N° 001·ai·

not much happened today

Google launched Gemma 4 under an Apache 2.0 license, a significant open-model release focused on reasoning, agentic workflows, multimodality, and on-device use. It outperforms models 10x larger and has…

via news.smol.ai
Wednesday, March 11, 2026’s edition
N° 001·ai·

not much happened today

NVIDIA’s Nemotron 3 Super is a 120B parameter / ~12B active open model featuring a hybrid Mamba-Transformer / SSM Latent MoE architecture and 1M context window, delivering up to 2.2x faster inference than GPT-OSS-120B…

via news.smol.ai
Tuesday, May 5, 2026’s edition
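A quick sketch of the MoE arithmetic behind the entry above: only the active parameters are read per token, which is where the throughput edge over a dense model of similar size comes from. The numbers below are the entry's own rounded figures (120B total / ~12B active), not exact model-card values.

```python
# MoE active-parameter fraction for a 120B-total / ~12B-active model:
# each token only touches the active subset of the weights.
total_params = 120e9
active_params = 12e9

active_fraction = active_params / total_params
print(active_fraction)  # 0.1 -> roughly 10% of weights read per token
```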
N° 001·ai·

Transformers v5.8.0

Release v5.8.0 New Model additions DeepSeek-V4 DeepSeek-V4 is the next-generation MoE (Mixture of Experts) language model from DeepSeek that introduces several architectural innovations over DeepSeek-V3. The…

via github.com
Monday, April 27, 2026’s edition
N° 001·ai·

vLLM v0.20.0

vLLM v0.20.0 Highlights This release features 752 commits from 320 contributors (123 new)! DeepSeek V4: Initial DeepSeek V4 support landed (#40860), with DSML token-leakage fix in DSV4/3.2 (#40806), DSA + MTP IMA fix…

via github.com
Wednesday, April 22, 2026’s edition
N° 001·ai·

not much happened today

Alibaba released Qwen3.6-27B, a dense, Apache 2.0 open coding model with thinking and non-thinking modes, outperforming the larger Qwen3.5-397B-A17B on multiple coding benchmarks including SWE-bench and Terminal-Bench…

via news.smol.ai
Monday, April 20, 2026’s edition
N° 001·ai·

not much happened today

Moonshot's Kimi K2.6 is a major open-weight 1T-parameter MoE model featuring 32B active parameters, 384 experts, MLA attention, 256K context window, native multimodality, and INT4 quantization. It supports day-0…

via news.smol.ai
Friday, April 3, 2026’s edition
N° 001·ai·

vLLM v0.19.0

vLLM v0.19.0 Highlights This release features 448 commits from 197 contributors (54 new)! Gemma 4 support: Full Google Gemma 4 architecture support including MoE, multimodal, reasoning, and tool-use capabilities…

via github.com
Thursday, April 2, 2026’s edition
N° 001·ollama·

Ollama v0.20.0

Gemma 4

- Effective 2B (E2B): `ollama run gemma4:e2b`
- Effective 4B (E4B): `ollama run gemma4:e4b`
- 26B (Mixture of Experts model with 4B active parameters): `ollama run gemma4:26b`
- 31B (Dense): `ollama…`

via github.com
Wednesday, February 25, 2026’s edition
N° 001·ai·

vLLM v0.16.0

vLLM v0.16.0 Please note that this release was branch-cut on Feb 8, so any features added to vLLM after that date are not included. Highlights This release features 440 commits from 203 contributors (7 new)! Async…

via github.com
Tuesday, January 20, 2026’s edition
N° 001·ai·

vLLM v0.14.0

Highlights This release features approximately 660 commits from 251 contributors (86 new contributors). Breaking Changes: Async scheduling is now enabled by default - Users who experience issues can disable with…

via github.com
Friday, December 26, 2025’s edition
N° 001·agents·

not much happened today

MiniMax M2.1 launches as an open-source agent and coding Mixture-of-Experts (MoE) model with ~10B active / ~230B total parameters, claiming to outperform Gemini 3 Pro and Claude Sonnet 4.5, and supports local inference…

via news.smol.ai
Tuesday, December 23, 2025’s edition
N° 001·ai·

not much happened today

GLM-4.7 and MiniMax M2.1 open-weight model releases highlight day-0 ecosystem support, coding throughput, and agent workflows, with GLM-4.7 achieving a +9.5% improvement over GLM-4.6 and MiniMax M2.1 positioned as an…

via news.smol.ai
Wednesday, December 3, 2025’s edition
N° 001·ai·

vLLM v0.12.0

vLLM v0.12.0 Release Notes Highlights This release features 474 commits from 213 contributors (57 new)! Breaking Changes: This release includes PyTorch 2.9.0 upgrade (CUDA 12.9), V0 deprecations including…

via github.com
Wednesday, November 19, 2025’s edition
N° 001·ai·

vLLM v0.11.1

Highlights This release includes 1456 commits from 449 contributors (184 new contributors)! Key changes include: PyTorch 2.9.0 + CUDA 12.9.1: Updated the default CUDA build to `torch==2.9.0+cu129`, enabling Inductor…

via github.com
Thursday, October 2, 2025’s edition
N° 001·ai·

vLLM v0.11.0

Highlights This release features 538 commits, 207 contributors (65 new contributors)! This release completes the removal of V0 engine. V0 engine code including AsyncLLMEngine, LLMEngine, MQLLMEngine, all attention…

via github.com
Friday, May 8, 2026’s edition
N° 001·ai·

vLLM v0.20.1

vLLM v0.20.1 This is a patch release on top of `v0.20.0` primarily focused on DeepSeek V4 stabilization and performance improvements, along with several important bug fixes. DeepSeek V4 Base model support (#41006)…

via github.com
Tuesday, April 28, 2026’s edition
N° 001·ai·

Transformers v5.7.0

Release v5.7.0 New Model additions Laguna Laguna is Poolside's mixture-of-experts language model family that extends standard SwiGLU MoE transformers with two key innovations. It features per-layer head counts allowing…

via github.com
Friday, March 27, 2026’s edition
N° 001·ollama·

Ollama v0.19.0

Ollama is now powered by MLX on Apple Silicon in preview Ollama on Apple silicon is now built on top of Apple’s machine learning framework, MLX, to take advantage of its unified memory architecture…

via github.com
Friday, March 20, 2026’s edition
N° 001·ai·

vLLM v0.18.0

vLLM v0.18.0 Known issues Degraded accuracy when serving Qwen3.5 with FP8 KV cache on B200 (#37618) If you previously ran into `CUBLAS_STATUS_INVALID_VALUE` and had to use a workaround in `v0.17.0`, you can reinstall…

via github.com
Saturday, March 7, 2026’s edition
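The FP8 KV-cache note above is a good excuse for a back-of-envelope sketch of why KV-cache precision matters at long context. The model shape below (48 layers, 8 KV heads, head dim 128) is a hypothetical example, not Qwen3.5's actual configuration.

```python
# KV-cache sizing: K and V tensors are stored per layer, per token.
# FP8 (1 byte/element) halves the footprint vs FP16 (2 bytes/element).
def kv_cache_bytes(layers, kv_heads, head_dim, seq_len, bytes_per_elem):
    # factor of 2 = one K tensor and one V tensor per layer
    return 2 * layers * kv_heads * head_dim * seq_len * bytes_per_elem

# Hypothetical shape: 48 layers, 8 KV heads, head dim 128, 128K context.
fp16 = kv_cache_bytes(48, 8, 128, 131_072, 2)
fp8 = kv_cache_bytes(48, 8, 128, 131_072, 1)
print(fp16 / 2**30, fp8 / 2**30)  # 24.0 GiB vs 12.0 GiB for one sequence
```

The halving is exact since only the element width changes; the accuracy caveat in the release note is the trade-off being paid for it.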
N° 001·ai·

vLLM v0.17.0

vLLM v0.17.0 Known Issue: If you are on CUDA 12.9+ and encounter a `CUBLAS_STATUS_INVALID_VALUE` error, this is caused by a CUDA library mismatch. To resolve, try one of the following: 1. Remove the path to system CUDA…

via github.com
Thursday, January 29, 2026’s edition
N° 001·ai·

vLLM v0.15.0

Highlights This release features 335 commits from 158 contributors (39 new)! Model Support New architectures: Kimi-K2.5 (#33131), Molmo2 (#30997), Step3vl 10B (#32329), Step1 (#32511), GLM-Lite (#31386), Eagle2.5-8B…

via github.com
Wednesday, December 31, 2025’s edition
N° 001·ai·

not much happened today

South Korea's Ministry of Science launched a coordinated program with 5 companies to develop sovereign foundation models from scratch, featuring large-scale MoE architectures like SK Telecom A.X-K1 (519B total / 33B…

via news.smol.ai
Friday, December 19, 2025’s edition
N° 001·ai·

vLLM v0.13.0

vLLM v0.13.0 Release Notes Highlights This release features 442 commits from 207 contributors (61 new contributors)! Breaking Changes: This release includes deprecation removals, PassConfig flag renames, and…

via github.com
Wednesday, December 10, 2025’s edition
N° 001·agents·

not much happened today

NousResearch's Nomos 1 is a 30B open math model achieving a top Putnam score with only ~3B active parameters, enabling consumer Mac inference. AxiomProver also posts top Putnam results using ThinkyMachines' RL stack…

via news.smol.ai
Saturday, September 13, 2025’s edition
N° 001·ai·

vLLM v0.10.2

Highlights This release contains 740 commits from 266 contributors (97 new)! Breaking Changes: This release includes PyTorch 2.8.0 upgrade, V0 deprecations, and API changes - please review the changelog carefully…

via github.com
Yesterday’s edition
N° 001·llama.cpp·

llama.cpp b9112

CUDA: handle OW > 65535 in im2col (2D and 3D) (#22944) `im2col_cuda` and `im2col_3d_cuda` both dispatch with `block_nums.y = OW`. CUDA caps grid Y at 65535. Conv1d encoders on raw 16 kHz audio with T > 65535 (~ 4 s)…

via github.com
Monday, May 11, 2026’s edition
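The "~4 s" in the entry above falls straight out of the numbers given: the kernel dispatched with `block_nums.y = OW`, CUDA caps `gridDim.y` at 65535, and at a 16 kHz sample rate that many output columns is hit after roughly four seconds of raw audio.

```python
# Why the im2col fix above matters: grid Y dimension is capped at 65535,
# so a conv1d encoder over raw 16 kHz audio overflows it after ~4 s.
CUDA_MAX_GRID_Y = 65_535
SAMPLE_RATE_HZ = 16_000

seconds_until_cap = CUDA_MAX_GRID_Y / SAMPLE_RATE_HZ
print(seconds_until_cap)  # ~4.1 seconds of audio before the cap is hit
```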
N° 001·llama.cpp·

llama.cpp b9109

spec : parallel drafting support (#22838) spec : refactor spec : drop support for incompatible vocabs spec : update common_speculative_init() cont : pass seq_id cont : dedup ctx_seq_rm_type server : sketch the ctx_dft…

via github.com
Tuesday, May 5, 2026’s edition
N° 001·ollama·

Ollama v0.23.1

Gemma 4 MTP (Multi-token Processing) for the MLX runner Gemma 4 MTP speculative decoding is now supported on Macs. This can give over a 2x speed increase for the Gemma 4 31B model on coding tasks. ``` ollama run…

via github.com
Tuesday, April 28, 2026’s edition
N° 001·ollama·

Ollama v0.22.0

New models

- NVIDIA's Nemotron 3 Omni
- Poolside's first open-weight coding model, Laguna XS.2

Full Changelog: https://github.com/ollama/ollama/compare/v0.21.2...v0.22.0

via github.com