Serving DeepSeek-V4: why million-token context is an inference systems problem
Together AI examines the inference-systems work behind serving DeepSeek-V4 on NVIDIA HGX B200: compressed KV-cache layouts, prefix caching, kernel maturity, and endpoint profiles tuned for long-context workloads.
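Of the techniques listed, prefix caching is the most self-contained to illustrate. A minimal sketch of the idea, under stated assumptions: requests sharing a prompt prefix (e.g. a common system prompt) can reuse the KV state already computed for that prefix instead of re-prefilling it. The class, block size, and string "KV" payloads below are all hypothetical placeholders, not Together AI's implementation; production stacks cache real KV tensors in block-granular GPU memory.

```python
import hashlib

BLOCK = 4  # tokens per cache block (illustrative granularity, not a real value)

class PrefixKVCache:
    """Toy prefix cache: reuse KV state for shared prompt prefixes.

    Hypothetical sketch only. In a real serving stack the payload would be
    KV tensors in paged GPU memory; here it is a placeholder string.
    """

    def __init__(self):
        self._blocks = {}  # prefix-hash -> cached "KV" payload

    @staticmethod
    def _key(tokens):
        # Hash the entire prefix so a block is only reused when every
        # preceding token matches (KV state depends on the full prefix).
        return hashlib.sha256(repr(tokens).encode()).hexdigest()

    def insert(self, tokens):
        """Populate cache blocks for every block-aligned prefix."""
        for end in range(BLOCK, len(tokens) + 1, BLOCK):
            self._blocks.setdefault(self._key(tokens[:end]), f"kv[:{end}]")

    def lookup(self, tokens):
        """Return (cached_prefix_len, payloads) for the longest cached prefix."""
        hit_len, payloads = 0, []
        for end in range(BLOCK, len(tokens) + 1, BLOCK):
            k = self._key(tokens[:end])
            if k not in self._blocks:
                break  # first miss ends the reusable prefix
            hit_len, payloads = end, payloads + [self._blocks[k]]
        return hit_len, payloads

cache = PrefixKVCache()
system_prompt = list(range(8))                      # shared 8-token prefix
cache.insert(system_prompt + [100, 101, 102, 103])  # first request fills cache
hit, _ = cache.lookup(system_prompt + [200, 201, 202, 203])
print(hit)  # → 8: the shared prefix is served from cache; only the tail needs prefill
```

The payoff grows with context length: at million-token scale, skipping prefill for a cached prefix avoids recomputing attention over the bulk of the sequence.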