09:11 CET · Wednesday · May 13, 2026

shipfeed

§ tools · cluster

Transformers v5.5.0

Apr 2 · primary fetch · 1 source · cluster eb01ca68 · updated Apr 2

Release v5.5.0

New model additions

Gemma 4

Gemma 4 is a multimodal model with pretrained and instruction-tuned variants, available in 1B, 13B, and 27B parameter sizes. The architecture is mostly the same as previous Gemma versions; the key differences are a vision processor that outputs images under a fixed token budget and a spatial 2D RoPE that encodes vision-specific position information across the height and width axes. You can find all the original Gemma 4 checkpoints under the Gemma 4 release. The headline change from previous Gemma releases is the new design for processing images of different sizes with a fixed budget of tokens.
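The spatial 2D RoPE mentioned above splits the rotary dimensions between the height and width axes, so each vision patch carries both a row and a column position. A minimal sketch of how the per-axis rotary angles might be computed (the function name, the half/half dimension split, and the base are illustrative assumptions, not the transformers implementation):

```python
import math


def rope_angles_2d(pos_h: int, pos_w: int, head_dim: int, base: float = 10000.0):
    """Rotary angles for one vision patch at grid position (pos_h, pos_w).

    Sketch only: the first half of the rotary dimensions encodes the
    height index, the second half the width index, each with the usual
    geometric inverse-frequency schedule.
    """
    half = head_dim // 2                       # rotary dims per spatial axis
    inv_freq = [base ** (-2 * i / half) for i in range(half // 2)]
    ang_h = [pos_h * f for f in inv_freq]      # angles driven by the row index
    ang_w = [pos_w * f for f in inv_freq]      # angles driven by the column index
    return ang_h + ang_w


# Patch at row 2, column 3 of the patch grid, with a 64-dim attention head:
angles = rope_angles_2d(2, 3, head_dim=64)
```

Because the two axes get disjoint rotary dimensions, attention can distinguish "same row, different column" from "same column, different row", which a flattened 1D position index cannot.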

Unlike many models that squash every image into a fixed square (such as 224×224), Gemma 4 keeps the image's natural aspect ratio while resizing it. There are a couple of constraints to follow:

- The total number of pixels must fit within a patch budget.
- Both height and width must be divisible by 48 (= patch size 16 × pooling kernel 3).

[!IMPORTANT]
Gemma 4 does not apply the standard ImageNet mean/std normalization that many other vision models use. The model's own patch embedding layer handles the final scaling internally, shifting values to the [-1, 1] range…
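The two constraints above can be sketched as a small sizing helper: scale the image down just enough to fit the pixel budget while preserving aspect ratio, then round each side down to a multiple of 48. This is an illustrative sketch under the stated constraints, not the library's actual image processor (the function name and the default patch budget are assumptions):

```python
import math

PATCH = 16          # patch size in pixels
POOL = 3            # pooling kernel
MULTIPLE = PATCH * POOL   # 48: both sides must be divisible by this


def fit_to_budget(height: int, width: int, max_patches: int = 1024):
    """Pick an output size that keeps the aspect ratio, fits the total
    pixel count within the patch budget, and rounds both sides down to
    a multiple of 48 (clamped to at least one 48px tile)."""
    budget_px = max_patches * PATCH * PATCH
    scale = min(1.0, math.sqrt(budget_px / (height * width)))
    new_h = max(MULTIPLE, int(height * scale) // MULTIPLE * MULTIPLE)
    new_w = max(MULTIPLE, int(width * scale) // MULTIPLE * MULTIPLE)
    return new_h, new_w


# A 1080x1920 frame shrinks to fit the budget but keeps roughly 16:9;
# an image already under budget is only rounded, not upscaled.
print(fit_to_budget(1080, 1920))
print(fit_to_budget(480, 480))
```

Note that, per the callout above, the resized pixels would not be ImageNet-normalized before being handed to the model; the patch embedding layer does its own scaling to [-1, 1] internally.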

read full article on github.com
§ sources · 1 publication · timeline below
  1. github.com · transformers v5.5.0 · primary