§ feed · cluster
Meta and Stanford propose methods cutting BLT inference memory bandwidth by over 50%
Researchers from Meta, Stanford, and the University of Washington propose methods to accelerate Byte Latent Transformer (BLT) generation, using diffusion and verification techniques to cut inference memory bandwidth by over 50% while remaining tokenization-free.
§ sources · 1 publication