Microscaling Floating Point Formats for Large Language Models
Marco Cococcioni, Dario Pagani, Federico Rossi
TL;DR
This work tackles the resource bottlenecks of large language models by adopting Microscaling, a block-based 8-bit floating-point approach in which each block of values shares a single scale to extend dynamic range. It delivers a flexible C++23 implementation with a generic data-format interface, including an exact-accumulator option and LUT-backed arithmetic, enabling both training and inference under mixed-precision regimes. The approach is validated through GPT-2 experiments, which show that Microscaling can maintain competitive accuracy while reducing memory and compute, provided that careful attention is paid to rounding, operation ordering, and softmax stability. With future hardware support for low-bit formats, Microscaling has the potential to substantially accelerate LLM training and deployment at scale.
Abstract
The increasing computational and memory demands of large language models (LLMs) necessitate innovative approaches to optimize resource usage without compromising performance. This paper leverages microscaling floating-point formats, a novel technique designed to address these challenges by reducing the storage and computational overhead associated with numerical representations in LLMs. Unlike traditional floating-point representations, which allocate a dedicated scale for each value, microscaling employs a shared scale across a block of values, enabling compact one-byte floating-point representations while maintaining an extended dynamic range. We explore the application of microscaling in the context of 8-bit floating-point formats to significantly reduce memory footprint and computational costs. We tested several configurations of microscaling floats within the GPT-2 LLM architecture, demonstrating that microscaling data formats can achieve competitive accuracy during training and inference, which supports their use as a resource-efficient alternative for deploying LLMs at scale. The source code is publicly available at: https://github.com/unipi-dii-compressedarith/llm.c-sve
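To make the shared-scale idea concrete, the following is a minimal sketch of block quantization with one power-of-two scale per block and one-byte elements. It is not taken from the paper's code base. The block size of 32, the E4M3 element layout (1 sign, 4 exponent, 3 mantissa bits, bias 7, maximum 448), the rule for choosing the shared exponent, and all names (MxBlock, encode_e4m3, decode_e4m3, quantize, dequantize) are assumptions made for illustration; the actual implementation may differ in each of these choices.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>

constexpr std::size_t kBlock = 32;  // assumed block size (hypothetical)

// One block: a shared power-of-two scale plus one byte per element.
// The scale is kept as an unbiased int8_t here for simplicity.
struct MxBlock {
    std::int8_t shared_exp;              // block scale is 2^shared_exp
    std::array<std::uint8_t, kBlock> q;  // assumed E4M3 codes
};

// Decode an E4M3 byte. The NaN code (0x7F / 0xFF) is not handled here.
float decode_e4m3(std::uint8_t b) {
    const int sign = b >> 7;
    const int e = (b >> 3) & 0xF;
    const int m = b & 0x7;
    // Subnormal: m * 2^-9. Normal: (8 + m) * 2^(e - 10) = (1 + m/8) * 2^(e - 7).
    const float mag = (e == 0) ? std::ldexp(static_cast<float>(m), -9)
                               : std::ldexp(static_cast<float>(8 + m), e - 10);
    return sign ? -mag : mag;
}

// Encode by exhaustive nearest-value search over the 127 finite magnitudes.
// Ties go to the even code; values above 448 saturate to 448.
std::uint8_t encode_e4m3(float x) {
    assert(!std::isnan(x));  // NaN inputs are outside the scope of this sketch
    const float mag = std::fabs(x);
    int best = 0;
    float best_err = mag;  // error of code 0, which decodes to 0.0
    for (int c = 1; c <= 0x7E; ++c) {
        const float err = std::fabs(decode_e4m3(static_cast<std::uint8_t>(c)) - mag);
        if (err < best_err || (err == best_err && (c & 1) == 0)) {
            best = c;
            best_err = err;
        }
    }
    return static_cast<std::uint8_t>(std::signbit(x) ? (best | 0x80) : best);
}

// Pick the shared exponent so that the largest magnitude in the block lands
// in the top binade of E4M3 (largest exponent 8), then encode each element.
MxBlock quantize(const std::array<float, kBlock>& x) {
    float amax = 0.0f;
    for (float v : x) amax = std::max(amax, std::fabs(v));
    MxBlock out{};
    const int e = (amax > 0.0f) ? std::ilogb(amax) - 8 : 0;
    out.shared_exp = static_cast<std::int8_t>(std::clamp(e, -127, 127));
    for (std::size_t i = 0; i < kBlock; ++i)
        out.q[i] = encode_e4m3(std::ldexp(x[i], -out.shared_exp));
    return out;
}

// Reconstruct approximate values: element value times the shared scale.
std::array<float, kBlock> dequantize(const MxBlock& b) {
    std::array<float, kBlock> y{};
    for (std::size_t i = 0; i < kBlock; ++i)
        y[i] = std::ldexp(decode_e4m3(b.q[i]), b.shared_exp);
    return y;
}
```

Under these assumptions a block occupies 33 bytes (32 element bytes plus one scale byte) instead of the 128 bytes needed for 32 single-precision values. The exhaustive search in encode_e4m3 favors clarity over speed; a table-driven or bit-manipulating encoder would be the natural replacement in practice.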
