Accuracy Booster: Enabling 4-bit Fixed-point Arithmetic for DNN Training

Simla Burcu Harma; Ayan Chakraborty; Nicholas Sperry; Babak Falsafi; Martin Jaggi; Yunho Oh

TL;DR

The paper tackles the high resource cost of DNN training by systematically exploring Hybrid Block Floating Point (HBFP), a single-level scaled format, and introducing Accuracy Booster, a mixed-mantissa approach that uses 4-bit mantissas for the vast majority of training operations and 6-bit mantissas in the final epoch and in the first and last layers. Using Wasserstein-distance analysis and loss landscapes, the authors show that HBFP with a fixed exponent width of 8 bits requires at least 6-bit mantissas to rival FP32 accuracy across models, while a targeted mixed-mantissa strategy can recover FP32-level accuracy with 4-bit mantissas almost everywhere. Accuracy Booster achieves substantial arithmetic-density gains (up to $25\times$ over FP32 and $2.3\times$ over MXFP6) while maintaining state-of-the-art accuracies on CNNs and competitive results on Transformer models, and it reduces per-element storage by about 33%. These results offer a practical pathway to high-density, energy-efficient DNN training and inform hardware-software co-design for fixed-point training.
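To make the role of the mantissa width concrete, the following is a minimal sketch of block floating-point quantization, in which a block of values shares one exponent and each value keeps only a narrow signed mantissa. It is an illustration under stated assumptions, not the authors' implementation: the function name `bfp_quantize`, the round-to-nearest rule, the saturation behavior, and the example block are all hypothetical choices that this summary does not specify.

```python
import math

def bfp_quantize(block, mantissa_bits, exponent_bits=8):
    """Quantize a block of floats to a shared exponent and return the
    dequantized values. Assumed details (not from the paper summary):
    round-to-nearest, saturation on overflow, sign included in mantissa_bits."""
    max_abs = max(abs(x) for x in block)
    if max_abs == 0.0:
        return [0.0] * len(block)

    # Shared exponent e such that max_abs = m * 2**e with 0.5 <= m < 1.
    shared_exp = math.frexp(max_abs)[1]

    # Clamp the shared exponent to what exponent_bits can represent.
    e_min = -(2 ** (exponent_bits - 1))
    e_max = 2 ** (exponent_bits - 1) - 1
    shared_exp = max(e_min, min(e_max, shared_exp))

    # One sign bit plus (mantissa_bits - 1) magnitude bits per element.
    scale = 2.0 ** (shared_exp - (mantissa_bits - 1))
    q_min = -(2 ** (mantissa_bits - 1))
    q_max = 2 ** (mantissa_bits - 1) - 1

    out = []
    for x in block:
        q = round(x / scale)           # integer mantissa
        q = max(q_min, min(q_max, q))  # saturate to the mantissa range
        out.append(q * scale)          # back to a real value
    return out

def max_abs_error(block, quantized):
    """Largest absolute difference between original and quantized values."""
    return max(abs(a - b) for a, b in zip(block, quantized))

example_block = [0.5, -0.25, 0.1, 0.03]  # hypothetical values
q4 = bfp_quantize(example_block, mantissa_bits=4)
q6 = bfp_quantize(example_block, mantissa_bits=6)
err4 = max_abs_error(example_block, q4)
err6 = max_abs_error(example_block, q6)
```

In this toy block the small values are the ones that suffer at 4 bits, because the step size is set by the largest element in the block; widening the mantissa to 6 bits shrinks the step by a factor of four.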

Abstract

The unprecedented demand for computing resources to train DNN models has led to a search for minimal numerical encoding. Recent state-of-the-art (SOTA) proposals advocate for multi-level scaled narrow bitwidth numerical formats. In this paper, we show that single-level scaling is sufficient to maintain training accuracy while maximizing arithmetic density. We identify a previously proposed single-level scaled format for 8-bit training, Hybrid Block Floating Point (HBFP), as the optimal candidate to minimize. We perform a full-scale exploration of the HBFP design space using mathematical tools to study the interplay among various parameters and identify opportunities for even smaller encodings across layers and epochs. Based on our findings, we propose Accuracy Booster, a mixed-mantissa HBFP technique that uses 4-bit mantissas for over 99% of all arithmetic operations in training and 6-bit mantissas only in the last epoch and first/last layers. We show Accuracy Booster enables increasing arithmetic density over all other SOTA formats by at least 2.3x while achieving state-of-the-art accuracies in 4-bit training.
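The mixed-mantissa rule described in the abstract can be sketched as a simple selection function: 6-bit mantissas in the first and last layers and throughout the last epoch, 4-bit mantissas everywhere else. This is a hedged illustration only; the function name `mantissa_bits_for`, the zero-based indexing, and the example layer and epoch counts are assumptions introduced here, not details taken from the paper.

```python
def mantissa_bits_for(layer_idx, num_layers, epoch, num_epochs,
                      low_bits=4, high_bits=6):
    """Pick the mantissa width for one layer at one epoch.

    Assumed convention (hypothetical): layers and epochs are zero-indexed.
    """
    is_edge_layer = layer_idx == 0 or layer_idx == num_layers - 1
    is_last_epoch = epoch == num_epochs - 1
    return high_bits if (is_edge_layer or is_last_epoch) else low_bits

# Hypothetical configuration: 20 layers trained for 160 epochs.
NUM_LAYERS, NUM_EPOCHS = 20, 160
schedule = [
    [mantissa_bits_for(layer, NUM_LAYERS, epoch, NUM_EPOCHS)
     for layer in range(NUM_LAYERS)]
    for epoch in range(NUM_EPOCHS)
]
```

Note that counting (layer, epoch) pairs in such a schedule is not the same as counting arithmetic operations. The abstract's figure of over 99% refers to operations, which depends on how much compute each layer performs.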

Paper Structure (22 sections, 2 equations, 8 figures, 7 tables)

Introduction
Arithmetic & storage density
Minimizing HBFP
What factors affect accuracy?
Wasserstein distance & loss landscapes
Experimental results
Baseline Single-Mantissa HBFP
Accuracy Booster
Related work
Discussion
Hardware Implications.
Limitations.
Impact.
Conclusion
HBFP5 accuracies
...and 7 more sections

Figures (8)

Figure 1: An HBFP systolic array accelerator.
Figure 2: Normalized arithmetic density.
Figure 3: Wasserstein distance relative to FP32.
Figure 4: Loss landscapes for ResNet20.
Figure 5: Top-1 accuracy training curves for ResNet74 and DenseNet40 on CIFAR100.
...and 3 more figures
