Balancing FP8 Computation Accuracy and Efficiency on Digital CIM via Shift-Aware On-the-fly Aligned-Mantissa Bitwidth Prediction

Liang Zhao; Kunming Shao; Zhipeng Liao; Xijie Huang; Tim Kwang-Ting Cheng; Chi-Ying Tsui; Yi Zou

TL;DR

This work tackles efficient FP8 inference and training on digital compute-in-memory (DCIM) by enabling variable aligned-mantissa bitwidths across FP8 formats. It introduces a software-hardware co-design comprising Dynamic Shift-aware Bitwidth Prediction (DSBP), a Mantissa Prediction Unit (MPU), a FIFO-based Input Alignment Unit (FIAU), and a precision-scalable INT MAC array, implemented in 28nm CMOS with a 64×96 CIM array. The design achieves 20.4 TFLOPS/W for fixed E5M7 and up to 2.8× higher FP8 efficiency than prior FP-CIM work while supporting all FP8 formats from E2M5 to E5M2, validated on Llama-7b (BoolQ, Winogrande) and ResNet18 on ImageNet. Results show that on-the-fly DSBP maintains accuracy while offering flexible accuracy–efficiency trade-offs, highlighting the benefits of software-hardware co-design for variable-mantissa FP8 computation in digital CIM.
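To make the notion of an aligned mantissa concrete, the Python sketch below decodes FP8 ExMy codes (E2M5 through E5M2), aligns a group to its largest exponent as an FP-DCIM datapath must before integer accumulation, and truncates each aligned mantissa to a chosen width. It assumes an IEEE-754-style bias of 2^(E-1)-1 with subnormals and ignores Inf/NaN encodings. The function names (`decode_fp8`, `align_group`, `lossless_width`, `pick_width`) and the width-selection policy are hypothetical illustrations, not the paper's DSBP algorithm. What the sketch shows is the shift-awareness: the width needed to keep a value exact grows with its exponent gap to the group maximum.

```python
from typing import Iterable, List, Tuple


def decode_fp8(code: int, e_bits: int) -> Tuple[int, int, int]:
    """Split an 8-bit ExMy code (e_bits + m_bits = 7) into
    (sign, unbiased exponent, integer mantissa including the hidden bit).

    Assumption: IEEE-754-style bias 2**(e_bits - 1) - 1 with subnormals;
    Inf/NaN encodings are not treated specially.
    """
    m_bits = 7 - e_bits
    sign = (code >> 7) & 1
    exp_field = (code >> m_bits) & ((1 << e_bits) - 1)
    frac = code & ((1 << m_bits) - 1)
    bias = (1 << (e_bits - 1)) - 1
    if exp_field == 0:  # subnormal: no hidden bit, exponent pinned at 1 - bias
        return sign, 1 - bias, frac
    return sign, exp_field - bias, (1 << m_bits) | frac


def align_group(codes: List[int], e_bits: int, width: int) -> Tuple[List[int], int]:
    """Align a group to its maximum exponent and keep `width` aligned-mantissa bits
    per element; bits shifted past the LSB are truncated.

    Returns signed integers a_i and a shared exponent s with value_i ~= a_i * 2**s.
    """
    m_bits = 7 - e_bits
    fields = [decode_fp8(c, e_bits) for c in codes]
    e_max = max(e for _, e, _ in fields)
    aligned = []
    for sign, e, mant in fields:
        # Net right shift: exponent-alignment shift plus the change from
        # (m_bits + 1) stored bits to `width` aligned bits (negative = left shift).
        r = (e_max - e) + (m_bits + 1 - width)
        a = mant >> r if r >= 0 else mant << -r
        aligned.append(-a if sign else a)
    return aligned, e_max - width + 1


def lossless_width(codes: List[int], e_bits: int) -> int:
    """Aligned width that guarantees every nonzero element stays exact:
    (m_bits + 1) plus the largest exponent gap to the group maximum."""
    m_bits = 7 - e_bits
    fields = [decode_fp8(c, e_bits) for c in codes]
    e_max = max(e for _, e, _ in fields)
    gaps = [e_max - e for _, e, mant in fields if mant != 0]
    return m_bits + 1 + max(gaps, default=0)


def pick_width(codes: List[int], e_bits: int, supported: Iterable[int]) -> int:
    """Hypothetical policy: smallest supported width that is lossless, else the widest."""
    need = lossless_width(codes, e_bits)
    widths = sorted(supported)
    return next((w for w in widths if w >= need), widths[-1])


if __name__ == "__main__":
    codes = [0x38, 0x27]  # E4M3 encodings of 1.0 and 0.234375 (exponent gap of 3)
    for w in (4, lossless_width(codes, 4)):
        a, s = align_group(codes, 4, w)
        print(w, [x * 2.0 ** s for x in a])
    # 4 [1.0, 0.125]      <- the small value loses bits when squeezed into 4 bits
    # 7 [1.0, 0.234375]   <- 4 mantissa bits + 3 shift bits keep it exact
    print(pick_width(codes, 4, (2, 4, 6, 8)))  # 8 (weight-style widths)
    print(pick_width(codes, 4, range(2, 13)))  # 7 (input-style widths, 2 to 12b)
```

A predictor that picks a width below `lossless_width` trades some truncation error for fewer bits per element; that is the accuracy-efficiency knob the summary describes, though the paper's actual prediction rule is not reproduced here.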

Abstract

FP8 low-precision formats have gained significant adoption in Transformer inference and training. However, existing digital compute-in-memory (DCIM) architectures face challenges in supporting variable FP8 aligned-mantissa bitwidths, as unified alignment strategies and fixed-precision multiply-accumulate (MAC) units struggle to handle input data with diverse distributions. This work presents a flexible FP8 DCIM accelerator with three innovations: (1) a dynamic shift-aware bitwidth prediction (DSBP) with on-the-fly input prediction that adaptively adjusts weight (2/4/6/8b) and input (2 to 12b) aligned-mantissa precision; (2) a FIFO-based input alignment unit (FIAU) replacing complex barrel shifters with pointer-based control; and (3) a precision-scalable INT MAC array achieving flexible weight precision with minimal overhead. Implemented in 28nm CMOS with a 64×96 CIM array, the design achieves 20.4 TFLOPS/W for fixed E5M7, demonstrating 2.8× higher FP8 efficiency than previous work while supporting all FP8 formats. Results on Llama-7b show that the DSBP achieves higher efficiency than fixed bitwidth mode at the same accuracy level on both BoolQ and Winogrande datasets, with configurable parameters enabling flexible accuracy-efficiency trade-offs.
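Innovation (3) scales weight precision across 2/4/6/8b. One common way to obtain that kind of scaling, assumed here purely for illustration since the abstract does not describe the array's internal organization, is to slice each signed weight into 2-bit groups and run one low-precision MAC pass per slice, shifting each partial sum by its slice position. The Python sketch below checks that this decomposition reproduces the full-precision integer dot product; `weight_slices` and `sliced_dot` are hypothetical names.

```python
from typing import List


def weight_slices(w: int, bits: int) -> List[int]:
    """Split a signed two's-complement weight of `bits` bits (2/4/6/8) into 2-bit
    slices, LSB first. Lower slices are unsigned (0..3); the top slice is signed (-2..1)."""
    if bits not in (2, 4, 6, 8) or not -(1 << (bits - 1)) <= w < (1 << (bits - 1)):
        raise ValueError("weight does not fit the requested precision")
    u = w & ((1 << bits) - 1)  # two's-complement bit pattern of w
    slices = [(u >> (2 * k)) & 0b11 for k in range(bits // 2)]
    if slices[-1] >= 2:  # the top slice carries the sign bit
        slices[-1] -= 4
    return slices


def sliced_dot(xs: List[int], ws: List[int], bits: int) -> int:
    """Integer dot product built from one 2-bit-weight MAC pass per slice;
    each pass's partial sum is scaled by its slice position and accumulated."""
    per_weight = [weight_slices(w, bits) for w in ws]
    total = 0
    for k in range(bits // 2):  # 1, 2, 3 or 4 passes for 2/4/6/8-bit weights
        partial = sum(x * s[k] for x, s in zip(xs, per_weight))
        total += partial * (1 << (2 * k))
    return total


if __name__ == "__main__":
    xs = [3, -2, 7]  # aligned input mantissas (signed integers)
    ws = [5, -8, 1]  # aligned weight mantissas that fit in 4 bits
    print(sliced_dot(xs, ws, 4), sum(x * w for x, w in zip(xs, ws)))  # 38 38
```

Under this assumption a 4b weight costs two passes and an 8b weight four, so a lower predicted weight width translates directly into fewer MAC passes.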

Paper Structure (11 sections, 1 equation, 8 figures, 2 tables, 1 algorithm)

Introduction
Proposed Design
Dynamic Shift-aware Bitwidth Prediction (DSBP)
Mantissa Prediction Unit (MPU) Design
FIFO-based Input Alignment Unit (FIAU)
Flexible Precision Scaling INT MAC Array
Validations and Evaluations
Evaluations of Bitwidth and Accuracy
Evaluations of the Proposed Hardware Architecture
Comparison with Previous Works
Conclusion

Figures (8)

Figure 1: (a) FP8 parameters extracted from Llama-7b in different formats. (b) The need for variable-mantissa computation in FP-DCIM.
Figure 2: Overall framework of our software-hardware co-design Variable-Mantissa FP8 DCIM accelerator.
Figure 3: The schematic of the proposed MPU.
Figure 4: FIAU achieves alignment by controlling pointer movement.
Figure 5: The schematic of the adder tree and fusion unit.
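Figure 4 states that the FIAU aligns inputs by controlling pointer movement rather than with barrel shifters. A minimal reading of that idea, assumed here and not taken from the paper's circuit, is that when mantissa bits are consumed MSB first, a right shift by s equals starting the read pointer s slots early and reading zeros until it reaches stored data. The Python sketch below (`pointer_align`, `shifter_align` and `bits_to_int` are hypothetical helpers) checks that equivalence against an ordinary shift.

```python
from typing import List


def pointer_align(mant: int, n_bits: int, shift: int, width: int) -> List[int]:
    """Emit `width` aligned bits, MSB first, by reading a stored mantissa through a
    read pointer that starts `shift` slots before the first stored bit.
    Slots outside the stored bits read as 0; bits never reached are truncated."""
    buf = [(mant >> (n_bits - 1 - i)) & 1 for i in range(n_bits)]  # MSB first
    out = []
    ptr = -shift  # per-element control is just a start offset, no shifter network
    for _ in range(width):
        out.append(buf[ptr] if 0 <= ptr < n_bits else 0)
        ptr += 1
    return out


def shifter_align(mant: int, n_bits: int, shift: int, width: int) -> int:
    """Reference: the same truncated alignment written as a plain shift."""
    return (mant << width) >> (n_bits + shift)


def bits_to_int(bits: List[int]) -> int:
    """Pack an MSB-first bit list into an integer."""
    value = 0
    for b in bits:
        value = (value << 1) | b
    return value


if __name__ == "__main__":
    # 4-bit mantissa 0b1111 shifted right by 3 alignment positions
    for width in (4, 7):
        b = pointer_align(0b1111, 4, 3, width)
        print(width, b, bits_to_int(b), shifter_align(0b1111, 4, 3, width))
    # 4 [0, 0, 0, 1] 1 1
    # 7 [0, 0, 0, 1, 1, 1, 1] 15 15
```

In this model each element's alignment is encoded as a pointer offset instead of being applied by a multiplexer-based barrel shifter, which is the kind of simplification the abstract attributes to the FIAU.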
