Segmented Exponent Alignment and Dynamic Wordline Activation for Floating-Point Analog CIM Macros
Weiping Yang, Shilin Zhou, Hui Xu, Jiawei Xue, Changlin Chen
TL;DR
The paper targets the hardware overhead of exponent comparison and mantissa alignment in floating-point compute-in-memory (FP-CIM) multiply-and-accumulate (FP-MAC) operations. It introduces Segmented Exponent Alignment (SEA), which partitions the exponent space into three regions based on the top 3 MSBs to avoid global max-exponent detection and reduce mantissa shifts, and Dynamic Wordline Activation (DWA), which activates wordlines by SEA-defined groups to cut bit-serial input cycles. Integrated into a 28 nm analog FP-CIM macro, SEA and DWA achieve up to 63.8% power savings, a 58.6% area reduction, and a 40.87% latency reduction on VGG16-CIFAR10 with ~2% accuracy loss, validated against state-of-the-art designs. These results suggest substantial practical benefits for FP16 CNN workloads from reducing exponent-handling overhead and input latency in compute-in-memory architectures.
Abstract
With the rise of compute-in-memory (CIM) accelerators, floating-point multiply-and-accumulate (FP-MAC) operations have gained extensive attention for their higher accuracy over integer MACs in neural networks. However, the hardware overhead caused by exponent comparison and mantissa alignment, along with the delay introduced by bit-serial input methods, remains a hindrance to implementing FP-MAC efficiently. In view of this, we propose the Segmented Exponent Alignment (SEA) and Dynamic Wordline Activation (DWA) strategies. SEA exploits the observation that input exponents are often clustered around zero or within a narrow range. By segmenting the exponent space and aligning mantissas accordingly, SEA eliminates the need for maximum exponent detection and reduces input mantissa shifting, thereby reducing processing latency. DWA further reduces latency while maintaining accuracy by activating wordlines based on the exponent segments defined by SEA. Simulation results demonstrate that, compared with the conventional comparison-tree-based maximum exponent alignment method, our approach reduces power consumption by 63.8% and achieves a 40.87% delay reduction on the VGG16-CIFAR10 benchmark.
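
The difference between conventional maximum-exponent alignment and segment-based alignment can be illustrated with a minimal Python sketch. It is not the authors' implementation. The three region boundaries, the choice of each region's largest exponent as its alignment reference, and the names align_to_max, region_reference, and align_segmented are all assumptions made for illustration; the only details taken from the text are the 5-bit FP16 exponent and the idea of classifying inputs by the top 3 exponent bits.

```python
# Illustrative sketch only: region boundaries and reference exponents are
# hypothetical, not taken from the paper.

# (lowest top-3-MSB code of the region, reference exponent of the region)
# Listed from the highest region to the lowest.
REGIONS = ((5, 31), (3, 19), (0, 11))


def align_to_max(exps, mants):
    """Baseline: align every mantissa to the global maximum exponent.

    In hardware, max() corresponds to a comparison tree over all inputs.
    """
    e_max = max(exps)
    return e_max, [m >> (e_max - e) for e, m in zip(exps, mants)]


def region_reference(exp):
    """Map a 5-bit biased exponent to its region's reference exponent.

    Only the top 3 bits are inspected, so no other input is consulted.
    """
    top3 = (exp >> 2) & 0b111
    for low_code, ref in REGIONS:
        if top3 >= low_code:
            return ref
    raise ValueError("exponent out of range")


def align_segmented(exps, mants):
    """Align each mantissa to its own region's reference exponent.

    Returns {reference exponent: [aligned mantissas]}. Each shift depends
    only on the input's own exponent, so no maximum search is needed.
    """
    groups = {}
    for e, m in zip(exps, mants):
        ref = region_reference(e)
        groups.setdefault(ref, []).append(m >> (ref - e))
    return groups


exps = [10, 11, 18, 30]            # biased FP16 exponents
mants = [1024, 1536, 1024, 2047]   # 11-bit mantissas with the hidden bit

print(align_to_max(exps, mants))
print(align_segmented(exps, mants))
```

In this sketch, aligning to the global maximum shifts the three small-exponent mantissas down to zero, whereas the segmented version bounds each shift by the width of its region and keeps those inputs in separate groups whose partial sums would be scaled by the group's reference exponent.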
