Segmented Exponent Alignment and Dynamic Wordline Activation for Floating-Point Analog CIM Macros

Weiping Yang, Shilin Zhou, Hui Xu, Jiawei Xue, Changlin Chen

TL;DR

The paper targets two sources of overhead in floating-point compute-in-memory (FP-CIM) macros: exponent comparison and mantissa alignment in FP-MACs, and the latency of bit-serial input. It introduces Segmented Exponent Alignment (SEA), which partitions the exponent space into three regions based on the top 3 MSBs to avoid global max-exponent detection and reduce mantissa shifts, and Dynamic Wordline Activation (DWA), which activates wordlines by SEA-defined groups to cut bit-serial input cycles. Integrated into a 28 nm analog FP-CIM macro, SEA and DWA achieve up to 63.8% power savings, a 58.6% area reduction, and a 40.87% latency reduction on VGG16-CIFAR10 with ~2% accuracy loss, as evaluated against state-of-the-art designs. These results suggest practical benefits for FP16 CNN workloads from reducing exponent-handling overhead and input latency in compute-in-memory architectures.
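To make the segmentation idea concrete, here is a minimal Python sketch of classifying inputs by the top 3 MSBs of a 5-bit FP16 exponent. The region boundaries (`bounds`) and the helper names `exponent_msbs` and `group_by_region` are hypothetical; this summary does not state where the paper draws the three regions, so the thresholds are placeholders only. The point it illustrates is that each input is classified from its own bits, with no comparison across inputs.

```python
EXP_BITS = 5   # FP16 exponent width
MSB_BITS = 3   # number of exponent MSBs inspected, per the summary


def exponent_msbs(exp: int) -> int:
    """Return the top MSB_BITS bits of a biased FP16 exponent (a code in 0..7)."""
    return (exp >> (EXP_BITS - MSB_BITS)) & ((1 << MSB_BITS) - 1)


def group_by_region(exponents, bounds=(2, 4)):
    """Group input indices into three regions using only each input's own MSB code.

    `bounds` is a placeholder assumption: codes below bounds[0] go to region 0,
    codes below bounds[1] go to region 1, and all remaining codes go to region 2.
    """
    groups = {0: [], 1: [], 2: []}
    for idx, exp in enumerate(exponents):
        code = exponent_msbs(exp)
        if code < bounds[0]:
            region = 0
        elif code < bounds[1]:
            region = 1
        else:
            region = 2
        groups[region].append(idx)
    return groups


if __name__ == "__main__":
    # Biased exponents of six hypothetical inputs.
    print(group_by_region([0, 3, 9, 15, 16, 31]))
```

In a DWA-style scheme, index groups like these could decide which wordlines are activated together, though how the macro actually schedules them is not described in this summary.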

Abstract

With the rise of compute-in-memory (CIM) accelerators, floating-point multiply-and-accumulate (FP-MAC) operations have gained extensive attention for their higher accuracy over integer MACs in neural networks. However, the hardware overhead caused by exponent comparison and mantissa alignment, along with the delay introduced by bit-serial input methods, remains an obstacle to implementing FP-MAC efficiently. To address this, we propose the Segmented Exponent Alignment (SEA) and Dynamic Wordline Activation (DWA) strategies. SEA exploits the observation that input exponents are often clustered around zero or within a narrow range. By segmenting the exponent space and aligning mantissas accordingly, SEA eliminates the need for maximum exponent detection and reduces input mantissa shifting, thereby reducing processing latency. DWA further reduces latency while maintaining accuracy by activating wordlines based on the exponent segments defined by SEA. Simulation results demonstrate that, compared with the conventional comparison-tree-based maximum-exponent alignment method, our approach reduces power consumption by 63.8% and achieves a 40.87% delay reduction on the VGG16-CIFAR10 benchmark.
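For context, the conventional baseline that the abstract compares against can be sketched as follows: find the maximum exponent in a group of inputs, then right-shift every mantissa by its distance from that maximum. This is a generic software illustration of max-exponent alignment for finite FP16 values, not the authors' hardware design; the function names `fp16_fields` and `align_to_max` are made up for this example.

```python
import struct


def fp16_fields(x: float):
    """Return (sign, biased exponent, 11-bit significand) of x rounded to IEEE 754 binary16."""
    (bits,) = struct.unpack("<H", struct.pack("<e", x))
    sign = bits >> 15
    exp = (bits >> 10) & 0x1F
    frac = bits & 0x3FF
    # Normal numbers carry an implicit leading 1; subnormals (exp == 0) do not.
    mant = frac | (0x400 if exp != 0 else 0)
    return sign, exp, mant


def align_to_max(values):
    """Align signed mantissas to the group's maximum exponent (finite inputs assumed)."""
    fields = [fp16_fields(v) for v in values]
    # Subnormals share the scale of biased exponent 1.
    max_exp = max(max(e, 1) for _, e, _ in fields)  # the global comparison step
    aligned = []
    for sign, exp, mant in fields:
        shift = max_exp - max(exp, 1)
        shifted = mant >> shift  # right shift drops low-order bits (truncation)
        aligned.append(-shifted if sign else shifted)
    return max_exp, aligned


if __name__ == "__main__":
    print(align_to_max([1.0, 0.5, 0.25]))
```

Each aligned integer `a` represents the value `a * 2**(max_exp - 25)` (bias 15 plus 10 fraction bits), so the integers can be summed directly. The `max(...)` over all inputs corresponds to the comparison tree whose cost SEA is designed to avoid.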

Paper Structure

This paper contains 7 sections, 8 figures, 2 tables.

Figures (8)

  • Figure 1: (a) FP-MAC procedure and (b) Mantissa alignment
  • Figure 2: Illustration of FP computation process and Fixed Width Input (FWI) method in FP-CIM
  • Figure 3: (a) Distribution of input exponents in VGG16-CIFAR10 & ResNet18-CIFAR10 (b) Proposed 3 MSB-based categories
  • Figure 4: (a) Conventional strategy with FWI (b) Proposed SEA and DWA strategies with FWI
  • Figure 5: (a) In-order wordline (WL) operation (b) Proposed Dynamic Wordline Activation Strategy (DWA)
  • ...and 3 more figures