GAVINA: flexible aggressive undervolting for bit-serial mixed-precision DNN acceleration

Jordi Fornt, Pau Fontova-Musté, Adrian Gras, Omar Lahyani, Martí Caro, Jaume Abella, Francesc Moll, Josep Altet

TL;DR

The paper tackles the energy efficiency challenge in DNN hardware by combining Guarded Aggressive underVolting (GAV) with bit-serial, mixed-precision computation to enable flexible, low-power operation. It introduces the GAVINA accelerator, detailing its architecture, a two-domain voltage strategy, and a loss-aware error model that supports per-layer voltage optimization. Using a physical 12 nm design, an undervolting model calibrated to gate-level simulation (GLS) data, and CIFAR-10/ResNet-18 benchmarks, the work reports an energy efficiency of up to 89 TOP/sW and an energy efficiency boost of 20% from undervolting with negligible accuracy loss. It also compares GAVINA favorably against state-of-the-art accelerators, highlighting the benefits of combining mixed precision, bit-serial computation, and controlled undervolting for energy-efficient DNN inference.

Abstract

Voltage overscaling, or undervolting, is an enticing approximate technique in the context of energy-efficient Deep Neural Network (DNN) acceleration, given the quadratic relationship between power and voltage. Nevertheless, its very high error rate has thwarted its general adoption. Moreover, recent undervolting accelerators rely on 8-bit arithmetic and cannot compete with state-of-the-art low-precision (<8b) architectures. To overcome these issues, we propose a new technique called Guarded Aggressive underVolting (GAV), which combines the ideas of undervolting and bit-serial computation to create a flexible approximation method based on aggressively lowering the supply voltage on a select number of least significant bit combinations. Based on this idea, we implement GAVINA (GAV mIxed-precisioN Accelerator), a novel architecture that supports arbitrary mixed precision and flexible undervolting, with an energy efficiency of up to 89 TOP/sW in its most aggressive configuration. By developing an error model of GAVINA, we show that GAV can achieve an energy efficiency boost of 20% via undervolting, with negligible accuracy degradation on ResNet-18.
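
The idea in the abstract can be illustrated with a small sketch. The bit-serial part is standard arithmetic: a multi-bit integer product is a sum of shifted binary products, one per pair of bit positions. The rest is assumption rather than the paper's method. The threshold `g` below is hypothetical (loosely inspired by the single variable G in Figure 2, whose exact definition may differ), the operands are taken to be unsigned, the voltage ratio is a placeholder, and undervolting errors are not modeled at all. The energy estimate only uses the quadratic power-voltage relationship mentioned above and assumes every step costs the same at a given voltage.

```python
def bit_plane(matrix, bit):
    """Binary matrix holding bit `bit` of every (unsigned) entry."""
    return [[(value >> bit) & 1 for value in row] for row in matrix]


def binary_matmul(a, b):
    """Matrix product of two 0/1 matrices (AND gates plus a popcount)."""
    return [[sum(x & y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def bit_serial_gemm(a, b, a_bits, w_bits):
    """Integer GEMM as a sum of shifted binary GEMMs, one per bit pair."""
    rows, cols = len(a), len(b[0])
    acc = [[0] * cols for _ in range(rows)]
    for bit_a in range(a_bits):          # activation bit significance
        for bit_b in range(w_bits):      # weight bit significance
            partial = binary_matmul(bit_plane(a, bit_a), bit_plane(b, bit_b))
            shift = bit_a + bit_b        # weight of this partial product
            for r in range(rows):
                for c in range(cols):
                    acc[r][c] += partial[r][c] << shift
    return acc


def gav_schedule(a_bits, w_bits, g):
    """Assign each bit pair to a voltage domain.

    ASSUMPTION: the `g` highest combined significance levels (bit_a + bit_b)
    run at the guard voltage; all lower levels run at the aggressive voltage.
    This rule is illustrative, not taken from the paper.
    """
    top = (a_bits - 1) + (w_bits - 1)
    steps = []
    for bit_a in range(a_bits):
        for bit_b in range(w_bits):
            guarded = (bit_a + bit_b) > top - g
            steps.append((bit_a, bit_b, "guard" if guarded else "aggressive"))
    return steps


def relative_energy(steps, v_ratio):
    """Energy relative to running every step at the guard voltage.

    ASSUMPTION: equal cost per step and energy proportional to V^2, with
    v_ratio = V_aggressive / V_guard (a placeholder, not a measured value).
    """
    cost = sum(1.0 if domain == "guard" else v_ratio ** 2
               for _, _, domain in steps)
    return cost / len(steps)


# Example with 2-bit activations and 2-bit weights.
activations = [[3, 1], [2, 0]]
weights = [[1, 2], [3, 1]]
result = bit_serial_gemm(activations, weights, a_bits=2, w_bits=2)
schedule = gav_schedule(a_bits=2, w_bits=2, g=1)
saving = 1.0 - relative_energy(schedule, v_ratio=0.5)
```

In this toy schedule only the most significant bit pair stays at the guard voltage, so any errors introduced by undervolting would be confined to partial products with small shifts. That is the intuition behind guarding the most significant combinations; quantifying the actual error and accuracy impact requires a calibrated error model such as the one the paper describes.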

Paper Structure

This paper contains 10 sections, 1 equation, 8 figures, 2 tables.

Figures (8)

  • Figure 1: Summary of digital state-of-the-art DNN accelerators [colonnade, wang22, fujiwara22, bitblade, cutie, marsellus, opengemm, rapid, timdnn, jang24]. UV stands for undervolting. *Note: results for Shin_2019 only include the MAC array.
  • Figure 2: GAV schedule based on a single variable G and two voltage levels.
  • Figure 3: GAVINA architecture diagram and an example of multi-bit integer GEMM using GAV. In the example, $bit_A$ and $bit_B$ are the control signals that index the bit significance positions of the activation and weight matrices, respectively.
  • Figure 4: Post-layout floorplan (a) and power distribution of GAVINA for different precision configurations (b), without undervolting (using $V_{guard}$).
  • Figure 5: Summary of our experimental methodology for error and energy efficiency estimation using GLS.
  • ...and 3 more figures