ReGate: Enabling Power Gating in Neural Processing Units

Yuqi Xue, Jian Huang

TL;DR

ReGate addresses the large static-power waste in neural processing units (NPUs) with a hardware-software co-design that enables fine-grained power gating across NPU components. Hardware-managed gating covers systolic arrays (SAs), inter-chip interconnect (ICI), and HBM controllers; software-managed gating covers vector units (VUs) and SRAM, using setpm instructions added to the NPU ISA to set power states. Evaluation on a production-level NPU simulator shows energy savings of 15.5% on average and up to 32.8%, with negligible performance impact, along with significant reductions in operational carbon at scale. The work demonstrates a practical, extensible path to power-efficient AI accelerators and informs broader design choices for sustainable data-center NPUs.

Abstract

The energy efficiency of neural processing units (NPUs) plays a critical role in developing sustainable data centers. Our study of different generations of NPU chips reveals that static power dissipation accounts for 30%-72% of their energy consumption, due to the lack of power management support in modern NPU chips. In this paper, we present ReGate, which enables fine-grained power gating of each hardware component in NPU chips through hardware/software co-design. Unlike conventional power-gating techniques for generic processors, power gating in NPUs faces unique challenges because the hardware architecture and the program execution model are fundamentally different. To address these challenges, we carefully investigate the power-gating opportunities in each component of NPU chips and choose the best-fit power management scheme for each (i.e., hardware- vs. software-managed power gating). Specifically, for systolic arrays (SAs), which have deterministic execution patterns, ReGate enables cycle-level power gating at the granularity of processing elements (PEs), following the inherent dataflow execution in SAs. For inter-chip interconnect (ICI) and HBM controllers, which have long idle intervals, ReGate employs a lightweight hardware-based idle-detection mechanism. For vector units and SRAM, whose idle periods vary significantly depending on workload patterns, ReGate extends the NPU ISA and allows software such as compilers to manage the power gating. Implemented on a production-level NPU simulator, ReGate reduces the energy consumption of NPU chips by up to 32.8% (15.5% on average), with negligible impact on AI workload performance. The hardware implementation of the power-gating logic introduces less than 3.3% overhead in NPU chips.
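
To make the idea of hardware idle detection concrete, the following is a minimal sketch of a generic timeout-based gating policy of the kind the abstract describes for components with long idle intervals. It is not the authors' implementation: the names GatingParams and simulate_idle_detection, the parameters idle_threshold, wakeup_latency, and transition_energy, and the simple energy model (one unit of static energy per idle cycle) are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass
class GatingParams:
    """Hypothetical parameters for a timeout-based idle detector."""
    idle_threshold: int       # idle cycles observed before the unit is gated
    wakeup_latency: int       # stall cycles paid when a gated unit is woken
    transition_energy: float  # cost of one off/on transition, in units of
                              # static energy per cycle (assumed model)


def simulate_idle_detection(
    idle_intervals: Iterable[int], params: GatingParams
) -> Tuple[float, int]:
    """Return (net static energy saved, added stall cycles).

    Each idle interval is a count of consecutive idle cycles. The unit
    stays powered for the first idle_threshold cycles, then is gated for
    the rest of the interval. Energy is measured in units of static
    energy per cycle.
    """
    saved = 0.0
    stalls = 0
    for length in idle_intervals:
        if length <= params.idle_threshold:
            continue  # interval too short: the detector never fires
        gated_cycles = length - params.idle_threshold
        # Net saving can be negative when the gated period is shorter
        # than the break-even point set by transition_energy.
        saved += gated_cycles - params.transition_energy
        stalls += params.wakeup_latency
    return saved, stalls


if __name__ == "__main__":
    params = GatingParams(idle_threshold=50, wakeup_latency=5,
                          transition_energy=20.0)
    print(simulate_idle_detection([10, 100, 1000], params))
```

Under this assumed model, a longer threshold avoids gating during short idle intervals, where the transition cost would outweigh the saving, but it also gives up part of the saving on long intervals. This is one way to see why a timeout-based detector suits components whose idle intervals are long.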

Paper Structure

This paper contains 39 sections, 25 figures, 4 tables.

Figures (25)

  • Figure 1: Architecture of an NPU chip.
  • Figure 2: Energy efficiency of ML workloads on different NPU generations. For some NPU generations that cannot satisfy 1$\times$ SLO, we report the energy efficiency for the best relaxed SLO target they can achieve, and the attainable SLOs are labeled on top of the bars (e.g., "2$\times$").
  • Figure 3: Energy consumption breakdown of ML workloads on different NPU generations.
  • Figure 4: SA temporal utilization (numbers $\leq$0.1% are rounded to 0).
  • Figure 5: SA spatial utilization quantified by the achieved FLOPs over the theoretical peak FLOPs during SA active time.
  • ...and 20 more figures