PowerFlow-DNN: Compiler-Directed Fine-Grained Power Orchestration for End-to-End Edge AI Inference

Paul Chen; Jeongeun Kim; Wenbo Zhu; Yuanhan Li; Shunyao Huang; Chenjie Weng; Christopher Torng

Abstract

Edge AI systems often operate under stringent energy and volume constraints that demand extreme efficiency from limited battery capacity, and these requirements tighten as the demand for on-device intelligence grows. Prior literature suggests that fine-grained power orchestration, including DVFS and power gating, offers significant energy-efficiency benefits, yet important challenges remain unexplored. We observe that layer-level approaches incur unintended overheads due to inter-layer coupling of power control decisions, and that jointly managing these mechanisms under practical constraints such as limited voltage rails and transition overheads leads to a rapidly growing combinatorial schedule space. To address this, we propose PowerFlow-DNN, a compiler-directed framework for end-to-end power-state orchestration in ultra-low-power accelerators. By rigorously formulating deadline-constrained, real-time, periodic inference as a unified inter-layer power-scheduling problem, our framework automatically discovers energy-minimal power-state schedules that meet the deadline while accounting for end-to-end, inter-layer impacts. We evaluate the framework on a DNN accelerator VLSI implementation in TSMC 40nm technology. Across representative edge networks, PowerFlow-DNN discovers near-optimal solutions under the discretized formulation, achieving energy within 0.68\% of the exact ILP oracle and reducing energy by up to 37\% compared to an aggressive baseline without power orchestration. Although the combinatorial schedule space exceeds $10^{160}$ possible power-state assignments, the framework operates on a structured layered state graph that enables efficient optimization, and lightweight pruning yields up to 2.14$\times$ solver speedup.
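To make the idea of a deadline-constrained search over a layered state graph concrete, the following is a minimal illustrative sketch, not the paper's solver. It assumes each layer offers the same indexed set of candidate power states, each with an (energy, latency) cost, and that a hypothetical `transition(a, b)` function returns the (energy, latency) overhead of switching states between consecutive layers. Time is discretized into ticks of size `step`, rounding latencies up so the deadline is met conservatively. The function name and all parameters are assumptions made for illustration.

```python
import math


def min_energy_schedule(layers, transition, deadline, step=1):
    """Cheapest power-state schedule that meets the deadline, or None.

    layers[i]  : list of (energy, latency) options for layer i (assumed input).
    transition : hypothetical callable (a, b) -> (energy, latency) overhead
                 for switching from option a to option b between layers.
    Returns (total_energy, [option index per layer]).
    """
    budget = int(deadline // step)

    def ticks(latency):
        # Round up so the discretized schedule never underestimates time.
        return math.ceil(latency / step)

    # frontier maps (option index, elapsed ticks) -> (energy so far, path)
    frontier = {}
    for s, (energy, latency) in enumerate(layers[0]):
        t = ticks(latency)
        if t <= budget:
            frontier[(s, t)] = (energy, [s])

    for options in layers[1:]:
        nxt = {}
        for (prev, t), (energy, path) in frontier.items():
            for s, (e_s, lat_s) in enumerate(options):
                e_tr, lat_tr = transition(prev, s)
                t2 = t + ticks(lat_tr + lat_s)
                if t2 > budget:
                    continue  # prune: this prefix already misses the deadline
                e2 = energy + e_tr + e_s
                if e2 < nxt.get((s, t2), (math.inf,))[0]:
                    nxt[(s, t2)] = (e2, path + [s])
        frontier = nxt

    if not frontier:
        return None
    return min(frontier.values(), key=lambda entry: entry[0])


# Toy example with made-up numbers: option 0 is slow/low-energy,
# option 1 is fast/high-energy; switching costs 1 energy unit and 1 time unit.
toy_layers = [[(1, 4), (3, 2)]] * 3
toy_transition = lambda a, b: (0, 0) if a == b else (1, 1)
result = min_energy_schedule(toy_layers, toy_transition, deadline=10)
```

In this toy instance, running every layer in the slow state would take 12 time units and miss the deadline of 10, so the search settles on one slow layer and two fast layers, for a total energy of 8 including one transition. The state space per layer stays bounded by (number of options) × (number of ticks), which is one way a layered graph keeps an exponentially large assignment space tractable.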

Paper Structure (28 sections, 3 equations, 9 figures, 2 tables)

Introduction
Limitations of Conventional DVFS in Edge AI Accelerators
Energy Composition Varies Across Layers
Limitations of Latency-Balanced DVFS
Impact of Rail Scarcity Constraint
Granularity vs Orchestration Trade-off
Architecture and Framework Overview
Target Architecture Model
Power-State Abstraction
Scheduling Anchors
Compiler Workflow
Problem Formulation
System Model
Optimization Objective
Solution Approach
...and 13 more sections

Figures (9)

Figure 1: Estimated static and dynamic energy breakdown in TSMC 40 nm for the first three layers of SqueezeNet. The characteristics of these example layers show different needs in space (as seen in the different bars in the cluster), levels (power gating and DVFS), and time (across the three layers).
Figure 2: Energy-performance scatter for three SqueezeNet layers under independent DVFS of the compute, RRAM, and feeder domains at any combination of voltages from 0.9 V to 1.2 V; the red marker indicates the nominal operating point. See Section \ref{sec-method} for the post-layout methodology and scaling.
Figure 3: PowerFlow orchestration workflow. The compiler analyzes dataflow and constraints to derive candidate power states per layer (left), then jointly schedules them across layers to meet the inference deadline while minimizing energy (right).
Figure 4: 40 nm accelerator layout used for calibration. The chip operates up to 500 MHz, with the RRAM subsystem at 100 MHz. RRAM banks are power-gated at the macro level, while higher-level domain control is modeled architecturally.
Figure 5: Energy per inference interval versus target inference rate for SqueezeNet. We compare baseline, +gating, +greedy, +gating+greedy, and the proposed orchestration with optimized rail selection (see Section \ref{sec-eval-baseline} for definitions).
...and 4 more figures
