
THOR: A Generic Energy Estimation Approach for On-Device Training

Jiaru Zhang, Zesong Wang, Hao Wang, Tao Song, Huai-an Su, Rui Chen, Yang Hua, Xiangwei Zhou, Ruhui Ma, Miao Pan, Haibing Guan

TL;DR

THOR tackles the challenge of accurately estimating on-device DNN training energy across heterogeneous devices by introducing a layer-wise energy additivity hypothesis and learning per-layer costs with Gaussian Processes (GPs). The method profiles 1-, 2-, and 3-layer model variants and isolates each layer's cost by subtraction. It then fits GP models with a Matérn kernel (ν = 2.5) under an active-learning strategy and estimates end-to-end energy by summing the per-layer predictions. Across five devices and diverse models, THOR reduces MAPE by up to 30% relative to FLOP-based baselines. It also enables energy-aware pruning that cuts total energy by up to 50% while preserving model performance. The approach is generic, integrates with standard training frameworks, and is practical for energy-constrained devices; code is to be released upon acceptance.
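The TL;DR names GP regression with a Matérn kernel (ν = 2.5) fitted under an active-learning strategy. The sketch below shows one way to combine the two on a single hypothetical feature: the input-channel count of one layer. It is not THOR's implementation. The toy energy function `fake_energy`, the fixed length scale and noise level, and the "query the point of maximum posterior variance" acquisition rule are all assumptions made for illustration.

```python
import math

def matern52(x1, x2, length_scale, variance):
    # Matérn kernel with nu = 2.5: var * (1 + s + s^2/3) * exp(-s), s = sqrt(5)*|x1-x2|/l
    s = math.sqrt(5.0) * abs(x1 - x2) / length_scale
    return variance * (1.0 + s + s * s / 3.0) * math.exp(-s)

def cholesky(a):
    # Lower-triangular L with a = L L^T (a must be symmetric positive definite).
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low

def solve_lower(low, b):
    # Forward substitution: solves L y = b.
    y = []
    for i in range(len(b)):
        y.append((b[i] - sum(low[i][k] * y[k] for k in range(i))) / low[i][i])
    return y

def solve_upper_t(low, y):
    # Back substitution: solves L^T x = y using the lower factor L.
    n = len(y)
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(low[k][i] * x[k] for k in range(i + 1, n))) / low[i][i]
    return x

class MaternGP:
    """Exact GP regression on 1-D inputs with fixed (not optimized) hyperparameters."""

    def __init__(self, xs, ys, length_scale=128.0, noise=1e-4):
        self.xs = list(xs)
        self.mean_y = sum(ys) / len(ys)
        centered = [y - self.mean_y for y in ys]
        # Signal variance taken from the data spread so the prior matches its scale.
        self.var = max(sum(c * c for c in centered) / len(ys), 1e-6)
        self.ls = length_scale
        k = [[matern52(a, b, self.ls, self.var) + (noise if i == j else 0.0)
              for j, b in enumerate(self.xs)] for i, a in enumerate(self.xs)]
        self.low = cholesky(k)
        self.alpha = solve_upper_t(self.low, solve_lower(self.low, centered))

    def predict(self, x):
        # Posterior mean and variance at a single input x.
        k_star = [matern52(x, xi, self.ls, self.var) for xi in self.xs]
        mean = self.mean_y + sum(k * a for k, a in zip(k_star, self.alpha))
        v = solve_lower(self.low, k_star)
        var = matern52(x, x, self.ls, self.var) - sum(vi * vi for vi in v)
        return mean, max(var, 0.0)

def fake_energy(channels):
    # Hypothetical stand-in for an on-device energy measurement (joules).
    return 0.5 + 0.01 * channels + 0.002 * channels * math.log(channels)

def active_learn(measure, candidates, init, n_steps):
    # Repeatedly profile the candidate where the current GP is least certain.
    xs = list(init)
    ys = [measure(x) for x in xs]
    for _ in range(n_steps):
        gp = MaternGP(xs, ys)
        remaining = [c for c in candidates if c not in xs]
        if not remaining:
            break
        nxt = max(remaining, key=lambda c: gp.predict(c)[1])
        xs.append(nxt)
        ys.append(measure(nxt))
    return MaternGP(xs, ys), xs

gp, sampled = active_learn(fake_energy, list(range(16, 1025, 16)), init=[16, 1024], n_steps=5)
mean, var = gp.predict(400)
```

Querying the most uncertain configuration spends the limited profiling budget where the model knows least. A real system would likely use multi-dimensional layer features and tuned hyperparameters.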

Abstract

Battery-powered mobile devices (e.g., smartphones, AR/VR glasses, and various IoT devices) are increasingly being used for AI training due to their growing computational power and easy access to valuable, diverse, and real-time data. On-device training is highly energy-intensive, making accurate energy consumption estimation crucial for effective job scheduling and sustainable AI. However, the heterogeneity of devices and the complexity of models challenge the accuracy and generalizability of existing estimation methods. This paper proposes THOR, a generic approach for energy consumption estimation in deep neural network (DNN) training. First, we examine the layer-wise energy additivity property of DNNs and strategically partition the entire model into layers for fine-grained energy consumption profiling. Then, we fit Gaussian Process (GP) models to learn from layer-wise energy consumption measurements and estimate a DNN's overall energy consumption based on its layer-wise energy additivity property. We conduct extensive experiments with various types of models across different real-world platforms. The results demonstrate that THOR reduces the Mean Absolute Percentage Error (MAPE) by up to 30%. Moreover, when applied to guide energy-aware pruning, THOR reduces energy consumption by 50%, further demonstrating its generality and potential.
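The additivity property in the abstract can be sketched in a few lines. This is a simplified illustration, not THOR's exact profiling protocol. It assumes hypothetical nested variants in which variant k holds the first k layers, so subtracting consecutive measurements isolates each added layer's energy. It then sums per-layer estimates into an end-to-end figure and scores estimates with the standard MAPE definition, MAPE = (100/n) Σ |ŷᵢ − yᵢ| / |yᵢ|. The function names, layer names, and all energy values are made up for illustration.

```python
def layer_costs_by_subtraction(nested_energy):
    """nested_energy[k] is the measured energy (J) of a hypothetical variant
    containing the first k + 1 layers; consecutive differences isolate each layer."""
    costs = [nested_energy[0]]
    for prev, cur in zip(nested_energy, nested_energy[1:]):
        costs.append(cur - prev)
    return costs

def estimate_model_energy(layers, predict_layer):
    """Additivity hypothesis: end-to-end energy = sum of per-layer estimates."""
    return sum(predict_layer(layer) for layer in layers)

def mape(predicted, observed):
    """Mean Absolute Percentage Error, in percent."""
    return 100.0 * sum(abs(p - o) / abs(o) for p, o in zip(predicted, observed)) / len(observed)

# Hypothetical measurements of 1-, 2- and 3-layer nested variants (joules).
costs = layer_costs_by_subtraction([1.2, 3.0, 3.9])

# Hypothetical per-layer lookup standing in for a learned per-layer model.
per_layer = {"conv1": costs[0], "conv2": costs[1], "fc": costs[2]}
total = estimate_model_energy(["conv1", "conv2", "fc"], per_layer.get)

print([round(c, 3) for c in costs])  # [1.2, 1.8, 0.9]
print(round(total, 3))               # 3.9
print(mape([11.0, 18.0], [10.0, 20.0]))
```

In practice, `predict_layer` would be a learned per-layer model queried with the layer's configuration rather than a lookup of measured costs.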

Paper Structure

This paper contains 25 sections, 8 equations, 16 figures, 2 tables.

Figures (16)

  • Figure 1: Illustration of the energy estimation process on a 5-layer CNN. For convolutional layers, "In: $a$" and "Out: $b$" denote $a$ input channels and $b$ output channels.
  • Figure 2: Energy consumption from NeuralPower estimation and from observation for a CNN.
  • Figure 3: An overview of THOR.
  • Figure 4: GP after 4 and 5 steps for an FC layer on OPPO taking 500 batches of (10, input channel, 28, 28) input.
  • Figure 5: Energy consumption of an FC layer taking 500 batches of (4, input channel, 50, 50) input on Xavier.