Automating Energy-Efficient GPU Kernel Generation: A Fast Search-Based Compilation Approach

Yijia Zhang, Zhihong Gou, Shijie Cao, Weigang Feng, Sicheng Zhang, Guohao Dai, Ningyi Xu

TL;DR

This work tackles the rising energy cost of deploying DNNs on GPUs by shifting kernel optimization from latency-only to energy-aware kernel generation. It introduces a fast, search-based compilation framework that combines a genetic-algorithm search with a learned energy cost model and a dynamic online updating strategy that minimizes on-device energy measurements. The energy model, trained on high-level kernel features with an XGBoost backbone and a weighted loss, enables rapid energy predictions and accelerates the search, achieving up to $21.69\%$ energy reduction with latency comparable to strong baselines. The results show consistent energy savings across MM, MV, and Conv operators on both A100 and RTX 4090 GPUs, along with substantial speedups in search time, indicating strong practical potential for energy-aware kernel optimization in large GPU clusters.

Abstract

Deep Neural Networks (DNNs) have revolutionized various fields, but their deployment on GPUs often leads to significant energy consumption. Unlike existing methods for reducing GPU energy consumption, which are either hardware-inflexible or limited by workload constraints, this paper addresses the problem at the GPU kernel level. We propose a novel search-based compilation method to generate energy-efficient GPU kernels by incorporating energy efficiency into the search process. To accelerate the energy evaluation process, we develop an accurate energy cost model based on high-level kernel features. Furthermore, we introduce a dynamic updating strategy for the energy cost model, reducing the need for on-device energy measurements and accelerating the search process. Our evaluation demonstrates that the proposed approach can generate GPU kernels with up to 21.69% reduced energy consumption while maintaining low latency.
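
To make the search loop described above more concrete, here is a minimal Python sketch of a cost-model-guided genetic search in which only the most promising candidates are measured and the model is updated with each new measurement. Everything in it is an illustrative assumption rather than the paper's implementation: the tuning knobs (`TILE_CHOICES`, `UNROLL_CHOICES`), the synthetic `measure_energy` function standing in for an on-device measurement, and the nearest-neighbor predictor standing in for the XGBoost-based energy model with its weighted loss.

```python
import random

# Hypothetical tuning knobs for a tiled kernel; not taken from the paper.
TILE_CHOICES = [8, 16, 32, 64, 128]
UNROLL_CHOICES = [1, 2, 4, 8]


def measure_energy(cfg):
    """Synthetic stand-in for a slow on-device energy measurement."""
    tile_m, tile_n, unroll = cfg
    # Toy landscape whose minimum sits at tile 32x64 with unroll 4.
    return 1.0 + abs(tile_m - 32) / 32 + abs(tile_n - 64) / 64 + abs(unroll - 4) / 8


class NearestNeighborCostModel:
    """Toy energy predictor; a stand-in for the learned cost model."""

    def __init__(self):
        self.samples = {}  # configuration -> measured energy

    def update(self, cfg, energy):
        self.samples[cfg] = energy

    def predict(self, cfg):
        # Average the energies of the (up to) 3 closest measured configurations.
        def distance(other):
            return sum(abs(a - b) for a, b in zip(cfg, other))

        nearest = sorted(self.samples, key=distance)[:3]
        return sum(self.samples[c] for c in nearest) / len(nearest)


def random_cfg(rng):
    return (rng.choice(TILE_CHOICES), rng.choice(TILE_CHOICES), rng.choice(UNROLL_CHOICES))


def mutate(cfg, rng):
    # Re-draw one randomly chosen knob.
    choices = [TILE_CHOICES, TILE_CHOICES, UNROLL_CHOICES]
    i = rng.randrange(3)
    new = list(cfg)
    new[i] = rng.choice(choices[i])
    return tuple(new)


def crossover(a, b, rng):
    # Pick each knob from one of the two parents.
    return tuple(rng.choice(pair) for pair in zip(a, b))


def search(generations=8, pop_size=16, measure_top_k=2, seed=0):
    rng = random.Random(seed)
    model = NearestNeighborCostModel()
    measured_count = 0  # expensive "on-device" measurements
    scored_count = 0    # cheap model predictions

    # Bootstrap the model with a few real measurements.
    population = [random_cfg(rng) for _ in range(pop_size)]
    for cfg in population[:4]:
        model.update(cfg, measure_energy(cfg))
        measured_count += 1

    for _ in range(generations):
        # Rank the whole population using model predictions only.
        ranked = sorted(population, key=model.predict)
        scored_count += len(ranked)
        # Measure just the top candidates and feed the results back into the model.
        for cfg in ranked[:measure_top_k]:
            if cfg not in model.samples:
                model.update(cfg, measure_energy(cfg))
                measured_count += 1
        # Breed the next generation from the better-ranked half.
        parents = ranked[: pop_size // 2]
        children = [
            mutate(crossover(rng.choice(parents), rng.choice(parents), rng), rng)
            for _ in range(pop_size - len(parents))
        ]
        population = parents + children

    best = min(model.samples, key=model.samples.get)
    return best, model.samples[best], measured_count, scored_count
```

Calling `search()` returns the best measured configuration, its energy, and the two counters. With the default arguments the loop scores 128 candidates with the model but performs at most 20 measurements, which is the kind of trade-off the dynamic updating strategy is meant to exploit.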

Paper Structure

This paper contains 28 sections, 1 equation, 5 figures, 5 tables, 1 algorithm.

Figures (5)

  • Figure 1: Our search-based compilation for energy-efficient kernel generation.
  • Figure 2: The latency and energy consumption of one convolution operator from ResNet-50 generated by Ansor, running on one NVIDIA P100 GPU. The kernel generated by our method consumes less energy while maintaining latency similar to Ansor's.
  • Figure 3: The inverse correlation between latency and operating power of MatMul (M, N, K=1024, 1024, 1024) kernels generated by Ansor. Evaluation has been conducted on one NVIDIA A100 GPU.
  • Figure 4: The normalized predicted energy vs. the normalized measured energy.
  • Figure 5: The time cost of NVML-only and cost-model-based searching.