ADE-HGNN: Accelerating HGNNs through Attention Disparity Exploitation

Dengke Han; Meng Wu; Runzhen Xue; Mingyu Yan; Xiaochun Ye; Dongrui Fan

TL;DR

This paper tackles the high computational and memory-bandwidth cost of attention-based HGNN inference on heterogeneous graphs by exploiting attention disparity to prune unimportant neighbors. It introduces a min-heap-based runtime pruning mechanism and an operation-fusion execution flow that overlaps pruning with computation. It then implements both ideas in a dedicated accelerator, ADE-HGNN, which combines a unified computing unit with specialized pruner hardware. Compared with GPUs, ADE-HGNN achieves average speedups of $28.21\times$ over the NVIDIA T4 and $7.98\times$ over the A100, with only $0.11\%$ to $1.47\%$ accuracy loss. It also reduces energy consumption to 1.97% of the T4's and 5.37% of the A100's. The work shows that hardware-software co-design, built on attention disparity and inter-stage fusion, enables scalable, energy-efficient HGNN inference.

Abstract

Heterogeneous Graph Neural Networks (HGNNs) have recently demonstrated great power in handling heterogeneous graph data, which has led to their wide application in many critical real-world domains. Most HGNN models leverage attention mechanisms to significantly improve model accuracy, albeit at the cost of increased computational complexity and memory bandwidth requirements. Fortunately, the attention disparity from source vertices towards a common target vertex unveils an opportunity to boost execution performance by pruning unimportant source vertices during neighbor aggregation. In this study, we begin with a quantitative analysis of the attention disparity in HGNN models, showing that the importance of different source vertices varies for the same target vertex. To fully exploit this finding for inference acceleration, we propose a runtime pruning method based on a min-heap and map it to a dedicated hardware pruner that discards unimportant vertices. Because the pruning overhead itself is non-negligible and cannot be amortized by the conventional staged execution paradigm, we introduce an operation-fusion execution flow for HGNNs that hides the pruning overhead while harnessing inter-stage parallelism. Finally, we present the design of a novel HGNN accelerator, ADE-HGNN, tailored to support the proposed execution framework. Our experimental results demonstrate that ADE-HGNN achieves an average performance improvement of 28.21x over the NVIDIA T4 GPU and 7.98x over the more advanced A100 GPU, while keeping the inference accuracy loss within a negligible range of 0.11% to 1.47%. Furthermore, ADE-HGNN reduces energy consumption to 1.97% and 5.37% of that of the two platforms, respectively.
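The min-heap pruning idea can be illustrated in software. The sketch below is a minimal Python illustration, not the authors' implementation, and it rests on several assumptions that do not come from the paper. First, attention logits follow a GAT-style decomposition in which each edge score is LeakyReLU of a per-source term plus a per-target term, held in the hypothetical dictionaries `s_src` and `s_dst`. Second, each target vertex keeps a fixed budget of `k` source neighbors. Third, the softmax is renormalized over the retained neighbors only. Scoring and pruning happen in the same loop, so the full set of edge scores is never materialized. This only loosely mimics, in sequential code, the interleaving that ADE-HGNN realizes with its hardware pruner and operation-fusion flow. The function name `pruned_aggregate` is hypothetical.

```python
import heapq
import math
from typing import Dict, List, Sequence, Tuple


def leaky_relu(x: float, slope: float = 0.2) -> float:
    return x if x > 0 else slope * x


def pruned_aggregate(
    target: int,
    neighbors: Sequence[int],
    s_src: Dict[int, float],
    s_dst: Dict[int, float],
    feats: Dict[int, List[float]],
    k: int,
) -> Tuple[List[int], List[float]]:
    """Keep the k highest-scoring source neighbors of `target` and aggregate them.

    Assumptions (illustrative, not from the paper): GAT-style decomposed logits
    e(u, target) = LeakyReLU(s_src[u] + s_dst[target]), a fixed budget k, and a
    softmax renormalized over the retained neighbors only.
    """
    if k <= 0:
        return [], []
    heap: List[Tuple[float, int]] = []  # min-heap of (score, source); root = weakest kept
    for u in neighbors:
        # Score and prune in the same pass; per-edge work is one add plus LeakyReLU.
        e = leaky_relu(s_src[u] + s_dst[target])
        if len(heap) < k:
            heapq.heappush(heap, (e, u))
        elif e > heap[0][0]:
            heapq.heapreplace(heap, (e, u))  # evict the weakest retained neighbor
    if not heap:
        return [], []
    kept = sorted(heap, reverse=True)  # strongest neighbor first
    m = kept[0][0]  # max logit, subtracted for numerical stability
    weights = [math.exp(e - m) for e, _ in kept]
    total = sum(weights)
    out = [0.0] * len(feats[kept[0][1]])
    for w, (_, u) in zip(weights, kept):
        for i, x in enumerate(feats[u]):
            out[i] += (w / total) * x
    return [u for _, u in kept], out


# Toy example: target vertex 0 with four source neighbors and a budget of k = 2.
feats = {1: [1.0, 0.0], 2: [0.0, 1.0], 3: [1.0, 1.0], 4: [2.0, 0.0]}
s_src = {1: 0.5, 2: 2.0, 3: -1.0, 4: 1.0}
kept_ids, h = pruned_aggregate(0, [1, 2, 3, 4], s_src, {0: 0.0}, feats, k=2)
print(kept_ids)  # [2, 4]
```

A bounded min-heap keeps the weakest retained score at its root. Each new candidate is therefore compared against it in O(1) and inserted in O(log k), which gives O(d log k) work per target vertex of in-degree d instead of sorting all d scores.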

Paper Structure (26 sections, 2 equations, 9 figures, 1 table, 1 algorithm)

Introduction
Background
Heterogeneous Graph and Semantic Graph
Heterogeneous Graph Neural Networks
Motivation
Attention Disparity
Challenge to Exploit the Opportunity
Optimized HGNN Execution Flow
Decomposition of Attention Computation
Neighbor Pruning Method Based on Min-heap
Parallel Execution with Operation Fusion
Architecture Design
Hardware Components
Design of Pruner
Experimental Results
...and 11 more sections

Figures (9)

Figure 1: An example of HetGs and execution process of HGNN models.
Figure 2: The attention disparity: (a) Varying attention importance; (b) Average ratio of the accumulated attention importance of the top 20% neighbors.
Figure 3: The ratio of pruning time on GPU and CPU to inference time on GPU.
Figure 4: Illustration of operation fusion: (a) A toy graph example; (b) Staged execution with neighbor pruning; (c) Parallel execution with operation fusion.
Figure 5: Overall architecture of ADE-HGNN.
...and 4 more figures
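Figure 2(b) reports the average share of attention importance held by the top 20% of each target vertex's neighbors. The following minimal sketch shows how such a metric could be computed from per-target attention weights. The input layout, the rounding of the top-20% count (ceiling, with at least one neighbor), and the function name `topk_attention_ratio` are assumptions made for illustration, not details taken from the paper.

```python
import heapq
import math
from typing import Dict, List


def topk_attention_ratio(attn: Dict[int, List[float]], frac: float = 0.2) -> float:
    """Average fraction of attention mass carried by the top `frac` of each target's neighbors.

    `attn` maps a target vertex id to the non-negative attention weights of its
    source neighbors. Assumption: the top-`frac` count is rounded up, with a
    minimum of one neighbor.
    """
    ratios: List[float] = []
    for weights in attn.values():
        total = sum(weights)
        if not weights or total <= 0:
            continue  # skip targets without neighbors or with degenerate weights
        k = max(1, math.ceil(frac * len(weights)))
        ratios.append(sum(heapq.nlargest(k, weights)) / total)
    return sum(ratios) / len(ratios) if ratios else 0.0


# Toy input: vertex 0 has a skewed distribution, vertex 1 a uniform one.
example = {0: [0.7, 0.1, 0.1, 0.05, 0.05], 1: [0.5, 0.5]}
print(round(topk_attention_ratio(example), 3))  # 0.6
```

A value near 1.0 indicates that a small fraction of neighbors dominates aggregation, which is the disparity that makes pruning attractive. For uniform weights over many neighbors, the ratio approaches `frac`.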
