Applying Graph Explanation to Operator Fusion

Keith G. Mills; Muhammad Fetrat Qharabagh; Weichen Qiu; Fred X. Han; Mohammad Salameh; Wei Lu; Shangling Jui; Di Niu


TL;DR

This work addresses the challenge of optimizing Layer Fusion (LF) for DNN inference under a fixed on-chip buffer size by introducing Graph Explanation Techniques (GET) to diagnose invalid fusion groups. A binary GNN-based validity predictor, combined with explanations from GNNE, PG, or RG, guides a greedy tree-based algorithm that recursively splits invalid fusion groups while minimizing DRAM access. The method is evaluated on two fusion schemes (LBDF and BRR), on multiple CNNs (e.g., EfficientNet, ResNet, MobileNet, SqueezeNet) and a semantic segmentation model, and with several search algorithms (Random Search, Local Search, NSGA-II) that use memoization and prune unfusable ops. Experiments show substantial DRAM access reductions, including over 20% on EfficientNet-B3, as well as improvements on large networks, supporting GET-driven splitting as a robust enhancement to LF optimization with practical inference benefits.
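To make "search with memoization and pruning of unfusable ops" concrete, here is a minimal Random Search sketch over partition plans. It assumes a hypothetical toy network: a linear chain of six ops with made-up per-op buffer demands and output sizes, a validity rule that simply sums buffer demand, and a DRAM cost that counts each group's input read and output write once. None of these numbers or names (`BUFFER_KB`, `is_valid`, `random_plan`, and so on) come from the paper, whose validity checker and cost model are analytical and fusion-scheme specific. The validity check is memoized with `functools.lru_cache`, so fusion groups that reappear across sampled plans are evaluated only once.

```python
import functools
import random

# Hypothetical toy chain of ops (NOT the paper's cost model): per-op on-chip
# buffer demand and output tensor size, both in KB.
BUFFER_KB = [40, 60, 30, 90, 20, 50]
OUTPUT_KB = [64, 32, 32, 16, 16, 8]
INPUT_KB = 128       # network input, read from DRAM by the first group
UNFUSABLE = {3}      # ops pruned from fusion: always isolated in their own group
CAPACITY_KB = 128    # on-chip buffer size


@functools.lru_cache(maxsize=None)
def is_valid(group: tuple) -> bool:
    """Memoized validity check: a group is valid if its total demand fits."""
    return sum(BUFFER_KB[i] for i in group) <= CAPACITY_KB


def dram_cost(plan) -> int:
    """Each group reads its input tensor and writes its output tensor once."""
    cost = 0
    for group in plan:
        first, last = group[0], group[-1]
        cost += INPUT_KB if first == 0 else OUTPUT_KB[first - 1]
        cost += OUTPUT_KB[last]
    return cost


def random_plan(rng: random.Random, n: int = len(BUFFER_KB)):
    """Sample cut points between ops; a cut at c separates op c-1 from op c."""
    cuts = {c for c in range(1, n) if rng.random() < 0.5}
    for u in UNFUSABLE:  # pruning: force cuts on both sides of unfusable ops
        cuts.update(c for c in (u, u + 1) if 0 < c < n)
    bounds = [0, *sorted(cuts), n]
    return [tuple(range(a, b)) for a, b in zip(bounds, bounds[1:])]


def random_search(budget: int, seed: int = 0):
    """Random Search over partition plans; returns best cost and best-so-far trace."""
    rng = random.Random(seed)
    best, trace = None, []
    for _ in range(budget):
        plan = random_plan(rng)
        if all(is_valid(g) for g in plan):  # repeated groups hit the cache
            cost = dram_cost(plan)
            best = cost if best is None else min(best, cost)
        trace.append(best)  # best valid cost found so far (None until one is found)
    return best, trace


best, trace = random_search(budget=200)
print(best, is_valid.cache_info().hits)
```

The `trace` list records the best DRAM cost found after each sampled plan, which is the kind of quantity plotted against the partition plan budget in Figure 5. Forcing cuts around unfusable ops shrinks the space the search has to sample, and the cache hit count shows how often memoization avoided re-checking a group.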

Abstract

Layer fusion techniques are critical to improving the inference efficiency of deep neural networks (DNN) for deployment. Fusion aims to lower inference costs by reducing data transactions between an accelerator's on-chip buffer and DRAM. This is accomplished by executing multiple operations, such as convolutions and activations, together as single execution units called fusion groups. However, on-chip buffer capacity limits fusion group size, so optimizing fusion on whole DNNs requires partitioning them into multiple fusion groups. Finding the optimal groups is a complex problem where the presence of invalid solutions hampers traditional search algorithms and demands robust approaches. In this paper we incorporate Explainable AI, specifically Graph Explanation Techniques (GET), into layer fusion. Given an invalid fusion group, we identify the operations most responsible for its invalidity, then use this knowledge to recursively split the original fusion group via a greedy tree-based algorithm that minimizes DRAM access. We pair our scheme with common search algorithms and optimize DNNs on two types of layer fusion: Line-Buffer Depth First (LBDF) and Branch Requirement Reduction (BRR). Experiments demonstrate the efficacy of our scheme on several popular and classical convolutional neural networks like ResNets and MobileNets. Our scheme achieves over 20% DRAM access reduction on EfficientNet-B3.
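The recursive splitting step can be sketched as follows. This is a simplified greedy variant under stated assumptions, not the authors' implementation: a fusion group is a list of op ids in topological order plus a dict of edges mapping `(src, dst)` to a tensor size in KB; validity is a hypothetical sum-of-demand check; and the "explanation" is a hand-picked set of edges standing in for a GET's output (no GNN is involved here). Splitting the topological order at position `k` cuts every edge that crosses the boundary, so a skip-connection forces at least two edges to be cut. Candidate splits that sever an explanation edge are ranked by the number of invalid children, then by the DRAM traffic of the cut tensors; the sketch commits to the best-ranked split at each node of the recursion tree rather than exploring alternatives.

```python
from typing import Dict, List, Set, Tuple

Edge = Tuple[int, int]


def is_valid(ops: List[int], demand: Dict[int, int], capacity: int) -> bool:
    """Toy validity check: the group's summed buffer demand must fit on chip."""
    return sum(demand[o] for o in ops) <= capacity


def crossing(order: List[int], edges: Dict[Edge, int], k: int) -> List[Edge]:
    """Edges cut when the group is split into order[:k] and order[k:]."""
    head = set(order[:k])
    return [e for e in edges if e[0] in head and e[1] not in head]


def split(order: List[int], edges: Dict[Edge, int], demand: Dict[int, int],
          capacity: int, explanation: Set[Edge]) -> List[List[int]]:
    """Greedily and recursively split a fusion group until every part is valid."""
    if is_valid(order, demand, capacity) or len(order) == 1:
        return [order]  # valid, or a single op that splitting cannot fix

    def score(k: int) -> Tuple[int, int, int]:
        cut = crossing(order, edges, k)
        n_invalid = sum(not is_valid(p, demand, capacity)
                        for p in (order[:k], order[k:]))
        # Rank by invalid children, then DRAM traffic of the cut tensors
        # (each is written to and re-read from DRAM), then split position.
        return n_invalid, 2 * sum(edges[e] for e in cut), k

    # Only consider cuts that sever at least one explanation edge; if none
    # do, fall back to every split position.
    guided = [k for k in range(1, len(order))
              if any(e in explanation for e in crossing(order, edges, k))]
    best_k = min(score(k) for k in (guided or range(1, len(order))))[2]

    parts: List[List[int]] = []
    for part in (order[:best_k], order[best_k:]):
        keep = set(part)
        sub_edges = {e: kb for e, kb in edges.items()
                     if e[0] in keep and e[1] in keep}
        parts += split(part, sub_edges, demand, capacity, explanation)
    return parts


order = [0, 1, 2, 3, 4]
demand = {0: 40, 1: 50, 2: 60, 3: 30, 4: 20}           # KB, hypothetical
edges = {(0, 1): 32, (1, 2): 16, (2, 3): 16, (3, 4): 8,
         (0, 3): 32}                                   # (0, 3) is a skip-connection
explanation = {(1, 2), (0, 3)}                         # stand-in for a GET's edges
print(split(order, edges, demand, 128, explanation))   # [[0, 1], [2, 3, 4]]
```

In this example, the chosen split cuts both `(1, 2)` and the skip edge `(0, 3)`, and both resulting groups fit the 128 KB buffer. With a smaller capacity, an invalid child is split again by the same procedure, which is the recursion the method relies on.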


Paper Structure (16 sections, 4 equations, 5 figures, 5 tables, 1 algorithm)

Introduction
Background and Related Work
Networks as Graphs
Subgraph Explanations
Methodology
Cost Model Granularity
Fusion Group Explanation
Skip-Connections
Greedy Tree-based Selection
Experimental Results
Graph Neural Networks to Predict Validity
Scope of Layer Fusion Optimization
Improving Search on Large Networks
Comparisons Across Fusion Methods
Additional Figures and Discussion
...and 1 more section

Figures (5)

Figure 1: LBDF on a fusion group consisting of two $3 \times 3$ convolution kernels in sequence. Area bounded by the red square denotes the input data required to compute the current output. '-' denotes the next data entries to be released from the on-chip buffer. '+' denotes the next data point to be loaded from DRAM (input map) or computed (intermediate map). Best viewed in color.
Figure 2: A high-level overview of our scheme. Best viewed in color. (a): A search algorithm generates a partition plan, and an analytical validity checker determines the feasibility of each fusion group in the plan. (b): We use a GNN and GETs to find a subgraph explanation for each invalid fusion group. (c): We consider how to split the fusion group at every solution edge contained within the subgraph explanation. Note that the explanation contains a skip-connection, meaning we must cut at least 2 edges. (d): We use a greedy tree-based algorithm to consider all possible solutions that split the fusion group and sort them based on how many of the new fusion groups are valid. In the optimal case (green arrow), both new fusion groups are valid. If one (blue arrow) or both (red arrow) of the new fusion groups are invalid, we use the recursive algorithm to repeat the process from step (a) for each invalid fusion group.
Figure 3: Plotting the percentage of valid fusion groups for each CNN type as we double the on-chip buffer size from a minimum of 128KB to a maximum of 2048KB. Best viewed in color. Note that we do not plot BRR curves for MobileNetV3 and EfficientNet or use them to train our GNNs, as that form of LF has compatibility issues with the Squeeze-and-Excite hu2018squeeze module present in those networks.
Figure 4: Explanations of an invalid fusion group from EfficientNet according to GNNE, PG and RG. Solid lines indicate that an edge was selected by a given GET.
Figure 5: Partition plan budget vs. best DRAM access cost. We compare DRAM performance across gradual increases in the plan budget. Best viewed in color.
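Figures 1 and 3 relate LBDF buffer usage to fusion group validity. The sketch below is a deliberately simplified footprint model, assumed for illustration only: each stride-1 k x k convolution in a group keeps k full rows of the map it reads (the input map or an intermediate map) resident on chip, ignoring weights and the finer element-by-element release and load shown in Figure 1. The `Conv` shapes and helper names are hypothetical, and this is not the paper's analytical validity checker. It then reports the share of candidate groups that fit as the buffer doubles from 128KB to 2048KB, mirroring the sweep in Figure 3.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Conv:
    k: int         # kernel size (k x k, stride 1 assumed)
    width: int     # width of the feature map this conv reads
    channels: int  # channels of the feature map this conv reads


def lbdf_buffer_bytes(group: List[Conv], bytes_per_elem: int = 1) -> int:
    """Simplified LBDF footprint: each conv keeps k full rows of its input map
    (the network input or an intermediate map) resident on chip."""
    return sum(c.k * c.width * c.channels * bytes_per_elem for c in group)


def valid_fraction(groups: List[List[Conv]], capacity_kb: int) -> float:
    """Share of candidate fusion groups whose footprint fits the buffer."""
    fits = [lbdf_buffer_bytes(g) <= capacity_kb * 1024 for g in groups]
    return sum(fits) / len(fits)


# Buffer sizes swept by doubling, covering the 128KB..2048KB range of Figure 3.
SIZES_KB = [128 * 2 ** i for i in range(5)]

two_convs = [Conv(3, 224, 64), Conv(3, 224, 64)]  # Figure 1 pattern, hypothetical shapes
wider = [Conv(3, 112, 256), Conv(3, 112, 256)]
print(lbdf_buffer_bytes(two_convs))  # 86016 bytes, i.e. 84 KB
for kb in SIZES_KB:
    print(kb, valid_fraction([two_convs, wider], kb))
```

Under this model, the wider group needs 168 KB, so it is invalid at 128KB and becomes valid once the buffer doubles to 256KB. The valid fraction therefore rises as buffer size grows, which is the qualitative trend Figure 3 plots for real CNNs.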
