Graph neural networks with configuration cross-attention for tensor compilers

Dmitrii Khizbullin; Eduardo Rocha de Andrade; Thanh Hau Nguyen; Matheus Pedroza Ferreira; David R. Pugh

TL;DR

We propose TGraph, a neural graph architecture that screens for fast configurations of a target computational graph. It thus acts as an artificial intelligence (AI) tensor compiler, in contrast to traditional heuristics-based compilers.

Abstract

With the recent popularity of neural networks comes the need for efficient serving of inference workloads. A neural network inference workload can be represented as a computational graph whose nodes are operators transforming multidimensional tensors. The tensors can be transposed and/or tiled in a combinatorially large number of ways, and some of the resulting configurations lead to accelerated inference. We propose TGraph, a neural graph architecture that screens for fast configurations of the target computational graph, thus acting as an artificial intelligence (AI) tensor compiler in contrast to traditional heuristics-based compilers. The proposed solution improves mean Kendall's $\tau$ across the layout collections of TpuGraphs from 29.8% for the reliable baseline to 67.4% for TGraph. We estimate the potential CO$_2$ emission reduction associated with our work to be equivalent to over 50% of the total household emissions in the areas hosting AI-oriented data centers.
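To make the headline metric concrete, the following minimal sketch computes Kendall's $\tau$ between predicted scores and measured runtimes for a set of configurations. It implements the tau-a variant, which ignores ties; the exact variant and tooling used for the reported numbers are not stated here, so this is an assumption. The function name `kendall_tau` and the sample values are hypothetical and serve only as illustration.

```python
from itertools import combinations


def kendall_tau(predicted, measured):
    """Kendall's tau-a between two equal-length score sequences.

    Counts concordant and discordant pairs; tied pairs contribute zero.
    """
    if len(predicted) != len(measured):
        raise ValueError("sequences must have the same length")
    n = len(predicted)
    if n < 2:
        raise ValueError("need at least two configurations")
    concordant = discordant = 0
    for i, j in combinations(range(n), 2):
        # Positive product: both sequences order the pair the same way.
        sign = (predicted[i] - predicted[j]) * (measured[i] - measured[j])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)


# Hypothetical data: four configurations of one graph.
measured_runtimes = [1.0, 2.0, 3.0, 4.0]
predicted_scores = [0.1, 0.3, 0.2, 0.9]
tau = kendall_tau(predicted_scores, measured_runtimes)
print(f"Kendall's tau = {tau:.3f}")
```

A value of 1 means the predicted ordering matches the measured ordering exactly, and -1 means it is fully reversed. Percentages such as 67.4% correspond to $\tau = 0.674$ on this scale.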

Paper Structure (30 sections, 10 equations, 3 figures, 3 tables)


Introduction
Related work
TpuGraphs dataset and benchmark details
Contribution summary
Societal impact
TGraph runtime ranking architecture
Problem specification
Data pre-processing
Graph pruning
Configuration deduplication
Lossless configuration compression
Changing the pad value in node_feat
Data normalization, embedding and batching
Architecture details
Channel-wise self-attention
...and 15 more sections

Figures (3)

Figure 1: An example of how different tensor layout configurations affect the runtime of the computational (sub-)graph. Configuration 1 is faster than and, consequently, superior to configuration 2.
Figure 2: An example of node pruning. Nodes that are not connected to configurable nodes are removed (red nodes on the diagram). Two disconnected subgraphs are left after pruning.
Figure 3: Architecture diagram of TGraph. $n_{configs}$ is the number of configurations sampled into a batch. $n_{nodes}$ is the number of nodes in the sampled graph after pruning.
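The node pruning described in the caption of Figure 2 can be sketched as follows. The sketch assumes that "connected to a configurable node" means sharing an edge with one (a 1-hop neighbourhood); the caption does not spell out the exact criterion, so this reading is an assumption. The function name `prune_graph` and the toy graph are hypothetical.

```python
def prune_graph(edges, configurable):
    """Keep configurable nodes and their direct neighbours.

    edges: list of (src, dst) pairs of integer node ids.
    configurable: iterable of node ids that carry a layout configuration.
    Returns the sorted list of kept nodes and the edges between them.
    """
    configurable = set(configurable)
    keep = set(configurable)
    for src, dst in edges:
        # A node survives if it touches a configurable node.
        if src in configurable:
            keep.add(dst)
        if dst in configurable:
            keep.add(src)
    kept_edges = [(s, d) for s, d in edges if s in keep and d in keep]
    return sorted(keep), kept_edges


# Toy chain 0-1-2-3-4-5-6 with configurable nodes 1 and 5.
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 6)]
kept_nodes, kept_edges = prune_graph(edges, configurable=[1, 5])
print(kept_nodes)
print(kept_edges)
```

In this toy chain, node 3 touches no configurable node and is dropped, which leaves two disconnected subgraphs, mirroring the situation shown in Figure 2.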
