A Structure-Aware Framework for Learning Device Placements on Computation Graphs

Shukai Duan; Heng Ping; Nikos Kanakaris; Xiongye Xiao; Panagiotis Kyriakis; Nesreen K. Ahmed; Peiyu Zhang; Guixiang Ma; Mihai Capota; Shahin Nazarian; Theodore L. Willke; Paul Bogdan

TL;DR

This work bridges the gap between encoder-placer and grouper-placer techniques and proposes a novel framework for device placement that relies on smaller computation graphs extracted from the OpenVINO toolkit, supports end-to-end training, and takes into account the DAG nature of computation graphs.

Abstract

Computation graphs are Directed Acyclic Graphs (DAGs) whose nodes correspond to mathematical operations; they are widely used as abstractions when optimizing neural networks. The device placement problem aims to find an optimal allocation of these nodes to a set of (potentially heterogeneous) devices. Existing approaches rely on one of two types of architectures, known as grouper-placer and encoder-placer. In this work, we bridge the gap between encoder-placer and grouper-placer techniques and propose a novel framework for the task of device placement, relying on smaller computation graphs extracted from the OpenVINO toolkit. The framework consists of five steps, including graph coarsening, node representation learning, and policy optimization. It facilitates end-to-end training and takes into account the DAG nature of computation graphs. We also propose a model variant, inspired by graph parsing networks and complex network analysis, that enables graph representation learning and joint, personalized graph partitioning without requiring the number of groups to be specified in advance. To train the entire framework, we use reinforcement learning with the execution time of the placement as the reward. We demonstrate the flexibility and effectiveness of our approach through multiple experiments with three benchmark models, namely Inception-V3, ResNet, and BERT. An ablation study further highlights the robustness of the proposed framework. The suggested placements improve the inference speed of the benchmark models by up to 58.2% over CPU execution and by up to 60.24% compared to other commonly used baselines.
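
The abstract states only that the framework is trained with reinforcement learning using the execution time of the placement as the reward. A minimal REINFORCE sketch of that idea follows, under several assumptions that are not taken from the paper: the policy is a plain table of per-node logits rather than the GNN + MLP policy, the reward is taken to be the negative execution time, a moving-average baseline reduces variance, and `execution_time` is a hypothetical additive cost model (made-up per-device costs plus a fixed penalty per cross-device edge) standing in for a real measurement in the execution environment.

```python
import math
import random

# Hypothetical toy problem (not from the paper): a chain of 6 coarsened nodes
# and 2 devices. COST[v][d] is a made-up cost of running node v on device d.
COST = [
    [4.0, 1.0],
    [4.0, 1.0],
    [1.0, 3.0],
    [1.0, 3.0],
    [2.0, 2.0],
    [3.0, 1.0],
]
EDGES = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5)]
TRANSFER = 0.5  # made-up penalty for every edge whose endpoints sit on different devices


def execution_time(placement):
    """Hypothetical additive cost model standing in for a measured run time."""
    t = sum(COST[v][d] for v, d in enumerate(placement))
    t += sum(TRANSFER for u, v in EDGES if placement[u] != placement[v])
    return t


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def train(num_devices=2, steps=2000, lr=0.1, seed=0):
    """REINFORCE with a moving-average baseline over a table of per-node logits."""
    rng = random.Random(seed)
    n = len(COST)
    theta = [[0.0] * num_devices for _ in range(n)]  # policy parameters
    baseline = None
    for _ in range(steps):
        probs = [softmax(row) for row in theta]
        # Sample one device per node independently from the current policy.
        placement = [rng.choices(range(num_devices), weights=p)[0] for p in probs]
        reward = -execution_time(placement)  # assumption: reward = -time
        baseline = reward if baseline is None else 0.9 * baseline + 0.1 * reward
        advantage = reward - baseline
        # Gradient of log softmax w.r.t. the logits is one_hot(action) - probs.
        for v in range(n):
            for d in range(num_devices):
                grad = (1.0 if placement[v] == d else 0.0) - probs[v][d]
                theta[v][d] += lr * advantage * grad
    # Return the greedy (most probable) device for each node.
    return [max(range(num_devices), key=lambda d: theta[v][d]) for v in range(n)]


best = train()
print(best, execution_time(best))
```

Under these made-up costs, the lowest-cost placement puts nodes 0, 1, and 5 on device 1 and nodes 2 and 3 on device 0 (node 4 costs the same on either device), for a total of 8.0; the printed greedy placement can be compared against that optimum.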

Paper Structure (29 sections, 18 equations, 2 figures, 6 tables, 2 algorithms)

Introduction
Proposed framework
Problem Formulation
Graph construction
Feature extraction
Learning embedding and groups jointly
Reinforcement learning for node-based device assignment
Experiments
Benchmarks
Setup
Baseline comparison
Ablation studies
Downstream Model Performance and Runtime Complexity
Conclusion
Code availability
...and 14 more sections

Figures (2)

Figure 1: Overview of the proposed framework, HSDAG. Graph construction. We first convert a neural network model $c$ into a computation graph $G$, $repr:c \rightarrow G$. Feature extraction. Then, we calculate the initial feature matrix $\mathbf{X}^{(0)}$, capturing local and global connectivity information, node-aware features, information about the order of the nodes, and features from fractal analysis. Learning embeddings and groups jointly. We further enrich the node features $\mathbf{X}^{(0)}$ using a model $\text{GNN}: G \rightarrow \mathbf{Z}$ and jointly learn how to pool the graph $G$ using a graph parsing network. In this way, we bridge the gap between grouper-placer and encoder-placer methods for device assignment. Device placement. A learnable MLP classifies the nodes $V'$ of the coarsened graph $G'=(V', E')$ into the available devices $\mathcal{D}$. Heterogeneous execution. We map the device placement of $V'$ to $V$ based on the node assignment matrix $\mathcal{X}$, apply the placement of all operations in the execution environment, measure the execution time, and compute the corresponding reward. End-to-end parameter update. We update the parameters $\theta$ of our policy $\pi$, i.e., the combination of the GNN and the MLP, based on the reward, and refresh the node feature matrix $\mathbf{Z}$ with the current cluster information. The entire framework supports end-to-end parameter updates and training.
Figure 2: The computation graph of each of the benchmark models before and after the graph partitioning and pooling.
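
The Figure 1 caption describes the glue between stages only in words: information about the order of the nodes is one of the input features, and the placement chosen for the coarsened nodes $V'$ is mapped back to the original nodes $V$ through the node assignment matrix $\mathcal{X}$. A minimal pure-Python sketch of those pieces follows, under stated assumptions: all function names are hypothetical, longest-path depth is used as one illustrative order feature (the paper's exact feature set is not given here), the assignment matrix is turned into a hard assignment by a row-wise argmax, coarsened edges are the deduplicated cluster-to-cluster edges with self-loops dropped, and "CPU"/"GPU" are illustrative device labels.

```python
from collections import deque
from typing import List, Sequence, Set, Tuple

Edge = Tuple[int, int]


def node_depth(num_nodes: int, edges: Sequence[Edge]) -> List[int]:
    """Longest-path depth of each node from any source (Kahn's algorithm).

    One simple order-aware node feature for a DAG; raises if a cycle exists.
    """
    succ: List[List[int]] = [[] for _ in range(num_nodes)]
    indeg = [0] * num_nodes
    for u, v in edges:
        succ[u].append(v)
        indeg[v] += 1
    depth = [0] * num_nodes
    queue = deque(v for v in range(num_nodes) if indeg[v] == 0)
    visited = 0
    while queue:
        u = queue.popleft()
        visited += 1
        for v in succ[u]:
            depth[v] = max(depth[v], depth[u] + 1)
            indeg[v] -= 1
            if indeg[v] == 0:
                queue.append(v)
    if visited != num_nodes:
        raise ValueError("computation graph is not a DAG")
    return depth


def hard_assignment(S: Sequence[Sequence[float]]) -> List[int]:
    """Cluster index of each original node: argmax of its row in a |V| x |V'| matrix."""
    return [max(range(len(row)), key=lambda k: row[k]) for row in S]


def coarsen_edges(edges: Sequence[Edge], cluster: Sequence[int]) -> Set[Edge]:
    """E' of the coarsened graph: one edge per ordered pair of distinct clusters
    joined by at least one original edge."""
    return {(cluster[u], cluster[v]) for u, v in edges if cluster[u] != cluster[v]}


def lift_placement(coarse: Sequence[str], cluster: Sequence[int]) -> List[str]:
    """Map a placement of V' back to V: each node inherits its cluster's device."""
    return [coarse[c] for c in cluster]


# Toy example: a 5-operation chain grouped into 3 clusters.
edges = [(0, 1), (1, 2), (2, 3), (3, 4)]
S = [[1, 0, 0], [1, 0, 0], [0, 1, 0], [0, 1, 0], [0, 0, 1]]
cluster = hard_assignment(S)
print(node_depth(5, edges))
print(coarsen_edges(edges, cluster))
print(lift_placement(["CPU", "GPU", "CPU"], cluster))
```

An arbitrary grouping of a DAG can introduce cycles in $G'$ (for example, clustering nodes 0 and 2 of the chain 0 → 1 → 2 together while node 1 stays alone yields edges in both directions), and this sketch does not check for that.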

Theorems & Definitions (2)

Definition 2.1
Definition 2.2