SoD$^2$: Statically Optimizing Dynamic Deep Neural Network

Wei Niu; Gagan Agrawal; Bin Ren

TL;DR

This paper presents SoD$^2$, a comprehensive framework for optimizing Dynamic DNNs. Evaluated on 10 emerging Dynamic DNNs and compared against several existing systems, it reduces both execution latency and memory requirements.

Abstract

Though many compilation and runtime systems have been developed for DNNs in recent years, the focus has largely been on static DNNs. Dynamic DNNs, where tensor shapes and sizes and even the set of operators used are dependent upon the input and/or execution, are becoming common. This paper presents SoD$^2$, a comprehensive framework for optimizing Dynamic DNNs. The basis of our approach is a classification of common operators that form DNNs, and the use of this classification towards a Rank and Dimension Propagation (RDP) method. This framework statically determines the shapes of operators as known constants, symbolic constants, or operations on these. Next, using RDP we enable a series of optimizations, like fused code generation, execution (order) planning, and even runtime memory allocation plan generation. By evaluating the framework on 10 emerging Dynamic DNNs and comparing it against several existing systems, we demonstrate reductions in both execution latency and memory requirements, with RDP-enabled key optimizations responsible for much of the gains. Our evaluation results show that SoD$^2$ runs up to $3.9\times$ faster than these systems while saving up to $88\%$ peak memory consumption.
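To make the idea of shapes expressed as "known constants, symbolic constants, or operations on these" concrete, the following is a minimal sketch of forward shape propagation. It is not the paper's RDP implementation: the names `Sym`, `Op`, `mul`, `flatten_shape`, `matmul_shape`, and `show` are hypothetical, and only two operators are modeled.

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class Sym:
    """A symbolic constant: a dimension unknown until runtime (hypothetical type)."""
    name: str


@dataclass(frozen=True)
class Op:
    """An operation on other dimensions, e.g. 3 * H (hypothetical type)."""
    op: str
    lhs: "Dim"
    rhs: "Dim"


# A dimension is a known constant, a symbolic constant, or an operation on these.
Dim = Union[int, Sym, Op]
Shape = Tuple[Dim, ...]


def mul(a: Dim, b: Dim) -> Dim:
    """Multiply two dimensions, folding constants where possible."""
    if isinstance(a, int) and isinstance(b, int):
        return a * b
    if a == 1:
        return b
    if b == 1:
        return a
    return Op("*", a, b)


def flatten_shape(s: Shape) -> Shape:
    """Output shape of a Flatten that keeps dim 0 and merges the rest."""
    total: Dim = 1
    for d in s[1:]:
        total = mul(total, d)
    return (s[0], total)


def matmul_shape(a: Shape, b: Shape) -> Shape:
    """Output shape of a 2-D MatMul: (m, k) x (k, n) -> (m, n)."""
    return (a[0], b[1])


def show(d: Dim) -> str:
    """Render a dimension as a readable expression."""
    if isinstance(d, int):
        return str(d)
    if isinstance(d, Sym):
        return d.name
    return f"({show(d.lhs)}{d.op}{show(d.rhs)})"


# Input with unknown batch size N and spatial extent H x W.
x = (Sym("N"), 3, Sym("H"), Sym("W"))
flat = flatten_shape(x)
print([show(d) for d in flat])
```

Even though `N`, `H`, and `W` are unknown before deployment, the flattened shape is still a closed-form expression over them, which is the kind of information later optimizations such as fusion or memory planning could consume.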

Paper Structure (22 sections, 1 equation, 13 figures, 7 tables, 1 algorithm)

Introduction
Existing Frameworks and Limitations
Operator Classification based on Dynamism
Background and Notation.
Design of SoD$^2$
Pre-Deployment Data-Flow Analysis
Formal Definition of Operator Rank and Dimension Propagation (RDP).
RDP Solution.
Operator Fusion for Dynamic DNN based on RDP
Static Execution Planning based on RDP
Other Optimizations
Memory Allocation Plan
RDP-based Multi-Version Code Generation
Evaluation
Evaluation Setup
...and 7 more sections

Figures (13)

Figure 1: Different degrees of dynamism. Each node is a DNN operator. Yellow, blue, red, and purple mean Input Shape Determined Output, Input Shape Determined Output Shape, Input Shape & Value Determined Output Shape, and Execution Determined Output, respectively. In (d), Switch's execution path is decided dynamically during runtime and red dot edges represent both the computation dependency and control flow.
Figure 2: Domain of RDP dataflow analysis. It includes known, symbolic, and operation-inferred constants that form a lattice.
Figure 3: Examples of forward and backward transfer. Each node is an operator. Yellow, blue, and red mean Input Shape Determined Output, Input Shape Determined Output Shape, and Input Shape & Value Determined Output Shape, respectively. Ids (e.g., ①) indicate the location where transfer functions apply and their applying orders for a forward transfer (a backward transfer reverses this order). S and V equations map values in the RDP domain to the shape and value of each tensor, in which, F denotes the transfer function. fs and bs of F denote forward and backward, and F's subscript is a short form of its type (e.g., ISDOS means Input Shape Determined Output Shape).
Figure 4: Operator fusion with dynamic shapes. The top code snippet shows that fusion is not feasible because of broadcasting. Specifically, Add requires A's indices $I'$, $J'$, and $K'$ to be either 1 or $I$, $J$, and $K$, resulting in 8 fusion scenarios. With RDP, such fusion is feasible (shown in the bottom code snippet). This fusion significantly reduces intermediate result materialization requirements.
Figure 5: Memory reduction of different optimizations on CPU, relative to the baseline without any RDP-enabled optimization (No opt.).
...and 8 more figures
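
The count of 8 fusion scenarios in the Figure 4 caption follows from each of the three extents $I'$, $J'$, $K'$ independently being either 1 or the matching output extent, giving $2^3 = 8$ combinations. The sketch below simply enumerates them; `broadcast_scenarios` is a hypothetical helper written for illustration, not part of SoD$^2$.

```python
from itertools import product


def broadcast_scenarios(out_dims=("I", "J", "K")):
    """List every way an operand's dims can be 1 or equal to the output dims."""
    scenarios = []
    # Each mask entry says whether the operand's dimension is broadcast (== 1).
    for mask in product((False, True), repeat=len(out_dims)):
        scenarios.append(
            tuple("1" if is_one else d for d, is_one in zip(out_dims, mask))
        )
    return scenarios


scenarios = broadcast_scenarios()
for s in scenarios:
    print(s)
print(len(scenarios))
```

Without shape information, a fused kernel would have to cover all of these cases; if an analysis establishes which extents are 1, only one variant is needed.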
