Vortex: Efficient Sample-Free Dynamic Tensor Program Optimization via Hardware-aware Strategy Space Hierarchization

Yangjie Zhou; Honglin Zhu; Qian Qiu; Weihao Cui; Zihan Liu; Cong Guo; Siyuan Feng; Jintao Meng; Haidong Lan; Jingwen Leng; Wenxi Zhu; Minwen Deng

TL;DR

Vortex is a hardware-driven, sample-free compiler for dynamic-shape tensor programs. Its bidirectional compilation workflow combines top-down abstraction, which aligns tensor program execution with the hardware hierarchy, with bottom-up kernel construction, which narrows the search space. In the authors' evaluation, this reduces compilation time by $176\times$ compared to the existing dynamic-shape compiler.

Abstract

Dynamic-shape deep neural networks (DNNs) are rapidly evolving, attracting attention for their ability to handle variable input sizes in real-time applications. However, existing compilation optimization methods for such networks often rely heavily on predefined samples to guide the compilation process, which restricts their adaptability and efficiency. These sample-driven methods struggle to efficiently manage the diverse and unpredictable shapes encountered in real-world scenarios, often resulting in suboptimal performance. To tackle these issues, we introduce Vortex, a hardware-driven and sample-free compiler tailored for dynamic-shape tensor programs. Vortex capitalizes on detailed hardware information and hierarchizes the strategy space to facilitate high-performance code generation without relying on runtime shape samples. It features a unique bidirectional compilation workflow, combining top-down abstraction for aligning tensor program execution with hardware hierarchies and bottom-up kernel construction to narrow the search space, enabling Vortex to achieve remarkable efficiency. Comprehensive evaluations confirm that Vortex reduces compilation time by $176\times$ compared to the existing dynamic-shape compiler. Additionally, it substantially outperforms existing vendor-provided libraries and dynamic-shape compilers on both CPU and GPU platforms, delivering speedups of $2.53\times$ and $3.01\times$, respectively.

Paper Structure (37 sections, 4 equations, 16 figures, 7 tables, 2 algorithms)

Introduction
Background and Motivation
Dynamic-shape Tensor Program
Limitations of Sample-Driven Approach
Hardware-Driven Approach: Opportunities and Challenges
Summary
Overview of Vortex
Key Idea.
Optimization Flow.
Summary.
Strategy Space Hierarchization in Vortex
Top-Down Recursive Notation
Unified Abstraction Design
Detailed Designs at Each Level
Bottom-up Hardware-aware Candidates Generator
...and 22 more sections

Figures (16)

Figure 1: Comparison of Vortex with existing methods.
Figure 2: Existing sample-driven compilation workflow.
Figure 3: Comparing DietCode and cuBLAS over various sequence lengths on A100 GPU. 'DietCode-I' and 'DietCode-O' represent DietCode's dynamic input configurations inside and outside the tuning sample list, respectively.
Figure 4: CPU/GPU Diagram.
Figure 5: GEMM performance across different hardware resource usages on 8255c CPU and A100 GPU. Legend indicates corresponding GEMM parameters M, N, and K.
...and 11 more figures
