A Flexible Programmable Pipeline Parallelism Framework for Efficient DNN Training

Lijuan Jiang, Xingjian Qian, Zhenxiang Ma, Zan Zong, Hengjie Li, Chao Yang, Jidong Zhai

TL;DR

FlexPipe is a programmable pipeline parallelism framework that combines a Python-embedded DSL with an automated scheduler and an auto-tuner to explore and instantiate efficient pipeline schedules for diverse DNN architectures. It models scheduling as a Computation Schedule Space Representation (CSSR) and uses an actor-aware mechanism with computation-type and stage-traversal priorities to guide micro-batch execution. The system lets users add new operations, supports various stage placement strategies, and includes gradient-separation optimizations to reduce bubbles. It reports speedups of up to 2.28X over Megatron-LM and up to 1.49X over the state-of-the-art automated pipeline parallelism framework. The approach targets both transformer and multimodal models, offering flexible, scalable, and low-overhead schedule exploration and tuning.

Abstract

Pipeline parallelism is an essential distributed parallelism method. Increasingly complex and diverse DNN models necessitate meticulously customized pipeline schedules for performance. However, existing practices typically rely on predefined schedules, each with its own strengths, and fail to adapt automatically to emerging model architectures. Exploring novel high-efficiency schedules is daunting due to the enormous and varying schedule space. Moreover, manually implementing schedules is challenging because of the onerous coding burden and constantly changing needs. Unfortunately, existing frameworks are limited in automated schedule exploration and lack flexibility and controllability. This paper presents FlexPipe, a programmable pipeline parallelism framework with enhanced productivity, programmability, debuggability, and ease of tuning. FlexPipe has two main components: a succinct domain-specific language (DSL) and an automated scheduler. FlexPipe enables automated schedule exploration for various parallel scenarios within a broad spectrum of schedule types at a small search cost. In addition, users can swiftly develop and customize schedules using the FlexPipe DSL, which offers flexible control over the pipeline order of micro-batch computations across stages. It also provides convenient mechanisms to include new operations in schedules to meet changing demands. Our evaluation results demonstrate that FlexPipe achieves up to 2.28X performance speedup compared to the popular large-scale parallel framework Megatron-LM, and gains up to 1.49X performance speedup compared to the state-of-the-art automated pipeline parallelism framework.
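
To make "the pipeline order of micro-batch computations over stages" concrete, the following is a minimal plain-Python sketch of the widely used 1F1B (one-forward-one-backward) ordering for a single stage. It is not the FlexPipe DSL and does not reflect FlexPipe's API; the function name `one_f_one_b` and its parameters are hypothetical, chosen only for illustration, and the sketch assumes one stage per device with no interleaving.

```python
from typing import List, Tuple

# An operation is ("F" or "B", micro-batch index). Illustrative type only.
Op = Tuple[str, int]


def one_f_one_b(stage: int, num_stages: int, num_microbatches: int) -> List[Op]:
    """Return the order of forward/backward ops one stage runs under 1F1B.

    Hypothetical helper, not part of FlexPipe. Stages are 0-indexed.
    """
    # Warm-up: earlier stages run extra forwards so later stages have work.
    warmup = min(num_stages - stage - 1, num_microbatches)
    order: List[Op] = [("F", mb) for mb in range(warmup)]

    fwd, bwd = warmup, 0
    # Steady state: alternate one forward with one backward.
    while fwd < num_microbatches:
        order.append(("F", fwd))
        fwd += 1
        order.append(("B", bwd))
        bwd += 1

    # Cool-down: drain the remaining backward passes.
    while bwd < num_microbatches:
        order.append(("B", bwd))
        bwd += 1
    return order


if __name__ == "__main__":
    for s in range(4):
        print(s, one_f_one_b(s, num_stages=4, num_microbatches=4))
```

In this sketch the last stage alternates forward and backward immediately, while the first stage runs several forwards before its first backward. A programmable framework would presumably let the user vary exactly this kind of ordering decision rather than hard-coding it.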

Paper Structure

This paper contains 17 sections, 1 equation, 16 figures, 5 tables, 1 algorithm.

Figures (16)

  • Figure 1: Effective stage placement strategies. a0-a4 represent devices, and s1-s8 represent stages.
  • Figure 2: Imbalanced workloads of a 5B GPT model with varying vocabulary sizes.
  • Figure 3: DistMM-Pipe.
  • Figure 4: FlexPipe Overview.
  • Figure 5: 1F1B using the FlexPipe DSL.
  • ...and 11 more figures