Glow: Graph Lowering Compiler Techniques for Neural Networks

Nadav Rotem, Jordan Fix, Saleem Abdulrasool, Garret Catron, Summer Deng, Roman Dzhabarov, Nick Gibson, James Hegeman, Meghan Lele, Roman Levenstein, Jack Montgomery, Bert Maher, Satish Nadathur, Jakob Olesen, Jongsoo Park, Artem Rakhov, Misha Smelyanskiy, Man Wang

TL;DR

Glow addresses the challenge of efficiently compiling neural networks for diverse hardware by introducing a multi-level, strongly-typed IR stack that separates high-level graph optimizations from low-level, memory-aware code generation. Node lowering reduces the set of operators each backend must implement, so new backends can be supported without sacrificing performance, while predication and a CPU backend with operator stacking provide further optimization. Profile-guided quantization lets networks run with integer arithmetic, and the CPU backend builds on LLVM with a small standard library of kernels. The evaluation shows Glow delivering substantial speedups over TensorFlow and competitive results against TVM on commodity CPUs, underscoring the practicality of its graph-lowering approach for heterogeneous hardware.

Abstract

This paper presents the design of Glow, a machine learning compiler for heterogeneous hardware. It is a pragmatic approach to compilation that enables the generation of highly optimized code for multiple targets. Glow lowers the traditional neural network dataflow graph into a two-phase strongly-typed intermediate representation. The high-level intermediate representation allows the optimizer to perform domain-specific optimizations. The lower-level instruction-based address-only intermediate representation allows the compiler to perform memory-related optimizations, such as instruction scheduling, static memory allocation and copy elimination. At the lowest level, the optimizer performs machine-specific code generation to take advantage of specialized hardware features. Glow features a lowering phase which enables the compiler to support a high number of input operators as well as a large number of hardware targets by eliminating the need to implement all operators on all targets. The lowering phase is designed to reduce the input space and allow new hardware backends to focus on a small number of linear algebra primitives.
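The lowering idea described in the abstract can be illustrated with a minimal sketch. It assumes a toy graph representation invented for this example (the `Node` class, `lower_fully_connected`, and `lower_graph` are hypothetical names, not Glow's API) and shows one plausible lowering: a fully connected node rewritten as a matrix multiplication followed by a broadcast add, so that a backend only has to implement the two simpler primitives.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass
class Node:
    """Toy graph node (hypothetical, for illustration only)."""
    kind: str           # operator name, e.g. "FullyConnected"
    inputs: List[str]   # names of the values this node reads
    output: str         # name of the value this node produces


def lower_fully_connected(node: Node) -> List[Node]:
    """Rewrite one FullyConnected node as MatMul followed by BatchedAdd."""
    x, weights, bias = node.inputs
    tmp = node.output + ".matmul"  # intermediate value introduced by lowering
    return [
        Node("MatMul", [x, weights], tmp),
        Node("BatchedAdd", [tmp, bias], node.output),
    ]


def lower_graph(nodes: List[Node], supported: Set[str]) -> List[Node]:
    """Lower every FullyConnected node the backend does not support natively."""
    lowered: List[Node] = []
    for node in nodes:
        if node.kind == "FullyConnected" and node.kind not in supported:
            lowered.extend(lower_fully_connected(node))
        else:
            lowered.append(node)  # already a primitive, or natively supported
    return lowered


graph = [
    Node("FullyConnected", ["x", "w", "b"], "fc1"),
    Node("Relu", ["fc1"], "out"),
]
# Assumed backend that implements only these three primitives.
lowered = lower_graph(graph, supported={"MatMul", "BatchedAdd", "Relu"})
print([n.kind for n in lowered])
```

In this sketch a backend that does list `FullyConnected` as supported keeps the node intact, which mirrors the point that lowering shrinks the required operator set without preventing a target from handling a high-level operator directly.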

Paper Structure

This paper contains 26 sections and 11 figures.

Figures (11)

  • Figure 1: Compilers struggle to analyze and optimize this code when the two loops come from two different nodes in the dataflow graph.
  • Figure 2: A lowered compute graph in Glow's high-level IR, representing a regression of $A$, automatically differentiated by Glow.
  • Figure 3: Unoptimized low-level Glow IR.
  • Figure 4: Example class-gen for the Average Pool instruction.
  • Figure 5: A quantized subgraph from Resnet50.
  • ...and 6 more figures