Tempo: Compiled Dynamic Deep Learning with Symbolic Dependence Graphs

Pedro F. Silvestre, Peter Pietzuch

TL;DR

Tempo introduces recurrent tensors (RTs) with explicit temporal dimensions and symbolic indexing to express dynamic dependencies and shapes in DL programs. It builds symbolic dependence graphs (SDGs) from RTs to enable whole-program optimizations such as vectorization, tiling, and operator fusion, and schedules them with a polyhedral model that also orchestrates memory management. The implementation demonstrates up to 7× speedups in LLM decoding and up to 54× faster RL training, along with up to 16× lower peak memory, validating its potential for scalable dynamic DL workloads. By unifying eager execution with graph-based optimization, Tempo offers a practical path to high-performance dynamic DL on commodity GPUs with automated memory handling and algorithm-aware scheduling.

Abstract

Deep learning (DL) algorithms are often defined in terms of temporal relationships: a tensor at one timestep may depend on tensors from earlier or later timesteps. Such dynamic dependencies (and corresponding dynamic tensor shapes) are difficult to express and optimize: while eager DL systems support such dynamism, they cannot apply compiler-based optimizations; graph-based systems require static tensor shapes, which forces users to pad tensors or break up programs into multiple static graphs. We describe Tempo, a new DL system that combines the dynamism of eager execution with the whole-program optimizations of graph-based compilation. Tempo achieves this through a declarative programming model with recurrent tensors, which include explicit temporal dimensions. Temporal dimensions can be indexed using symbolic expressions to express dynamic dependencies on past and future tensors. Based on this, Tempo constructs a symbolic dependence graph, which concisely encodes dynamic dependencies between operators, and applies whole-program optimizations, such as algebraic simplifications, vectorization, tiling, and fusion. By tiling dynamic dependencies into static-size blocks, Tempo can also reuse existing static code generators. It then uses a polyhedral model to find a feasible execution schedule, which includes memory management operations. We show that Tempo achieves a 7$\times$ speedup over JAX for Llama-3.2-3B decoding; for reinforcement learning algorithms, Tempo achieves a 54$\times$ speedup, with 16$\times$ lower peak memory usage.
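To make the notion of a dynamic dependence concrete, the following is a minimal plain-Python sketch and does not use Tempo's API. It assumes a toy definition $y[t] = \sum_{k=t}^{T-1} x[k]$, so the number of $x$ values read changes with $t$. The function names `suffix_sum_dynamic` and `suffix_sum_tiled` and the `block` parameter are hypothetical, chosen only for illustration. The second function shows one simple way to cover a dynamic range with static-size blocks, masking out positions that fall outside the range; the paper's actual tiling scheme may differ.

```python
from typing import List


def suffix_sum_dynamic(x: List[float], t: int) -> float:
    """y[t] = sum(x[t:T]). The slice length T - t depends on t (dynamic shape)."""
    return sum(x[t:])


def suffix_sum_tiled(x: List[float], t: int, block: int = 4) -> float:
    """Same result, but every tile touches exactly `block` positions.

    The range [t, T) is covered by fixed-size tiles. Positions outside the
    range contribute 0 instead of changing the tile shape.
    """
    T = len(x)
    total = 0.0
    first_tile = t // block               # tile containing index t
    num_tiles = (T + block - 1) // block  # ceil(T / block)
    for tile in range(first_tile, num_tiles):
        start = tile * block
        for i in range(start, start + block):  # static trip count per tile
            in_range = t <= i < T
            # x[i] is only read when i is a valid index inside [t, T)
            total += x[i] if in_range else 0.0
    return total


if __name__ == "__main__":
    x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
    for t in range(len(x)):
        print(t, suffix_sum_dynamic(x, t), suffix_sum_tiled(x, t, block=4))
```

Both functions return the same value for every `t`. The tiled variant trades some wasted work on masked positions for inner loops of fixed size, which is the property a static code generator needs.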

Paper Structure (24 sections, 3 equations, 24 figures, 1 algorithm)


Figures (24)

  • Figure 1: Overview of how Tempo expresses, optimizes, schedules and manages the memory of dynamic DL algorithms
  • Figure 2: Dynamic dependencies (Tensor $y$ at time $t$ depends on a dynamic range of $x$ values, as indicated by the arrow direction.)
  • Figure 3: Simplified Llama-3.2-3B architecture (Each decoding step depends on a dynamic range of cached key-value (K/V) pairs.)
  • Figure 4: Actor-learner architecture in RL systems (Decoupling acting and learning causes ➊ duplicate forward passes, ➋ serial acting and learning, and ➌ high peak memory usage.)
  • Figure 5: How an RT $x$ with domain $(t,)$ can be used in Tempo
  • ...and 19 more figures