Galley: Modern Query Optimization for Sparse Tensor Programs

Kyle Deeds, Willow Ahrens, Magda Balazinska, Dan Suciu

TL;DR

Galley optimizes sparse tensor programs through a cost-based lowering pipeline that converts declarative tensor programs into efficient sparse tensor compiler kernels. A logical optimizer built on a generalization of the FAQ framework decomposes each program into aggregation steps; a physical optimizer then selects loop orders, tensor formats, and merge strategies, guided by sparsity statistics. The approach demonstrates substantial performance gains across workloads such as machine learning over joins, sparse linear algebra, subgraph counting, and BFS, while keeping optimization overhead modest and adapting to input sparsity. By integrating with the PyData/Sparse ecosystem and the Finch sparse tensor compiler (STC), Galley provides a practical path toward efficient sparse-tensor analytics in real-world data pipelines.

Abstract

The tensor programming abstraction is a foundational paradigm which allows users to write high-performance programs via a high-level imperative interface. Recent work on sparse tensor compilers has extended this paradigm to sparse tensors (i.e., tensors where most entries are not explicitly represented). With these systems, users define the semantics of the program and the algorithmic decisions in a concise language that can be compiled to efficient low-level code. However, these systems still require users to make complex decisions about program structure and memory layouts to write efficient programs. This work presents Galley, a system for declarative tensor programming that allows users to write efficient tensor programs without making complex algorithmic decisions. Galley is the first system to perform cost-based lowering of sparse tensor algebra to the imperative language of sparse tensor compilers, and the first to optimize arbitrary operators beyond sum and product. First, it decomposes the input program into a sequence of aggregation steps through a novel extension of the FAQ framework. Second, Galley optimizes and converts each aggregation step to a concrete program, which is compiled and executed with a sparse tensor compiler. We show that Galley produces programs that are 1-300x faster than competing methods for machine learning over joins and 5-20x faster than a state-of-the-art relational database for subgraph counting workloads, with minimal optimization overhead.
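To make the idea of decomposing a program into aggregation steps concrete, the sketch below evaluates the logistic-regression-over-joins expression $\sigma(\sum_{jpc}S_{ipc}(P_{pj}\theta_j + C_{cj}\theta_j))$ in two ways: as a single fused loop nest, and as three smaller aggregations in which the sum over $j$ is pushed below the sum over $p$ and $c$. This is a hand-written illustration in plain Python, not Galley's API or the plan its optimizer would necessarily choose; the dictionary-based tensor encoding, the toy data, and the function names `naive` and `pushed_down` are all assumptions made for the example.

```python
import math
from collections import defaultdict

# Sparse tensors stored as {index tuple: nonzero value}. Toy data, assumed for illustration.
S = {(0, 0, 1): 1.0, (0, 1, 0): 1.0, (1, 1, 1): 1.0}  # S[i, p, c]
P = {(0, 0): 2.0, (1, 1): 3.0}                         # P[p, j]
C = {(0, 0): 1.0, (1, 1): 4.0}                         # C[c, j]
theta = [0.5, -1.0]                                    # dense theta[j]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def naive(S, P, C, theta):
    """One fused loop nest: every nonzero of S scans every feature j."""
    acc = defaultdict(float)
    for (i, p, c), s in S.items():
        for j, t in enumerate(theta):
            acc[i] += s * (P.get((p, j), 0.0) * t + C.get((c, j), 0.0) * t)
    return {i: sigmoid(v) for i, v in acc.items()}


def pushed_down(S, P, C, theta):
    """Three aggregation steps: sum over j first, then over p and c."""
    # Step 1: p_theta[p] = sum_j P[p, j] * theta[j]
    p_theta = defaultdict(float)
    for (p, j), v in P.items():
        p_theta[p] += v * theta[j]
    # Step 2: c_theta[c] = sum_j C[c, j] * theta[j]
    c_theta = defaultdict(float)
    for (c, j), v in C.items():
        c_theta[c] += v * theta[j]
    # Step 3: out[i] = sigmoid(sum_{p,c} S[i, p, c] * (p_theta[p] + c_theta[c]))
    acc = defaultdict(float)
    for (i, p, c), s in S.items():
        acc[i] += s * (p_theta.get(p, 0.0) + c_theta.get(c, 0.0))
    return {i: sigmoid(v) for i, v in acc.items()}
```

Both functions return the same values, but the second touches each nonzero of `P` and `C` once instead of probing them once per nonzero of `S` and per feature. Which decomposition is cheaper depends on the sparsity of the inputs, which is why the choice is made by a cost-based optimizer rather than fixed in advance.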

Paper Structure

This paper contains 41 sections, 23 equations, 9 figures, and 2 tables.

Figures (9)

  • Figure 1: Logistic regression implemented in the language of a sparse tensor compiler.
  • Figure 2: Galley overview.
  • Figure 3: Fibertree format abstraction.
  • Figure 4: Query plan dialects.
  • Figure 5: Annotated expression tree for logistic regression over joins $\sigma(\sum_{jpc}S_{ipc}(P_{pj}\theta_j + C_{cj}\theta_j))$.
  • ...and 4 more figures

Theorems & Definitions (8)

  • Example 1
  • Example 2
  • Example 3
  • Example 4
  • Example 5
  • Example 6
  • Example 7
  • Example 8