Finch: Sparse and Structured Tensor Programming with Control Flow

Willow Ahrens, Teodoro Fields Collin, Radha Patel, Kyle Deeds, Changwan Hong, Saman Amarasinghe

TL;DR

Finch addresses fragmentation in structured tensor computation by introducing a language that co-optimizes control flow and data structures via looplets and a tensor lifecycle interface. It enables dimension-aware code generation through wrapperization, dimensionalization, concordization, and lifecycle automation, lowering loops into efficient, structure-aware code. The main contributions are the looplets abstraction, the tensor lifecycle interface, level formats covering four structural properties, and a compiler pipeline that yields speedups in SpMV, SpGEMM, graph analytics, and image morphology. The practical impact is a more expressive, high-productivity path for engineering high-performance kernels on structured tensors, with open-source tooling.

Abstract

From FORTRAN to NumPy, tensors have revolutionized how we express computation. However, tensors in these, and almost all prominent systems, can only handle dense rectilinear integer grids. Real world tensors often contain underlying structure, such as sparsity, runs of repeated values, or symmetry. Support for structured data is fragmented and incomplete. Existing frameworks limit the tensor structures and program control flow they support to better simplify the problem. In this work, we propose a new programming language, Finch, which supports both flexible control flow and diverse data structures. Finch facilitates a programming model which resolves the challenges of computing over structured tensors by combining control flow and data structures into a common representation where they can be co-optimized. Finch automatically specializes control flow to data so that performance engineers can focus on experimenting with many algorithms. Finch supports a familiar programming language of loops, statements, ifs, breaks, etc., over a wide variety of tensor structures, such as sparsity, run-length-encoding, symmetry, triangles, padding, or blocks. Finch reliably utilizes the key properties of structure, such as structural zeros, repeated values, or clustered non-zeros. We show that this leads to dramatic speedups in operations such as SpMV and SpGEMM, image processing, and graph analytics.

Paper Structure

This paper contains 61 sections, 1 equation, 24 figures, 6 tables.

Figures (24)

  • Figure 1: A few examples of matrix structures arising in practice
  • Figure 2: The looplet language, as understood in a correct execution of a Finch program.
  • Figure 3: Levels in the fiber tree representation of a sparse matrix in CSC format, with a dense outer level and a sparse inner level. The element level holds the leaves of the tree.
  • Figure 4: All combinations of our 4 structural properties and the corresponding formats we have chosen to represent them. Not all combinations are relevant. Note that blocks and runs need not be considered together because we must store a run length for each run, so there isn't a significant storage benefit to combining them. Blocks and singletons only make sense in the context of sparsity.
  • Figure 5: Several examples of matrix structures represented using the level structures identified in the table of structure types. Comparing this figure to the looplets paper (Ahrens et al., 2023), we see that a level-by-level structural decomposition is diagrammed together with the looplets.
  • ...and 19 more figures
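
To make the fiber tree of Figure 3 concrete, the following is a minimal plain-Python sketch of CSC storage and a sparse matrix-vector product (SpMV) over it. It is not Finch code and does not use the Finch API; the names `CSCMatrix`, `pos`, `idx`, `val`, `csc_from_dense`, and `spmv` are assumptions chosen for illustration. The dense outer level corresponds to the loop over every column, the sparse inner level to the loop over only the stored rows of that column, and the element level to the stored values.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class CSCMatrix:
    """Illustrative CSC container (hypothetical names, not the Finch API)."""
    n_rows: int
    n_cols: int
    pos: List[int]    # dense outer level: column j owns entries pos[j] .. pos[j+1]-1
    idx: List[int]    # sparse inner level: row index of each stored entry
    val: List[float]  # element level: stored values, the leaves of the tree


def csc_from_dense(dense: Sequence[Sequence[float]]) -> CSCMatrix:
    """Build CSC storage from a row-major dense matrix, skipping zeros."""
    n_rows = len(dense)
    n_cols = len(dense[0]) if n_rows else 0
    pos, idx, val = [0], [], []
    for j in range(n_cols):          # one fiber per column
        for i in range(n_rows):
            if dense[i][j] != 0:     # keep only the nonzero entries
                idx.append(i)
                val.append(float(dense[i][j]))
        pos.append(len(idx))         # end of column j's fiber
    return CSCMatrix(n_rows, n_cols, pos, idx, val)


def spmv(A: CSCMatrix, x: Sequence[float]) -> List[float]:
    """Compute y = A @ x, visiting only the stored entries of A."""
    y = [0.0] * A.n_rows
    for j in range(A.n_cols):                     # dense level: every column
        for p in range(A.pos[j], A.pos[j + 1]):   # sparse level: stored rows only
            y[A.idx[p]] += A.val[p] * x[j]
    return y


A = csc_from_dense([[1, 0, 2],
                    [0, 0, 3],
                    [4, 0, 0]])
y = spmv(A, [1.0, 2.0, 3.0])
```

The inner loop bounds come from the data structure rather than from the matrix dimensions, which is the kind of hand-written specialization of control flow to structure that the abstract says Finch performs automatically.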