Finch: Sparse and Structured Tensor Programming with Control Flow
Willow Ahrens, Teodoro Fields Collin, Radha Patel, Kyle Deeds, Changwan Hong, Saman Amarasinghe
TL;DR
Finch tackles fragmentation in structured tensor computation by introducing a language that co-optimizes control flow and data structures via looplets and a tensor lifecycle interface. It enables dimension-aware code generation through wrapperization, dimensionalization, concordization, and lifecycle automation, lowering loops into efficient structure-aware code. The main contributions are the looplets abstraction, the tensor lifecycle interface, the four structural formats, and a compiler pipeline that yields speedups in SpMV, SpGEMM, graph analytics, and image morphology. The practical impact is a more expressive, high-productivity path for engineering high-performance kernels on structured tensors, with open-source tooling.
Abstract
From FORTRAN to NumPy, tensors have revolutionized how we express computation. However, tensors in these and almost all other prominent systems can only handle dense rectilinear integer grids. Real-world tensors often contain underlying structure, such as sparsity, runs of repeated values, or symmetry. Support for structured data is fragmented and incomplete. Existing frameworks limit the tensor structures and program control flow they support in order to simplify the problem. In this work, we propose a new programming language, Finch, which supports both flexible control flow and diverse data structures. Finch facilitates a programming model that resolves the challenges of computing over structured tensors by combining control flow and data structures into a common representation where they can be co-optimized. Finch automatically specializes control flow to data so that performance engineers can focus on experimenting with many algorithms. Finch supports a familiar programming language of loops, statements, ifs, breaks, etc., over a wide variety of tensor structures, such as sparsity, run-length-encoding, symmetry, triangles, padding, or blocks. Finch reliably utilizes the key properties of structure, such as structural zeros, repeated values, or clustered non-zeros. We show that this leads to dramatic speedups in operations such as SpMV and SpGEMM, image processing, and graph analytics.
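To make "specializing control flow to data" concrete, the sketch below hand-writes the kind of loop a structure-aware compiler might produce for a dot product between a sparse vector and a run-length-encoded vector. It is plain Python and not Finch syntax or Finch output; the function names and the storage layout (sorted index/value lists, exclusive run ends) are assumptions chosen for illustration only.

```python
from bisect import bisect_left

def dense_dot(x, y):
    """Reference version: visits every position of two dense lists."""
    return sum(a * b for a, b in zip(x, y))

def sparse_rle_dot(idx, val, run_ends, run_vals):
    """Dot product of a sparse vector and a run-length-encoded vector.

    idx, val: sorted positions and values of the sparse vector's nonzeros.
    run_ends, run_vals: run k covers positions [run_ends[k-1], run_ends[k])
    (the first run starts at 0) and holds the constant run_vals[k].
    (Hypothetical layout, assumed for this sketch.)
    """
    total = 0
    p = 0  # cursor into the sparse vector's nonzeros
    for end, rv in zip(run_ends, run_vals):
        if rv == 0:
            # Structural zero run: jump past every nonzero it covers.
            p = bisect_left(idx, end, p)
            continue
        # Repeated value: sum the nonzeros in the run, multiply once.
        partial = 0
        while p < len(idx) and idx[p] < end:
            partial += val[p]
            p += 1
        total += rv * partial
    return total

# x = [0, 2, 0, 0, 3, 0, 5, 0] stored sparsely
idx, val = [1, 4, 6], [2, 3, 5]
# y = [4, 4, 4, 0, 0, 1, 1, 1] stored as three runs
run_ends, run_vals = [3, 5, 8], [4, 0, 1]

result = sparse_rle_dot(idx, val, run_ends, run_vals)
reference = dense_dot([0, 2, 0, 0, 3, 0, 5, 0], [4, 4, 4, 0, 0, 1, 1, 1])
```

The specialized loop does work proportional to the number of runs and stored nonzeros rather than the vector length, and it performs one multiplication per run instead of one per position. Writing such loops by hand for every pairing of formats is the burden the abstract describes Finch as automating.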
