The Continuous Tensor Abstraction: Where Indices are Real

Jaeyeon Won, Willow Ahrens, Teodoro Fields Collin, Joel S. Emer, Saman Amarasinghe

TL;DR

Traditional tensor programming restricts indices to integers, which makes real-valued domains awkward to model. The paper proposes the continuous tensor abstraction, in which indices are real numbers ($x, y \in \mathbb{R}$), and implements it with a piecewise-constant tensor model. Its key contributions are a new memory format, reduction operators for continuous iteration spaces, and code-generation extensions to fibertree/Looplets. The resulting compiler-based pipeline reports substantial LoC reductions (e.g., ~18x, ~62x, ~101x) and speedups over hand-implemented CPU libraries ($9.20\times$ on 2D radius search, $1.22\times$ on genomic interval overlap queries, $1.69\times$ on trilinear interpolation in NeRF) across domains including bioinformatics, geospatial queries, 3D point clouds, and NeRF. Overall, the work unifies disparate domain-specific codes under a single continuous-tensor framework and demonstrates performance competitive with hand-optimized implementations.

Abstract

This paper introduces the continuous tensor abstraction, allowing indices to take real-number values (for example, A[3.14]). It also presents continuous tensor algebra expressions, such as C(x,y) = A(x,y) * B(x,y), where indices are defined over a continuous domain. This work expands the traditional tensor model to include continuous tensors. Our implementation supports piecewise-constant tensors, enabling infinite domains to be processed in finite time. We also introduce a new tensor format for efficient storage and a code generation technique for automatic kernel generation. For the first time, our abstraction expresses domains like computational geometry and computer graphics in the language of tensor programming. Our approach demonstrates competitive or better performance than hand-optimized kernels in leading libraries across diverse applications. Compared to hand-implemented libraries on a CPU, our compiler-based implementation achieves an average speedup of 9.20x on 2D radius search with approximately 60x fewer lines of code (LoC), 1.22x on genomic interval overlapping queries (with approximately 18x LoC saving), and 1.69x on trilinear interpolation in Neural Radiance Field (with approximately 6x LoC saving).
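
To make the piecewise-constant idea concrete, here is a minimal Python sketch of a 1-D analogue of C(x) = A(x) * B(x). It is an illustration only, not the paper's tensor format or generated code: the interval-list representation, the half-open interval convention, and the names `lookup`, `multiply`, and `integrate` are all assumptions introduced here. It shows why an infinite real-valued domain can be processed in finite time: the work is proportional to the number of constant pieces, not to the number of points.

```python
from bisect import bisect_right

# Assumed representation (hypothetical, not the paper's format):
# a 1-D piecewise-constant tensor is a sorted list of non-overlapping
# half-open intervals (start, stop, value); it is 0 everywhere else.
A = [(0.0, 2.0, 3.0), (5.0, 7.5, 1.0)]
B = [(1.0, 6.0, 2.0)]

def lookup(t, x):
    """Return t[x] for a real-valued index x (0.0 outside all pieces)."""
    starts = [start for start, _, _ in t]
    i = bisect_right(starts, x) - 1
    if i >= 0 and t[i][0] <= x < t[i][1]:
        return t[i][2]
    return 0.0

def multiply(a, b):
    """C(x) = A(x) * B(x): intersect pieces with a two-pointer merge."""
    out, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        lo = max(a[i][0], b[j][0])
        hi = min(a[i][1], b[j][1])
        if lo < hi:  # the two pieces overlap on [lo, hi)
            out.append((lo, hi, a[i][2] * b[j][2]))
        # Advance whichever piece ends first.
        if a[i][1] <= b[j][1]:
            i += 1
        else:
            j += 1
    return out

def integrate(t):
    """One possible reduction over a continuous index: sum of value * width."""
    return sum((stop - start) * value for start, stop, value in t)

C = multiply(A, B)
print(C)                # the nonzero pieces of the product
print(lookup(C, 1.5))   # a real-valued index inside a piece
print(lookup(C, 3.14))  # a real-valued index outside every piece
print(integrate(C))
```

The merge visits each piece of `A` and `B` at most once, so the product costs time linear in the number of stored pieces. Treating the reduction as an integral is one choice made for this sketch; other reductions over a continuous index are possible.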

Paper Structure (2 sections, 1 figure)

This paper contains 2 sections, 1 figure.

Table of Contents

  1. Introduction
  2. Motivation
