Compressing Structured Tensor Algebra

Mahdi Ghorbani, Emilien Bauer, Tobias Grosser, Amir Shaikhha

TL;DR

DASTAC addresses the challenge of efficiently computing structured tensor algebra by propagating high-level structure into a densely packed data layout and low-level, structure-aware code. It combines StructTensor-based structure inference with a novel symbolic indexing compression and progressive MLIR-based code generation that leverages the polyhedral model. The approach delivers speedups of 1 to 2 orders of magnitude over state-of-the-art sparse and structured tensor compilers with a substantially lower memory footprint, and it scales effectively on multi-core CPUs. This work suggests a practical path toward high-performance structured tensor computations on CPUs and future GPU targets by unifying dense and sparse optimization techniques through polyhedral and MLIR pipelines.

Abstract

Tensor algebra is a crucial component for data-intensive workloads such as machine learning and scientific computing. As the complexity of data grows, scientists often encounter a dilemma between the highly specialized dense tensor algebra and efficient structure-aware algorithms provided by sparse tensor algebra. In this paper, we introduce DASTAC, a framework to propagate the tensor's captured high-level structure down to low-level code generation by incorporating techniques such as automatic data layout compression, polyhedral analysis, and affine code generation. Our methodology reduces memory footprint by automatically detecting the best data layout, heavily benefits from polyhedral optimizations, leverages further optimizations, and enables parallelization through MLIR. Through extensive experimentation, we show that DASTAC achieves 1 to 2 orders of magnitude speedup over TACO, a state-of-the-art sparse tensor compiler, and StructTensor, a state-of-the-art structured tensor algebra compiler, with a significantly lower memory footprint.
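
To make the idea of a densely packed layout with computed (rather than stored) indices concrete, here is a minimal Python sketch. It is not DASTAC's compression scheme or its generated code: it assumes the simplest possible structure, a 2D upper-triangular matrix, and uses the classic row-major packed layout. The function names (`packed_size`, `packed_offset`, `pack_upper`, `upper_matvec`) are hypothetical and chosen only for this illustration.

```python
def packed_size(n):
    """Number of stored elements for an n x n upper-triangular matrix."""
    return n * (n + 1) // 2


def packed_offset(i, j, n):
    """Closed-form position of element (i, j), with i <= j, in the packed buffer.

    Row i starts after the i previous rows, which hold n, n-1, ..., n-i+1
    elements; (j - i) is the position inside row i.
    """
    return i * n - i * (i - 1) // 2 + (j - i)


def pack_upper(dense):
    """Copy only the upper triangle of a dense square matrix into a flat list."""
    n = len(dense)
    packed = [0.0] * packed_size(n)
    for i in range(n):
        for j in range(i, n):
            packed[packed_offset(i, j, n)] = dense[i][j]
    return packed


def upper_matvec(packed, x):
    """Compute y = A @ x where A is upper triangular and stored packed.

    The loop bounds follow the structure (j starts at i), and every offset
    is computed arithmetically, so no index arrays are read from memory.
    """
    n = len(x)
    y = [0.0] * n
    for i in range(n):
        for j in range(i, n):
            y[i] += packed[packed_offset(i, j, n)] * x[j]
    return y


dense = [[1.0, 2.0, 3.0],
         [0.0, 4.0, 5.0],
         [0.0, 0.0, 6.0]]
packed = pack_upper(dense)
y = upper_matvec(packed, [1.0, 1.0, 1.0])
```

In this toy case the packed buffer holds n(n+1)/2 values instead of n², and the kernel touches only those values. Unlike a general sparse format such as CSR, there are no coordinate or offset arrays to load, which is the property the abstract attributes to structure-aware compression.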

Paper Structure (17 sections, 15 equations, 20 figures, 2 tables, 2 algorithms)

Figures (20)

  • Figure 1: Comparison of tensor processing frameworks. DASTAC is the first code generation framework that combines the algorithmic optimizations known from sparse tensor algebras with performance-optimized low-level code known from decades of tuning dense tensors.
  • Figure 2: DASTAC architecture overview.
  • Figure 3: The TTM running example where the first tensor has an upper half cube structure. $A_C$ represents its optimized structured computation.
  • Figure 4: The compressed kernel code consists only of cheap arithmetic operations, loops, and loads and stores on data. In particular, no index or offset values are loaded from memory, so all memory bandwidth is available for the actual compute workload.
  • Figure 5: Illustration of the affine symbolic indexing method, ensuring a dense packing and memory compression.
  • ...and 15 more figures