TeAAL: A Declarative Framework for Modeling Sparse Tensor Accelerators

Nandeeka Nayak; Toluwanimi O. Odemuyiwa; Shubham Ugare; Christopher W. Fletcher; Michael Pellauer; Joel S. Emer

TL;DR

TeAAL introduces a declarative language and simulator generator for modeling sparse tensor accelerators with high fidelity. By expressing accelerators as cascades of mapped Einsums and augmenting them with content-preserving fibertree transformations, TeAAL enables precise, apples-to-apples modeling of diverse architectures such as OuterSPACE, ExTensor, Gamma, and SIGMA. The framework generates an imperative IR, executes it on real tensors to produce traces, and combines Accelergy-derived energy estimates with an analytical bottleneck analysis to estimate performance and energy. These estimates closely match published results. TeAAL also supports rapid exploration of new designs: using it, the authors speed up a vertex-centric graph-processing accelerator by $1.9\times$ on BFS and $1.2\times$ on SSSP over GraphDynS. Together with its broad expressivity, these results position TeAAL as a practical tool for designing, comparing, and optimizing sparse tensor accelerators.
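To make the idea of an Einsum cascade concrete, here is a minimal Python sketch of outer-product, multiply-merge sparse matrix-vector multiplication, the style of computation in the OuterSPACE example of Figure 4. It assumes sparse tensors are plain dictionaries of nonzeros keyed by coordinate tuples; this is not TeAAL's fibertree implementation or its specification language, and the helper names `spmv_single`, `multiply_phase`, and `merge_phase` are hypothetical. The single Einsum $Z_m = \sum_k A_{k,m} B_k$ is split into a multiply Einsum that produces partial products, $T_{k,m} = A_{k,m} B_k$, and a merge Einsum that reduces them, $Z_m = \sum_k T_{k,m}$.

```python
from collections import defaultdict

# Assumption for this sketch: a sparse tensor is a dict {coordinate tuple: value}
# that stores only nonzeros. A is indexed as A[k, m]; B is a vector indexed by k.

def spmv_single(A, B):
    """One Einsum: Z[m] = sum over k of A[k, m] * B[k]."""
    Z = defaultdict(float)
    for (k, m), a in A.items():
        if k in B:  # intersection: skip k where B has no nonzero
            Z[m] += a * B[k]
    return dict(Z)

def multiply_phase(A, B):
    """Einsum 1 of the cascade: T[k, m] = A[k, m] * B[k] (no reduction yet)."""
    return {(k, m): a * B[k] for (k, m), a in A.items() if k in B}

def merge_phase(T):
    """Einsum 2 of the cascade: Z[m] = sum over k of T[k, m]."""
    Z = defaultdict(float)
    for (k, m), t in T.items():
        Z[m] += t
    return dict(Z)

A = {(0, 1): 2.0, (0, 3): 1.0, (2, 1): 4.0, (3, 0): 5.0}  # nonzeros A[k, m]
B = {0: 3.0, 2: 0.5}                                     # nonzeros B[k]

T = multiply_phase(A, B)  # {(0, 1): 6.0, (0, 3): 3.0, (2, 1): 2.0}
Z = merge_phase(T)        # {1: 8.0, 3: 3.0}
assert Z == spmv_single(A, B)
print(Z)
```

The two formulations agree on `Z`; the difference is that the cascade materializes `T` between phases, which is the kind of intermediate data a model of a multi-phase accelerator has to account for.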

Abstract

Over the past few years, the explosion in sparse tensor algebra workloads has led to a corresponding rise in domain-specific accelerators to service them. Due to the irregularity present in sparse tensors, these accelerators employ a wide variety of novel solutions to achieve good performance. At the same time, prior work on design-flexible sparse accelerator modeling does not express this full range of design features, making it difficult to understand the impact of each design choice and compare or extend the state-of-the-art. To address this, we propose TeAAL: a language and simulator generator for the concise and precise specification and evaluation of sparse tensor algebra accelerators. We use TeAAL to represent and evaluate four disparate state-of-the-art accelerators -- ExTensor, Gamma, OuterSPACE, and SIGMA -- and verify that it reproduces their performance with high accuracy. Finally, we demonstrate the potential of TeAAL as a tool for designing new accelerators by showing how it can be used to speed up vertex-centric programming accelerators -- achieving $1.9\times$ on BFS and $1.2\times$ on SSSP over GraphDynS.
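The graph results above rest on expressing vertex-centric algorithms as tensor algebra with user-defined operators, which the paper's extended Einsums are meant to capture. As a generic illustration (the standard semiring formulation of BFS, not necessarily the exact cascade the paper uses for GraphDynS or its improved design), one BFS level can be read as a masked sparse matrix-vector product in which multiplication becomes AND and summation becomes OR: $R_d = \bigvee_s F_s \wedge G_{s,d}$, after which already-visited vertices are masked out. The Python sketch below stores the graph as a dictionary of adjacency sets and the frontier as a set of active vertices; `bfs_levels` is a hypothetical helper. SSSP follows the same pattern with $(\min, +)$ in place of (OR, AND).

```python
def bfs_levels(edges, source):
    """Return {vertex: BFS level} for every vertex reachable from `source`.

    Each iteration is one extended-Einsum-style step:
    reached[d] = OR over s of (frontier[s] AND G[s, d]), then masked by visited.
    """
    # Graph as source -> set of destinations: only the nonzeros of G are stored.
    G = {}
    for s, d in edges:
        G.setdefault(s, set()).add(d)

    level = {source: 0}
    frontier = {source}  # sparse boolean vector: only True entries are kept
    depth = 0
    while frontier:
        depth += 1
        # "Multiply" with AND, "reduce" with OR: d is reached if any frontier vertex points to it.
        reached = {d for s in frontier for d in G.get(s, ())}
        # Mask: drop vertices that already have a level.
        frontier = {d for d in reached if d not in level}
        for d in frontier:
            level[d] = depth
    return level


print(bfs_levels([(0, 1), (0, 2), (1, 3), (2, 3), (3, 4)], source=0))
```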

Paper Structure

This paper contains 35 sections, 6 equations, 13 figures, and 6 tables.

Introduction
Background and Motivation
Tensors and Fibertrees
Tensor Algebra with Extended Einsums
Mapping Hardware Accelerators
Accelerating Sparse Tensor Algebra
Overview and Insights
Insight 1: Einsum cascades capture multi-phase accelerators
Insight 2: Content-preserving transformations on fibertrees capture accelerator data-orchestration strategies
Sparse Tensor Splitting and Work Scheduling
Transposition, Sorting, and Merging
Generating the Model
Lowering Mapped Einsums to Hardware
Format
Architecture
...and 20 more sections

Figures (13)

Figure 1: Sparse matrix-vector multiplication and corresponding fibertree representations.
Figure 2: Flattening and then partitioning ranks $M$ and $K$ of tensor $A$ (Figure 1).
Figure 3: TeAAL specification for the Einsums and mappings of OuterSPACE, described in detail in the Overview and Insights section.
Figure 4: Rank swizzling in sparse tensor algebra computations, using outer-product multiply-merge matrix-vector multiplication. Matrix $A$ and vector $B$ use the values from Figure 1 for consistency. An offline rank swap ensures that $A$ has rank order $[K, M]$ before the multiply phase, and an online rank swap ensures that $T$ has rank order $[M, K]$ before the merge phase, so that both phases traverse their inputs concordantly.
Figure 5: TeAAL concrete, hardware-level model of the OuterSPACE accelerator. The fibertree (a), combined with the format specification (b), describes the concrete representation: a custom array-of-linked-lists format (c). TeAAL specifies the architecture hierarchically (f), where each level has a set of local components (d) that have tensor operations bound to them (e). More details are given in the section on modeling OuterSPACE.
...and 8 more figures
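The content-preserving transformations in Figures 2 and 4 can be sketched in Python with a toy fibertree in which each fiber is a dictionary from coordinates to sub-fibers and the leaves hold nonzero values. This is an illustration under that assumption, not TeAAL's implementation: `swizzle` performs a rank swap by rebuilding the tree in a new rank order, `flatten_top` merges the top two ranks into one rank of coordinate pairs, and `split_occupancy` partitions a rank into parts holding equal numbers of elements (the occupancy-based split is an assumed choice; a split by coordinate range would be analogous). All function names are hypothetical, and the sketch does not model the cost difference between offline and online swaps. The assertions check that the set of (coordinate, value) points is unchanged, which is what makes these transformations content-preserving.

```python
# Toy fibertree: a fiber is a dict {coordinate: sub-fiber}; leaves are nonzero values.

def content(tree, prefix=()):
    """Flatten a fibertree into {coordinate tuple: value}, i.e. the tensor's content."""
    if not isinstance(tree, dict):
        return {prefix: tree}
    points = {}
    for coord, sub in tree.items():
        points.update(content(sub, prefix + (coord,)))
    return points

def build(points):
    """Build a fibertree from {coordinate tuple: value}, inserting coordinates in sorted order."""
    tree = {}
    for coords, value in sorted(points.items()):
        fiber = tree
        for c in coords[:-1]:
            fiber = fiber.setdefault(c, {})
        fiber[coords[-1]] = value
    return tree

def swizzle(tree, order):
    """Rank swap: permute the rank order, e.g. order=(1, 0) turns [M, K] into [K, M]."""
    return build({tuple(c[i] for i in order): v for c, v in content(tree).items()})

def flatten_top(tree):
    """Flatten the top two ranks into one rank whose coordinates are (upper, lower) pairs."""
    return {(c0, c1): sub for c0, fiber in tree.items() for c1, sub in fiber.items()}

def split_occupancy(fiber, per_part):
    """Partition the top rank into parts of `per_part` elements each (occupancy-based).
    Each part is keyed by its first coordinate, which adds a new upper rank."""
    coords = sorted(fiber)
    parts = {}
    for i in range(0, len(coords), per_part):
        chunk = coords[i:i + per_part]
        parts[chunk[0]] = {c: fiber[c] for c in chunk}
    return parts

# A with rank order [M, K]
A_mk = build({(0, 0): 1.0, (0, 2): 2.0, (1, 1): 3.0, (3, 0): 4.0, (3, 2): 5.0})

# Rank swap to [K, M]: the top-level fiber is now traversed in K order.
A_km = swizzle(A_mk, (1, 0))
assert content(A_km) == {(k, m): v for (m, k), v in content(A_mk).items()}

# Flatten [M, K] into one rank, then split it into parts of 2, 2, and 1 nonzeros.
A_flat = flatten_top(A_mk)
A_parts = split_occupancy(A_flat, 2)
assert {c[1:]: v for c, v in content(A_parts).items()} == content(A_flat)
print(A_parts)
```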
