TeAAL: A Declarative Framework for Modeling Sparse Tensor Accelerators

Nandeeka Nayak, Toluwanimi O. Odemuyiwa, Shubham Ugare, Christopher W. Fletcher, Michael Pellauer, Joel S. Emer

TL;DR

TeAAL introduces a declarative language and simulator generator for modeling sparse tensor accelerators. By expressing accelerators as cascades of mapped Einsums and augmenting them with content-preserving fibertree transformations, TeAAL enables apples-to-apples modeling across diverse architectures such as OuterSPACE, ExTensor, Gamma, and SIGMA. The framework generates an imperative IR, produces traces on real tensors, and combines Accelergy-derived energy estimates with analytical bottleneck analysis to estimate performance and energy. The resulting models reproduce the published performance of these accelerators with high accuracy, and the framework supports rapid exploration of new designs: applied to vertex-centric graph accelerators, it yields speedups of $1.9\times$ on BFS and $1.2\times$ on SSSP over GraphDynS.

Abstract

Over the past few years, the explosion in sparse tensor algebra workloads has led to a corresponding rise in domain-specific accelerators to service them. Due to the irregularity present in sparse tensors, these accelerators employ a wide variety of novel solutions to achieve good performance. At the same time, prior work on design-flexible sparse accelerator modeling does not express this full range of design features, making it difficult to understand the impact of each design choice and compare or extend the state-of-the-art. To address this, we propose TeAAL: a language and simulator generator for the concise and precise specification and evaluation of sparse tensor algebra accelerators. We use TeAAL to represent and evaluate four disparate state-of-the-art accelerators -- ExTensor, Gamma, OuterSPACE, and SIGMA -- and verify that it reproduces their performance with high accuracy. Finally, we demonstrate the potential of TeAAL as a tool for designing new accelerators by showing how it can be used to speed up vertex-centric programming accelerators -- achieving $1.9\times$ on BFS and $1.2\times$ on SSSP over GraphDynS.

Paper Structure (35 sections, 6 equations, 13 figures, 6 tables)


Figures (13)

  • Figure 1: Sparse matrix-vector multiplication and corresponding fibertree representations.
  • Figure 2: Flattening then partitioning ranks $M$, $K$ of tensor $A$ (Fig. \ref{fig:background:fibertree}).
  • Figure 3: TeAAL specification for the Einsums and mappings of OuterSPACE, described in detail in Section \ref{sec:insights}.
  • Figure 4: Rank swizzling in sparse tensor algebra computations, using outer-product multiply-merge matrix-vector multiplication. Matrix $A$ and vector $B$ use values from Figure \ref{fig:background:fibertree} for consistency. An offline rank swap ensures that $A$ has rank order $[K, M]$ prior to the multiply phase, and an online rank swap ensures that $T$ has rank order $[M, K]$ prior to the merge phase, ensuring concordant traversal in both phases.
  • Figure 5: TeAAL concrete/hardware-level model of the OuterSPACE accelerator. The fibertree (a), combined with the format specification (b), describes the concrete representation, a custom array-of-linked-lists format (c). TeAAL specifies the architecture hierarchically (f), where each level has a set of local components (d) that have tensor operations bound to them (e). More details are given in Section \ref{sec:model:outerspace}.
  • ...and 8 more figures
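
As a rough illustration of the ideas in the captions for Figures 1 and 4, the sketch below stores a sparse matrix as a fibertree built from nested dictionaries and runs an outer-product multiply-merge matrix-vector multiplication with the two rank swaps. It is not TeAAL syntax and not the paper's implementation: the nested-dict encoding, the helper names (`swap_ranks`, `spmv_outer_product`), and the example values are all assumptions made for illustration.

```python
# Hypothetical sketch: a fibertree as nested dicts (coordinate -> payload).
# Not TeAAL's API; all names and values here are illustrative assumptions.

def swap_ranks(tree):
    """Swap the two ranks of a 2-rank fibertree, e.g. [M, K] -> [K, M]."""
    swapped = {}
    for outer, fiber in tree.items():
        for inner, value in fiber.items():
            swapped.setdefault(inner, {})[outer] = value
    # Sort coordinates so that later traversal is in coordinate order.
    return {c: dict(sorted(f.items())) for c, f in sorted(swapped.items())}


def spmv_outer_product(a_mk, b_k):
    """Compute Z[m] = sum over k of A[m, k] * B[k] in two phases."""
    # Offline rank swap: A goes from rank order [M, K] to [K, M].
    a_km = swap_ranks(a_mk)

    # Multiply phase: T[k, m] = A[k, m] * B[k], intersecting A and B on k.
    t_km = {}
    for k, a_fiber in a_km.items():
        if k in b_k:
            t_km[k] = {m: a_val * b_k[k] for m, a_val in a_fiber.items()}

    # Online rank swap: T goes from rank order [K, M] to [M, K].
    t_mk = swap_ranks(t_km)

    # Merge phase: reduce over k to get Z[m].
    return {m: sum(fiber.values()) for m, fiber in t_mk.items()}


# Example values (assumed, not taken from the paper's figures).
A_MK = {0: {0: 1, 2: 2}, 2: {1: 3, 2: 4}}  # matrix A, rank order [M, K]
B_K = {0: 5, 2: 6}                         # sparse vector B over rank K
Z_M = spmv_outer_product(A_MK, B_K)
print(Z_M)
```

Swapping ranks before each phase means every loop walks its operand in the order the operand is stored, which is the concordant traversal the Figure 4 caption refers to.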