TeAAL: A Declarative Framework for Modeling Sparse Tensor Accelerators
Nandeeka Nayak, Toluwanimi O. Odemuyiwa, Shubham Ugare, Christopher W. Fletcher, Michael Pellauer, Joel S. Emer
TL;DR
TeAAL is a declarative language and simulator generator for modeling sparse tensor accelerators with high fidelity. It expresses an accelerator as a cascade of mapped Einsums augmented with content-preserving fibertree transformations, which enables precise, like-for-like modeling of diverse architectures such as OuterSPACE, ExTensor, Gamma, and SIGMA. The framework generates an imperative IR, produces traces from real tensors, and combines Accelergy-derived energy estimates with an analytical bottleneck analysis to estimate performance and energy. These estimates align closely with published results, and the framework supports rapid exploration of new designs, including improvements over the graph-analytics accelerator GraphDynS on vertex-centric graph processing. Together, these results and TeAAL's broad expressivity make it a practical tool for designing, comparing, and optimizing sparse tensor accelerators.
Abstract
Over the past few years, the explosion in sparse tensor algebra workloads has led to a corresponding rise in domain-specific accelerators to service them. Due to the irregularity present in sparse tensors, these accelerators employ a wide variety of novel solutions to achieve good performance. At the same time, prior work on design-flexible sparse accelerator modeling does not express this full range of design features, making it difficult to understand the impact of each design choice and compare or extend the state-of-the-art. To address this, we propose TeAAL: a language and simulator generator for the concise and precise specification and evaluation of sparse tensor algebra accelerators. We use TeAAL to represent and evaluate four disparate state-of-the-art accelerators -- ExTensor, Gamma, OuterSPACE, and SIGMA -- and verify that it reproduces their performance with high accuracy. Finally, we demonstrate the potential of TeAAL as a tool for designing new accelerators by showing how it can be used to speed up vertex-centric programming accelerators -- achieving $1.9\times$ on BFS and $1.2\times$ on SSSP over GraphDynS.
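To make the terms "Einsum" and "content-preserving fibertree transformation" concrete, the following is a minimal Python sketch and not TeAAL's actual specification language or generated code. It assumes a fibertree can be approximated by nested dictionaries mapping coordinates to payloads, and all function names (`to_fibertree`, `swizzle`, `spmspm_k_outer`) are hypothetical. It evaluates the sparse matrix multiply Einsum $Z_{m,n} = A_{k,m} \cdot B_{k,n}$ with the shared rank $K$ outermost, and shows a rank swap that changes traversal order without changing the stored nonzeros.

```python
from collections import defaultdict


def to_fibertree(entries):
    """Build a two-rank fibertree (nested dicts) from (c0, c1, value) triples.

    Illustrative assumption: each fiber is a dict of coordinate -> payload,
    and zero values are simply not stored.
    """
    tree = defaultdict(dict)
    for c0, c1, value in entries:
        if value != 0:
            tree[c0][c1] = value
    return {c0: dict(sorted(fiber.items())) for c0, fiber in sorted(tree.items())}


def swizzle(tree):
    """Swap the two ranks of a fibertree.

    The transformation is content-preserving: the same nonzeros are stored,
    only the order in which ranks are traversed changes.
    """
    swapped = defaultdict(dict)
    for c0, fiber in tree.items():
        for c1, value in fiber.items():
            swapped[c1][c0] = value
    return {c1: dict(sorted(fiber.items())) for c1, fiber in sorted(swapped.items())}


def spmspm_k_outer(a_km, b_kn):
    """Evaluate Z[m, n] = A[k, m] * B[k, n], summing over k, with K outermost."""
    z = defaultdict(lambda: defaultdict(float))
    # Only coordinates of K present in both operands can contribute.
    for k in a_km.keys() & b_kn.keys():
        for m, a_val in a_km[k].items():
            for n, b_val in b_kn[k].items():
                z[m][n] += a_val * b_val
    return {m: dict(fiber) for m, fiber in z.items()}


# A is stored with rank order [K, M]; B with rank order [K, N].
A_KM = to_fibertree([(0, 0, 1.0), (0, 1, 2.0), (2, 1, 3.0)])
B_KN = to_fibertree([(0, 0, 4.0), (1, 0, 5.0), (2, 1, 6.0)])

Z_MN = spmspm_k_outer(A_KM, B_KN)
A_MK = swizzle(A_KM)  # same content, rank order [M, K]
```

Here `swizzle(A_MK)` returns the original `A_KM`, which is the sense in which the transformation preserves content. A different loop order over the same Einsum would consume `A_MK` instead, which illustrates why separating the computation (the Einsum) from the traversal order (the mapping) lets one description style cover several accelerator designs.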
