Low-Depth Spatial Tree Algorithms

Yves Baumann, Tal Ben-Nun, Maciej Besta, Lukas Gianinazzi, Torsten Hoefler, Piotr Luczynski

TL;DR

This work introduces tree algorithms for 2D spatial architectures, where communication energy scales with Manhattan distance and depth measures parallelism. It pairs a locality-optimized layout (light-first order mapped onto a space-filling curve) with a suite of Las Vegas algorithms for treefix sum and LCA, achieving near-linear energy and poly-logarithmic depth for core tree operations. The approach handles unbounded-degree trees via local messaging and a contraction/undo framework, and it constructs the required layout through Euler tours and list ranking. The results show that computations can attain both high spatial locality and low depth, which offers a path to efficient sparse computations, such as sparse graph processing, on modern accelerators like wafer-scale engines and CGRAs.

Abstract

Contemporary accelerator designs exhibit a high degree of spatial localization, wherein two-dimensional physical distance determines communication costs between processing elements. This presents considerable algorithmic challenges, particularly when managing sparse data, which is central to progress in data science. The spatial computer model quantifies communication locality by weighting processor communication costs by distance, introducing a term named energy. Moreover, it integrates depth, a widely used metric, to promote high parallelism. We propose and analyze a framework for efficient spatial tree algorithms within the spatial computer model. Our primary method constructs a spatial tree layout that optimizes the locality of the neighbors in the compute grid. This layout enables locality-optimized messaging within the tree and achieves a polynomial factor improvement in energy compared to a PRAM approach. Using this layout, we develop energy-efficient treefix sum and lowest common ancestor algorithms, which are both fundamental building blocks for other graph algorithms. With high probability, our algorithms exhibit near-linear energy and poly-logarithmic depth. Our contributions augment a growing body of work demonstrating that computations can have both high spatial locality and low depth. Moreover, our work constitutes an advancement in the spatial layout of irregular and sparse computations.
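
To make the energy term concrete, the following is a minimal sketch, assuming unit-size messages and a cost equal to the Manhattan distance between sender and receiver cells. The helper names (`manhattan`, `tree_messaging_energy`) and the two toy layouts are illustrative assumptions, not taken from the paper.

```python
# Hypothetical helper names; the paper's model is only paraphrased here.

def manhattan(p, q):
    """Manhattan distance between two grid cells given as (x, y)."""
    return abs(p[0] - q[0]) + abs(p[1] - q[1])


def tree_messaging_energy(parent, position):
    """Total distance travelled when every non-root vertex sends one
    message to its parent (unit-size messages assumed)."""
    return sum(
        manhattan(position[v], position[p])
        for v, p in parent.items()
        if p is not None
    )


# A path 0 - 1 - 2 - 3 rooted at 0, placed on a 2 x 2 grid in two ways.
parent = {0: None, 1: 0, 2: 1, 3: 2}
row_major = {0: (0, 0), 1: (1, 0), 2: (0, 1), 3: (1, 1)}
curve_like = {0: (0, 0), 1: (0, 1), 2: (1, 1), 3: (1, 0)}

energy_row_major = tree_messaging_energy(parent, row_major)    # 4
energy_curve_like = tree_messaging_energy(parent, curve_like)  # 3
```

On this toy input the curve-like placement keeps every tree edge between adjacent cells, while the row-major placement pays for one diagonal hop. The example only illustrates how layout affects the cost measure; it says nothing about the asymptotic bounds claimed in the abstract.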

Paper Structure

This paper contains 29 sections, 23 theorems, 6 equations, 8 figures, and 1 table.

Key Result

Lemma 1

$E(n) \leq \sum_{i = 1}^{\Delta} \left( E(s(c_i)) + (\Delta - i) \cdot c\sqrt{s(c_i)} \right) + \Delta \cdot c\sqrt{2}$.
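
As a worked instance, the bound can be expanded for $\Delta = 2$. This excerpt does not define the symbols, so the expansion assumes that $\Delta$ is the number of children, $s(c_i)$ is the size of the subtree rooted at the $i$-th child $c_i$, $c$ is a constant, and the sum ranges over both terms that contain the index $i$.

```latex
\begin{align*}
E(n) &\leq \sum_{i=1}^{2} \left( E(s(c_i)) + (2 - i) \cdot c\sqrt{s(c_i)} \right) + 2 \cdot c\sqrt{2} \\
     &= E(s(c_1)) + c\sqrt{s(c_1)} + E(s(c_2)) + 0 \cdot c\sqrt{s(c_2)} + 2c\sqrt{2} \\
     &= E(s(c_1)) + E(s(c_2)) + c\sqrt{s(c_1)} + 2c\sqrt{2}.
\end{align*}
```

Under these assumptions, only the first child's subtree size appears in the additive square-root term, because the factor $(\Delta - i)$ vanishes for the last child.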

Figures (8)

  • Figure 1: Part of a tree stored in Hilbert-light-first order. The smaller subtree is stored first, then the larger subtree follows. Both subtrees are stored similarly recursively. Mapping this linear order onto the Hilbert curve yields an energy-efficient two-dimensional layout.
  • Figure 2: 16 elements stored in Z-order. Given $i = 6$ and $j = 10$, the longest diagonal is the blue one; its x-length is 3 and its y-length is 1. Moreover, $E_d(6, 10) = 4$.
  • Figure 3: Example of a virtual tree rooted at a vertex $v$ before and after the second step of Transform. The vertex $v$ has degree $4$ after the transform. Solid lines connect to the current children, whereas dashed lines connect to the appended children.
  • Figure 4: Example of a procedure for passing the references. The black edges represent the tree $\hat{T}$. The directed blue edges represent each vertex's references before and after the operation.
  • Figure 5: Example illustrating the compression of supervertices. Every supervertex corresponds to several vertices as indicated by the contiguous space taken up in the grid drawing. We first compress $u$ with $v$ and then $u$ with $w$.
  • ...and 3 more figures
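
The layout idea in the captions of Figures 1 and 2 can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's construction: it assumes each vertex is stored before its children (preorder), that children are visited in increasing subtree size, and it uses the standard index-to-coordinate conversions for the Hilbert and Z-order curves. All function names and the toy tree are hypothetical.

```python
def subtree_sizes(children, root):
    """Return {vertex: size of its subtree}, computed without recursion."""
    size = {}
    stack = [(root, False)]
    while stack:
        v, done = stack.pop()
        if done:
            size[v] = 1 + sum(size[c] for c in children.get(v, []))
        else:
            stack.append((v, True))
            stack.extend((c, False) for c in children.get(v, []))
    return size


def light_first_order(children, root):
    """Preorder traversal that visits smaller subtrees before larger ones."""
    size = subtree_sizes(children, root)
    order, stack = [], [root]
    while stack:
        v = stack.pop()
        order.append(v)
        # Push the largest child first so the smallest is popped next.
        stack.extend(sorted(children.get(v, []), key=lambda c: size[c], reverse=True))
    return order


def hilbert_d2xy(n, d):
    """Map index d to a cell (x, y) on an n x n Hilbert curve (n a power of two)."""
    x = y = 0
    t, s = d, 1
    while s < n:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        if ry == 0:  # rotate or flip the current quadrant
            if rx == 1:
                x, y = s - 1 - x, s - 1 - y
            x, y = y, x
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y


def z_order_xy(index, bits=2):
    """De-interleave a Z-order index: even bits form x, odd bits form y."""
    x = y = 0
    for b in range(bits):
        x |= ((index >> (2 * b)) & 1) << b
        y |= ((index >> (2 * b + 1)) & 1) << b
    return x, y


# Toy tree: the root "r" has a large child "a" and a small child "b".
children = {"r": ["a", "b"], "a": ["a1", "a2"], "a1": ["a11"]}
order = light_first_order(children, "r")
# order == ["r", "b", "a", "a2", "a1", "a11"]
layout = {v: hilbert_d2xy(4, i) for i, v in enumerate(order)}
```

With the bit convention chosen here, Z-order indices 6 and 10 decode to cells (2, 1) and (0, 3), whose plain Manhattan distance is 4. That agrees numerically with the value $E_d(6, 10) = 4$ in the Figure 2 caption, but $E_d$ is defined through a diagonal construction that this sketch does not reproduce, so the agreement should not be read as a general equivalence.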

Theorems & Definitions (23)

  • Lemma 1
  • Lemma 2
  • Theorem 1
  • Theorem 2
  • Lemma 3
  • Lemma 4
  • Corollary 1
  • Lemma 5
  • Lemma 6
  • Lemma 7
  • ...and 13 more