Bine Trees: Enhancing Collective Operations by Optimizing Communication Locality

Daniele De Sensi, Saverio Pasqualoni, Lorenzo Piarulli, Tommaso Bonato, Seydou Ba, Matteo Turisini, Jens Domke, Torsten Hoefler

TL;DR

The paper tackles the bottleneck of communication locality in oversubscribed HPC networks by introducing Bine trees, a binomial-negabinary construction that halves the effective distance between communicating ranks. Building on distance-halving and distance-doubling variants, the authors derive formal definitions, rank representations in negabinary, and partner selection rules, enabling efficient gather/scatter, allreduce, alltoall, and related collectives across diverse topologies. Extensive experiments on four large-scale systems (Dragonfly, Dragonfly+, oversubscribed fat-tree, and torus) show up to 5x speedups and global-link traffic reductions up to 33%, across multiple MPI implementations and vector sizes. The results demonstrate Bine trees’ strong generality and practical impact for scalable collective communication in modern HPC and data-center networks. The work highlights Bine as a robust, topology-agnostic alternative to traditional binomial and butterfly-based collectives, with broad applicability to future heterogeneous and multi-technology clusters.
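The negabinary (base -2) number system mentioned above may be unfamiliar, so here is a minimal sketch of converting integers to and from it. It illustrates only the number system. How Bine trees map ranks to negabinary strings and pick partners is not described in this summary, and the function names `to_negabinary` and `from_negabinary` are hypothetical helpers, not taken from the paper.

```python
def to_negabinary(n: int) -> str:
    """Return the base -2 digit string of an integer (most significant digit first)."""
    if n == 0:
        return "0"
    digits = []
    while n != 0:
        n, r = divmod(n, -2)
        if r < 0:
            # Python's remainder takes the sign of the divisor (-2),
            # so shift it into {0, 1} and adjust the quotient to compensate.
            n += 1
            r += 2
        digits.append(str(r))
    return "".join(reversed(digits))


def from_negabinary(s: str) -> int:
    """Evaluate a base -2 digit string back to an integer."""
    value = 0
    for ch in s:
        # Horner's rule with base -2.
        value = value * -2 + int(ch)
    return value


if __name__ == "__main__":
    for n in range(-4, 5):
        print(n, to_negabinary(n))
```

One property worth noting is that negabinary represents both positive and negative integers without a sign bit, since digit positions alternate between positive and negative weights (1, -2, 4, -8, ...).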

Abstract

Communication locality plays a key role in the performance of collective operations on large HPC systems, especially on oversubscribed networks where groups of nodes are fully connected internally but sparsely linked through global connections. We present Bine (binomial negabinary) trees, a family of collective algorithms that improve communication locality. Bine trees maintain the generality of binomial trees and butterflies while cutting global-link traffic by up to 33%. We implement eight Bine-based collectives and evaluate them on four large-scale supercomputers with Dragonfly, Dragonfly+, oversubscribed fat-tree, and torus topologies, achieving up to 5x speedups and consistent reductions in global-link traffic across different vector sizes and node counts.
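To make the locality argument concrete, the sketch below counts how many broadcast messages cross a group boundary for the two standard binomial-tree variants (distance-doubling and distance-halving). These are the classic binomial trees, not Bine trees. The contiguous block mapping of ranks to groups, the root at rank 0, the power-of-two rank count, and all function names are assumptions made for illustration.

```python
def halving_edges(p: int) -> list[tuple[int, int]]:
    """(sender, receiver) pairs of a distance-halving binomial broadcast from rank 0.

    Assumes p is a power of two. The first step spans p/2 ranks, the last spans 1.
    """
    edges = []
    dist = p // 2
    while dist >= 1:
        # Ranks that already hold the data are the multiples of 2 * dist.
        for r in range(0, p, 2 * dist):
            edges.append((r, r + dist))
        dist //= 2
    return edges


def doubling_edges(p: int) -> list[tuple[int, int]]:
    """(sender, receiver) pairs of a distance-doubling binomial broadcast from rank 0.

    Assumes p is a power of two. The first step spans 1 rank, the last spans p/2.
    """
    edges = []
    dist = 1
    while dist < p:
        # Ranks 0 .. dist-1 already hold the data.
        for r in range(dist):
            edges.append((r, r + dist))
        dist *= 2
    return edges


def global_messages(edges: list[tuple[int, int]], group_size: int) -> int:
    """Count messages whose endpoints lie in different groups (assumed block mapping)."""
    return sum(1 for src, dst in edges if src // group_size != dst // group_size)


if __name__ == "__main__":
    p, group_size = 16, 4
    print("distance-halving :", global_messages(halving_edges(p), group_size))
    print("distance-doubling:", global_messages(doubling_edges(p), group_size))
```

With 16 ranks in groups of 4 under these assumptions, the distance-halving tree sends 3 messages across group boundaries while the distance-doubling tree sends 12, even though both send 15 messages in total. Every broadcast message carries the same payload, so the message count is proportional to global-link traffic in this toy model.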

Paper Structure

This paper contains 58 sections, 16 equations, 18 figures, 5 tables.

Figures (18)

  • Figure 1: Traffic on global links for a broadcast collective using distance-doubling and distance-halving binomial trees.
  • Figure 2: Distance-halving binomial tree construction.
  • Figure 3: Distance-halving Bine trees construction.
  • Figure 4: A 16-node (order 4) distance-halving Bine tree.
  • Figure 5: Reduction in global traffic on Leonardo and LUMI.
  • ...and 13 more figures