ACiS: Complex Processing in the Switch Fabric

Pouya Haghi; Anqi Guo; Tong Geng; Anthony Skjellum; Martin Herbordt

TL;DR

ACiS proposes a general in-switch computing framework that extends the switch fabric with a CGRA-based accelerator to offload and fuse HPC computations inside the network. It defines a progression of processing types, from simple collectives to user-defined operations, look-aside stateful processing, and fused map–collective patterns. With these types, the approach enables transparent MPI acceleration without compromising existing datapath performance. Hardware plugins and a modular PISA integration support scalable, programmable in-switch computation. On the software side, tooling provides MPI transparency, a source-to-source translator for fused collectives, and a usage database to guide deployment. Experimental results across indirect and direct networks, NAS benchmarks, and graph neural networks show substantial latency reductions and scalability improvements, illustrating the practical impact of shifting computation into the switch fabric for HPC workloads.
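
The following minimal Python sketch makes the idea of a fused map–collective concrete. It assumes a single switch that receives one vector per host and sums the vectors elementwise. It is a host-side simulation only, not ACiS hardware, its CGRA mapping, or its source-to-source translator. The names `square`, `unfused`, and `fused` are hypothetical. In the unfused form, every host applies the map before the reduction. In the fused form, the switch applies the map to each arriving element and accumulates in the same pass, so hosts can send raw data.

```python
from typing import Callable, List, Sequence

Vector = List[float]


def square(x: float) -> float:
    """Hypothetical user map function, applied elementwise."""
    return x * x


def unfused(contribs: Sequence[Vector], f: Callable[[float], float]) -> Vector:
    """Host-side map first, then a plain (Type 1) elementwise sum in the switch."""
    mapped = [[f(x) for x in vec] for vec in contribs]  # every host maps locally
    return [sum(col) for col in zip(*mapped)]           # switch sums elementwise


def fused(contribs: Sequence[Vector], f: Callable[[float], float]) -> Vector:
    """Switch applies the map to each arriving packet and accumulates in one pass."""
    acc = [0.0] * len(contribs[0])
    for vec in contribs:            # one packet per host, in arrival order
        for i, x in enumerate(vec):
            acc[i] += f(x)          # map and reduce fused per element
    return acc


hosts = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0], [10.0, 11.0, 12.0]]
print(unfused(hosts, square))  # [166.0, 214.0, 270.0]
print(fused(hosts, square))    # [166.0, 214.0, 270.0]
```

Both forms return the same vector here because the additions happen in the same order. A translator that rewrites a map plus a collective into a single fused operation has to preserve this kind of equivalence. Floating-point sums can differ if the accumulation order changes.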

Abstract

For the last three decades, a core use of FPGAs has been communication processing: FPGA-based SmartNICs are in widespread use from the datacenter to IoT. Augmenting switches with FPGAs has been less studied. It has numerous advantages, however, which stem from moving processing from the edge of the network to its center. Communication switches have previously been augmented to process collectives, e.g., in IBM BlueGene and Mellanox SHArP, but that support has been limited to a small set of predefined scalar operations and datatypes. Here we present ACiS, a framework and taxonomy for Advanced Computing in the Switch that unifies and expands our previous work in this area. In addition to fixed scalar collectives (Type 1), we propose three more types of in-switch application processing:

- (Type 2) user-defined operations and types, including data structures;
- (Type 3) look-aside operations that have state within the operation and can have loops; and
- (Type 4) fused collectives, built by fusing multiple existing collectives or by fusing collectives with map computations.

ACiS is supported in hardware by modular switch extensions, including a CGRA architecture. Software support for ACiS includes evaluation and translation of relevant parts of user programs, compilation of user specifications into control flow graphs, and mapping of those graphs onto switch hardware. The overall goal is transparent acceleration of HPC applications, encapsulated within an MPI implementation.
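
The first three processing types differ in what the switch must hold and compute. The Python sketch below illustrates each type with a hypothetical example, assuming contributions arrive as packets of floats:

- a predefined scalar sum (Type 1);
- a user-defined combiner over a user-defined structure (Type 2);
- a look-aside operation that keeps state across packets and loops over their contents (Type 3).

The `Stats` and `TopK` examples are illustrative choices, not operations taken from the paper. The code models behavior only, not the switch hardware.

```python
from dataclasses import dataclass
from functools import reduce
import heapq
from typing import Iterable, List


# Type 1: fixed scalar operation on a predefined datatype (here, a float sum).
def type1_sum(values: Iterable[float]) -> float:
    return sum(values)


# Type 2 (hypothetical example): user-defined operation over a user-defined structure.
@dataclass(frozen=True)
class Stats:
    count: int
    total: float
    maximum: float


def merge_stats(a: Stats, b: Stats) -> Stats:
    """Associative, commutative combiner that could be applied pairwise."""
    return Stats(a.count + b.count, a.total + b.total, max(a.maximum, b.maximum))


# Type 3 (hypothetical example): look-aside operation with internal state and a loop.
class TopK:
    """Keeps the k largest values seen so far across all incoming packets."""

    def __init__(self, k: int) -> None:
        self.k = k
        self.heap: List[float] = []  # min-heap: state held inside the operation

    def absorb(self, packet: Iterable[float]) -> None:
        for x in packet:             # loop over every element of the packet
            if len(self.heap) < self.k:
                heapq.heappush(self.heap, x)
            elif x > self.heap[0]:   # larger than the smallest kept value
                heapq.heapreplace(self.heap, x)

    def result(self) -> List[float]:
        return sorted(self.heap, reverse=True)


print(type1_sum([1.0, 2.0, 3.0]))  # 6.0
print(reduce(merge_stats, [Stats(2, 3.0, 2.0), Stats(1, 5.0, 5.0), Stats(3, 6.0, 4.0)]))
# Stats(count=6, total=14.0, maximum=5.0)
topk = TopK(2)
for pkt in ([3.0, 9.0], [1.0, 7.0], [8.0]):
    topk.absorb(pkt)
print(topk.result())  # [9.0, 8.0]
```

Because `merge_stats` is associative and commutative, a switch, or a tree of switches, could combine contributions in any order. `TopK`, by contrast, needs persistent storage inside the operation. In this sketch, that persistent state is what separates it from a pairwise Type 2 combiner.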
