Nexus Machine: An Active Message Inspired Reconfigurable Architecture for Irregular Workloads
Rohan Juneja, Pranav Dangi, Thilini Kaushalya Bandara, Tulika Mitra, Li-Shiuan Peh
TL;DR
Nexus Machine targets irregular workloads on resource-constrained edge devices with an Active Message (AM)-inspired reconfigurable architecture that executes in a data-driven manner and computes en route. It combines coarse-grained tensor partitioning, data-local execution, and in-network computing with a flexible AM format and dynamic routing, supported by a compiler and runtime stack. Reported results show up to 90% higher performance and 70% higher fabric utilization than state-of-the-art baselines. A 22 nm implementation achieves a 1.9x improvement over a generic coarse-grained reconfigurable array (CGRA) with 1.7x higher fabric utilization, and an average 1.35x performance gain over prior art. The work shows strong potential for energy-efficient, scalable execution of sparse, dense, and graph workloads on edge CGRAs.
Abstract
Modern reconfigurable architectures are increasingly favored for resource-constrained edge devices because they balance high performance, energy efficiency, and programmability. However, their specialization for regular compute patterns limits their effectiveness on irregular workloads, such as sparse linear algebra and graph analytics, which have unpredictable access patterns and control flow. To address this limitation, we introduce the Nexus Machine, a novel reconfigurable architecture built around an array of Processing Elements (PEs). It handles irregularity by distributing sparse tensors across the fabric and employing active messages that morph instructions based on dynamic control flow. Because the inherent irregularity of these workloads can cause high load imbalance among PEs, Nexus Machine deploys and executes instructions en route on idle PEs at run time. Thus, unlike traditional reconfigurable architectures, which hold only static instructions within each PE, Nexus Machine brings dynamic control to idle compute units, mitigating load imbalance and improving overall performance. Our experiments demonstrate that Nexus Machine achieves 90% better performance than state-of-the-art (SOTA) reconfigurable architectures within the same power and area budget. Nexus Machine also achieves 70% higher fabric utilization than SOTA architectures.
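
The en-route execution idea can be illustrated with a toy model. The sketch below is not the Nexus Machine design: the message fields, the opcode table, the 1-D chain of PEs, and the "first idle PE on the path executes" policy are all assumptions made for illustration, since the abstract does not specify the AM format or the routing scheme. It only shows how a message that carries its own instruction and operands can be executed by an idle PE on the way to the PE that owns the output data.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

# Hypothetical opcode table; the actual AM instruction set is not given in the abstract.
OPS: Dict[str, Callable[[float, float], float]] = {
    "mul": lambda a, b: a * b,
    "add": lambda a, b: a + b,
}

@dataclass
class ActiveMessage:
    op: Optional[str]               # pending operation; None once it has been executed
    a: float                        # first operand carried by the message
    b: float                        # second operand carried by the message
    dest: int                       # index of the PE that owns the output element
    result: Optional[float] = None  # filled in by whichever PE executes the op

@dataclass
class PE:
    busy: bool = False              # True if the PE is occupied with its own work
    acc: float = 0.0                # accumulator standing in for PE-local memory
    executed: int = 0               # number of message operations this PE performed

def route(msg: ActiveMessage, pes: List[PE], src: int) -> None:
    """Move msg hop by hop from PE src to PE msg.dest along a 1-D chain (assumed topology)."""
    step = 1 if msg.dest >= src else -1
    for i in range(src, msg.dest + step, step):
        pe = pes[i]
        at_dest = i == msg.dest
        # The first idle PE on the path executes the pending op; if none is idle,
        # the destination PE executes it on arrival.
        if msg.op is not None and (not pe.busy or at_dest):
            msg.result = OPS[msg.op](msg.a, msg.b)
            msg.op = None           # the message now carries a result instead of an op
            pe.executed += 1
        if at_dest:
            pe.acc += msg.result    # commit the result into the owner's local state

# Usage: PEs 0 and 2 are busy and PE 1 is idle, so PE 1 computes 3 * 4 en route.
pes = [PE(busy=True), PE(busy=False), PE(busy=True)]
msg = ActiveMessage(op="mul", a=3.0, b=4.0, dest=2)
route(msg, pes, src=0)
```

In this toy run the multiply is performed by the idle middle PE rather than the busy destination, which only has to accumulate the result. This is the load-balancing effect the abstract attributes to en-route execution, shown here in a heavily simplified form.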
