When Routers, Switches and Interconnects Compute: A processing-in-interconnect Paradigm for Scalable Neuromorphic AI
Madhuvanthi Srivatsav, Chiranjib Bhattacharyya, Shantanu Chakrabartty, Chetan Singh Thakur
TL;DR
The paper proposes Processing-in-Interconnect ($\pi^2$), a neuromorphic paradigm that reinterprets routing and switching primitives (delays, causality, time-outs, drops, and broadcasts) as computational operations. By mapping neural computations onto interconnect behaviors via credit-based and asynchronous traffic-shaping protocols, $\pi^2$ enables in-network computation whose energy efficiency improves as interconnect bandwidth grows; knowledge distillation further allows existing neural network topologies to be trained onto $\pi^2$ with minimal loss in generalization. Analytical and simulation results suggest that energy utilization ($\eta$) approaches unity as interconnect bandwidth advances, and that brain-scale inference may be achievable with hundreds of watts by leveraging Ethernet/TSN hardware for scalable, distributed neuromorphic processing. The work also explores trade-offs between delay-based computation, spiking sparsity, and hardware constraints, and demonstrates through OMNeT++ and GPU experiments that $\pi^2$ can approximate MAC-based networks and support scalable visual recognition tasks when complemented by distillation and hardware-aware training. Overall, $\pi^2$ offers a practical path to scalable neuromorphic AI by turning interconnects into active computational substrates, potentially converting data-movement energy into productive computation as interconnects evolve.
Abstract
Routing, switching, and the interconnect fabric are essential for large-scale neuromorphic computing. While this fabric only plays a supporting role in the process of computing, for large AI workloads it ultimately determines energy consumption and speed. In this paper, we address this bottleneck by asking: (a) What computing paradigms are inherent in existing routing, switching, and interconnect systems, and how can they be used to implement a processing-in-interconnect ($\pi^2$) computing paradigm? and (b) leveraging current and future interconnect trends, how will a $\pi^2$ system's performance scale compared to other neuromorphic architectures? For (a), we show that operations required for typical AI workloads can be mapped onto delays, causality, time-outs, packet drop, and broadcast operations, all primitives already implemented in packet-switching and packet-routing hardware. We show that existing buffering and traffic-shaping embedded algorithms can be leveraged to implement neuron models and synaptic operations. Additionally, a knowledge-distillation framework can train and cross-map well-established neural network topologies onto $\pi^2$ without degrading generalization performance. For (b), analytical modeling shows that, unlike other neuromorphic platforms, the energy scaling of $\pi^2$ improves with interconnect bandwidth and energy efficiency. We predict that by leveraging trends in interconnect technology, a $\pi^2$ architecture can be more easily scaled to execute brain-scale AI inference workloads with power consumption levels in the range of hundreds of watts.
