Chip-to-chip photonic connectivity in multi-accelerator servers for ML
Abhishek Vijaya Kumar, Arjun Devraj, Darius Bunandar, Rachee Singh
TL;DR
To address data-movement bottlenecks in multi-accelerator ML servers, the paper introduces Lumorph, a circuit-switched, chip-to-chip photonic interconnect built on the Lightpath platform. Lumorph targets two goals: multi-tenant resource slicing, achieved with on-demand optical circuits, and faster AllReduce, achieved with collective algorithms adapted to account for reconfiguration latency. Results from a lab prototype and simulations show $74\%$ faster rack-scale collective communication and up to $1.7\times$ end-to-end ML training throughput, with the $3.7\,\mu s$ reconfiguration latency factored into the per-message startup cost ($\alpha$) of the collectives. These results suggest that circuit-switched photonic interconnects can improve resource utilization and scalability for AI workloads.
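To illustrate what folding reconfiguration latency into the $\alpha$-cost can mean, here is a minimal sketch using the standard $\alpha$-$\beta$ model, in which sending $n$ bytes costs $\alpha + n\beta$. It is not the paper's algorithm or cost model. The function names, the choice of ring and recursive halving/doubling AllReduce, and the rule for when a circuit reconfiguration is charged are all assumptions made for illustration; only the $3.7\,\mu s$ figure comes from the text above.

```python
# Reconfiguration latency quoted in the TL;DR (seconds).
RECONFIG_S = 3.7e-6


def ring_allreduce_time(p, n_bytes, alpha, beta, reconfig=RECONFIG_S):
    """Alpha-beta estimate for ring AllReduce on p accelerators.

    Assumption: the ring circuit is set up once, so a single
    reconfiguration is charged for the whole collective.
    """
    if p < 2:
        raise ValueError("p must be at least 2")
    steps = 2 * (p - 1)  # reduce-scatter followed by all-gather
    return reconfig + steps * (alpha + (n_bytes / p) * beta)


def halving_doubling_allreduce_time(p, n_bytes, alpha, beta, reconfig=RECONFIG_S):
    """Alpha-beta estimate for recursive halving/doubling AllReduce.

    Assumption: every step talks to a different peer, so every step
    pays one reconfiguration on top of the usual startup cost alpha.
    """
    if p < 2 or p & (p - 1):
        raise ValueError("p must be a power of two and at least 2")
    steps = 2 * (p.bit_length() - 1)  # 2 * log2(p)
    bandwidth_term = 2 * (p - 1) / p * n_bytes * beta
    return steps * (alpha + reconfig) + bandwidth_term


if __name__ == "__main__":
    # Hypothetical link parameters, chosen only to exercise the formulas.
    alpha, beta = 1e-6, 1e-9
    for n in (8_000, 8_000_000):
        ring = ring_allreduce_time(8, n, alpha, beta)
        hd = halving_doubling_allreduce_time(8, n, alpha, beta)
        print(f"n={n} bytes: ring={ring:.3e}s halving/doubling={hd:.3e}s")
```

Under these assumptions, an algorithm that changes peers at every step pays the reconfiguration latency once per step, so the effective startup cost becomes $\alpha + 3.7\,\mu s$; this is the kind of trade-off a reconfiguration-aware collective has to weigh against bandwidth savings.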
Abstract
We present a rack-scale compute architecture for ML using multi-accelerator servers connected via chip-to-chip silicon photonic components. Our architecture achieves (1) multi-tenant resource slicing without fragmentation, (2) 74% faster rack-scale collective communication, and (3) a 1.7× speedup in end-to-end ML training throughput.
