Exploring the Potential of Wireless-enabled Multi-Chip AI Accelerators

Emmanuel Irabor, Mariam Musavi, Abhijit Das, Sergi Abadal

TL;DR

The paper examines inter-chiplet data movement over the Network-on-Package (NoP), a bottleneck in scalable multi-chip AI accelerators, and proposes wireless interconnects as a flexible complement to the wired NoP. It extends the GEMINI workload-mapping framework with a wireless channel to evaluate how wireless links can alleviate this bottleneck for optimally mapped workloads. With this wireless-enabled extension, a set of decision criteria, and the corresponding simulator modifications, the study reports speedups of 10% on average and up to 20%, with gains that are sensitive to load balancing between the wired and wireless planes. The work points to a viable path toward higher throughput and versatility in chiplet-based AI accelerators and informs future design choices on wireless versus wired interconnect trade-offs.

Abstract

The insatiable appetite of Artificial Intelligence (AI) workloads for computing power is pushing the industry to develop faster and more efficient accelerators. The rigidity of custom hardware, however, conflicts with the need for scalable and versatile architectures capable of catering to the needs of the evolving and heterogeneous pool of Machine Learning (ML) models in the literature. In this context, multi-chiplet architectures assembling multiple (perhaps heterogeneous) accelerators are an appealing option that is unfortunately hindered by the still rigid and inefficient chip-to-chip interconnects. In this paper, we explore the potential of wireless technology as a complement to existing wired interconnects in this multi-chiplet approach. Using an evaluation framework from the state-of-the-art, we show that wireless interconnects can lead to speedups of 10% on average and 20% maximum. We also highlight the importance of load balancing between the wired and wireless interconnects, which will be further explored in future work.

Paper Structure

This paper contains 18 sections, 5 figures, 1 table.

Figures (5)

  • Figure 1: A schematic architecture of wireless-enabled multi-chip AI accelerator with 3$\times$3 chiplets and 4 DRAMs. An antenna and transceiver are integrated at the center of each DRAM and compute chiplet.
  • Figure 2: Percentage of time where each of the elements of a 144-TOPS 3$\times$3 multi-chip AI accelerator is the performance bottleneck.
  • Figure 3: Overall methodology. GEMINI is augmented with a wireless communication model and a wireless interface model. This makes it possible to assess the impact of wireless interconnects on optimally mapped workloads in multi-chip AI accelerators.
  • Figure 4: Speedup of the proposed approach over a wired baseline in a 3$\times$3 multi-chip accelerator across the different evaluated AI workloads and for two different wireless bandwidths.
  • Figure 5: Impact of the distance threshold and injection probability on the performance of the proposed approach for the zfnet workload. Hotter colors indicate higher speedups whereas colder colors indicate performance degradations.
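The Figure 5 caption names two tuning parameters, a distance threshold and an injection probability, but this section does not spell out how they are applied. The sketch below shows one plausible reading, offered purely as an illustration: a message becomes eligible for the wireless plane when its wired hop distance reaches the threshold, and an eligible message is actually sent wirelessly with the given probability. The function names, the Manhattan-distance metric, and the 3x3 mesh coordinates are all assumptions and are not taken from the paper or from GEMINI.

```python
import random


def hop_distance(src, dst):
    """Manhattan distance between two chiplets given as (row, col) on a 2D mesh.

    Assumption: the wired NoP is modelled as a mesh, so hops = |dr| + |dc|.
    """
    return abs(src[0] - dst[0]) + abs(src[1] - dst[1])


def use_wireless(src, dst, distance_threshold, injection_prob, rng):
    """Hypothetical offloading rule (not the paper's confirmed criterion).

    A message is eligible for wireless only if its wired hop distance is at
    least `distance_threshold`; eligible messages are then sent wirelessly
    with probability `injection_prob`.
    """
    if hop_distance(src, dst) < distance_threshold:
        return False
    return rng.random() < injection_prob


def wireless_fraction(messages, distance_threshold, injection_prob, seed=0):
    """Fraction of (src, dst) messages routed over the wireless plane."""
    if not messages:
        return 0.0
    rng = random.Random(seed)  # seeded for repeatable experiments
    sent = sum(
        use_wireless(s, d, distance_threshold, injection_prob, rng)
        for s, d in messages
    )
    return sent / len(messages)


if __name__ == "__main__":
    # All ordered chiplet pairs in an assumed 3x3 mesh.
    nodes = [(r, c) for r in range(3) for c in range(3)]
    pairs = [(s, d) for s in nodes for d in nodes if s != d]
    for threshold in (1, 2, 3, 4):
        frac = wireless_fraction(pairs, threshold, injection_prob=0.5)
        print(f"threshold={threshold}: wireless share = {frac:.2f}")
```

Under this reading, raising the threshold or lowering the probability shifts traffic back to the wired plane, which is the kind of load-balancing knob the caption suggests is being swept.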