A System Level Performance Evaluation for Superconducting Digital Systems

Joyjit Kundu, Debjyoti Bhattacharjee, Nathan Josephsen, Ankit Pokhrel, Udara De Silva, Wenzhe Guo, Steven Van Winckel, Steven Brebels, Manu Perumkunnil, Quentin Herr, Anna Herr

TL;DR

This paper shows that SCD technology can address the memory and interconnect limitations of present-day solutions for next-generation compute systems.

Abstract

Superconducting Digital (SCD) technology offers significant potential for enhancing the performance of next-generation large-scale compute workloads. By leveraging advanced lithography and a 300 mm platform, SCD devices can reduce energy consumption and boost computational power. This paper presents a cross-layer modeling approach to evaluate the system-level performance benefits of SCD architectures for Large Language Model (LLM) training and inference. Our findings, based on experimental data and Pulse Conserving Logic (PCL) design principles, demonstrate substantial performance gains in both training and inference. We thus show that SCD technology can address the memory and interconnect limitations of present-day solutions for next-generation compute systems.

Paper Structure

This paper contains 12 sections, 8 figures, 1 table.

Figures (8)

  • Figure 1: (a) Schematic of the target 16ML SCD stack. (b) Scheme of the 2ML BEOL interconnects and their TEM image. (c) Scheme of JJs with $\alpha$Si barriers and its TEM image. (d) Scheme of the HZO MIM capacitor and its TEM image. (e) HD JSRAM 1R/1W unit cell with 8 JJs. (f) Building blocks of the PCL logic family. (g) Dual-rail logic gates in the PCL cell library. (h) Outline of the RTL-to-GDS automated flow.
  • Figure 2: (a) Diagrammatic representation of the datalink interface (Cu over Glass bridge) connecting the 4K (Compute) and 77K (Main Memory) domains. (b) Baseline specifications for the main memory datalink.
  • Figure 3: (a) Physical representation of the SPU and SNU stacks. (b) Network topology and switch design. (c) Baseline SCD blade specifications for explorations. (d) Physical representation of the full SCD blade.
  • Figure 4: End-to-end performance analysis from logic design to system architecture, including mapping workload using task graphs for performance prediction for SCD system.
  • Figure 5: Impact of DRAM bandwidth (BW) per SPU in the SCD system on the achieved throughput (PFLOPs/SPU) per batch for GPT3-76B model training. The inset shows the per-layer time breakdown of memory-bound versus compute-bound GEMMs in the forward pass.
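Figure 5 relates DRAM bandwidth per SPU to achieved throughput and splits per-layer time into memory-bound and compute-bound GEMMs. The sketch below is a minimal roofline-style illustration of that split, not the authors' task-graph model. It assumes idealized DRAM traffic (each GEMM operand crosses DRAM exactly once) and 2-byte operands. The peak-FLOP/s and bandwidth values in the usage section are hypothetical placeholders, not SCD specifications from the paper, and every name in the code is illustrative.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Gemm:
    """Hypothetical GEMM descriptor: C[m x n] = A[m x k] @ B[k x n]."""
    m: int
    n: int
    k: int
    bytes_per_elem: int = 2  # assumption: fp16/bf16 operands

    def flops(self) -> float:
        # One multiply and one add per (m, n, k) triple.
        return 2.0 * self.m * self.n * self.k

    def dram_bytes(self) -> float:
        # Idealized assumption: A, B and C each cross DRAM exactly once.
        elems = self.m * self.k + self.k * self.n + self.m * self.n
        return float(elems * self.bytes_per_elem)


def roofline_time(g: Gemm, peak_flops: float, dram_bw: float) -> tuple[float, str]:
    """Return (seconds, regime) under a simple roofline model."""
    t_compute = g.flops() / peak_flops   # time if limited only by compute
    t_memory = g.dram_bytes() / dram_bw  # time if limited only by DRAM bandwidth
    regime = "memory-bound" if t_memory > t_compute else "compute-bound"
    return max(t_compute, t_memory), regime


def achieved_throughput(gemms: list[Gemm], peak_flops: float, dram_bw: float) -> float:
    """Total FLOPs divided by total roofline time for a sequence of GEMMs."""
    total_flops = sum(g.flops() for g in gemms)
    total_time = sum(roofline_time(g, peak_flops, dram_bw)[0] for g in gemms)
    return total_flops / total_time


if __name__ == "__main__":
    # Hypothetical shapes: a skinny GEMM (m=1) and a large training-style GEMM.
    layer = [Gemm(1, 8192, 8192), Gemm(2048, 8192, 8192)]
    peak = 1e15  # hypothetical peak FLOP/s per SPU
    for bw in (1e12, 4e12, 16e12):  # hypothetical DRAM bandwidths in bytes/s
        regimes = [roofline_time(g, peak, bw)[1] for g in layer]
        tp = achieved_throughput(layer, peak, bw)
        print(f"BW={bw / 1e12:.0f} TB/s  throughput={tp / 1e15:.4f} PFLOP/s  {regimes}")
```

Under these assumptions the skinny GEMM stays memory-bound, so raising DRAM bandwidth shortens its time and lifts achieved throughput toward the compute peak. This is the qualitative trend the figure explores, shown here with made-up numbers.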