Predictive first-principles simulations for co-designing next-generation energy-efficient AI systems

Denis Mamaluy; Md Rahatul Islam Udoy; Juan P. Mendez; Ben Feinberg; Wei Pan; Ahmedullah Aziz

TL;DR

The authors argue that predictive (first-principles, fitting-parameter-free) device and interconnect simulations can close the loop between nanoscale physics and workload-level metrics. This makes it possible to identify device and interconnect operating regimes that plausibly support orders-of-magnitude improvements in the energy efficiency of AI accelerators.

Abstract

In modern generative-AI workloads, matrix-vector and matrix-matrix multiplications (\emph{MatMul}) dominate the compute and energy cost. Achieving dramatic reductions in energy per token therefore requires novel, specialized hardware that is co-designed across materials, devices, interconnects, circuits, and architectures rather than optimized at any single layer in isolation. In this \emph{Perspectives} article, we argue that \emph{predictive} (first-principles, fitting-parameter-free) device and interconnect simulations can close the loop between nanoscale physics and workload-level metrics, enabling the identification of device/interconnect operating regimes that plausibly support \emph{orders-of-magnitude} improvements in the energy efficiency of AI accelerators.

Paper Structure (5 sections, 5 figures)

This paper contains 5 sections, 5 figures.

Introduction
Computational costs & energy efficiency of GPT-like architectures
Predictive first-principles simulations
Our vision/A proposed mini-road map
Conclusion

Figures (5)

Figure 1: Relation between the energy spent per operation and the number of operations per second across different computing systems (assuming each consumes 1 kW of power). Future AI-related applications will demand even higher throughput and must therefore be based on much more energy-efficient computing systems (denoted as "Beyond-Digital-CMOS accelerators").
Figure 2: Schematic of the GPT architecture and assessment of the highest computational costs within it, shown as the approximate number of field operations (op). The GPT (decoder-only Transformer) architecture (Refs. GPT, Multi-head-Attention, FNN) is a stack of $n_{layers}$ identical layers, each comprising multiple components with varying computational costs. The most computationally expensive operations are color-coded by cost: red for high, yellow for medium, green for low. The relevant variables and their values in the GPT-3 model (Ref. GPT3) are: $B=512$, batch size; $S=2048$, sequence length; $d_{model}=12288$, the size of model embeddings and hidden states; $n_{heads}=96$, number of attention heads; $d_{head}=128$, dimension per attention head; $V=50257$, vocabulary size; $d_{ff}=49152$, feed-forward dimension (the hidden layer size in the feed-forward network, typically $4\times d_{model}$); $n_{layers}=96$, the number of stacked Transformer decoder blocks.
Figure 3: Predictive first-principles simulations of Si:P $\delta$-layer interconnects: (a) interconnect schematics; (b) predicted sheet resistances for different doping densities and thicknesses from Ref. Mamaluy2021_CommPhys, and comparison with measurements (Refs. Goh:2006, Goh:2009, Reusch:2008, McKibbin:2014); (c) predicted current ($I$) for different widths ($W$) and a phosphorus sheet density of $N_D=10^{14}$ cm$^{-2}$ from Ref. DeltaLayer2022_SciRep. The blue insets show the spatial distributions of current-carrying modes across a y-z plane, indicating the corresponding number of propagating modes ($m$). The green inset shows the total electron density of all occupied electron states for a width $W=12$ nm.
Figure 4: Predictive first-principles simulations for beyond-CMOS and state-of-the-art CMOS devices: (a) schematics of a $\delta$-layer tunnel junction device; (b) predicted tunneling resistances for a $\delta$-layer tunnel junction of thickness $t=1$ nm, width $W=7$ nm, and doping density $N_D=10^{14}$ cm$^{-2}$ from Refs. DeltaLayer2022_SciRep and Mendez2023, compared against the measurement and simulations in Ref. TJ_experiment; (c)-(f) predictive simulations of the conductive properties of a GAAFET device (Ref. GAAFET_TechRxiv): (c) schematics of a three-nanosheet GAAFET; (d) schematics of a single nanosheet channel simulated to investigate the leakage paths in GAAFETs shown in (e) and (f).
Figure 5: End-to-end multiscale co-design framework connecting first-principles physics to system-level performance. Predictive simulations provide material-specific electronic properties that feed into device- and interconnect-level modeling. These characteristics are translated into compact models using physics-based, machine-learning-powered, and lookup-table approaches, enabling accurate and SPICE-compatible representations of nanoscale behavior. The resulting device and interconnect models support circuit- and system-level simulations that generate workload-relevant metrics such as energy and throughput. These system-level outcomes supply feedback for iterative optimization of material selection, device geometries, and interconnect structures, establishing a predictive pathway for co-design across materials, devices, interconnects, circuits, and architectures.
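
The arithmetic behind the Figure 1 and Figure 2 captions can be sketched in a few lines of Python. This is a rough illustration under stated assumptions, not the authors' cost model: the fixed-power relation (energy per operation equals power divided by throughput) follows from the 1 kW assumption in Figure 1, while the per-layer MatMul counts use the commonly quoted multiply-accumulate estimates for a decoder-only Transformer and may differ from the values shown in Figure 2. The function names and the breakdown keys are hypothetical.

```python
# Illustrative arithmetic only. The MatMul formulas are standard textbook
# estimates (counted as multiply-accumulates), assumed here for illustration;
# they are not taken from the paper's figures.

POWER_W = 1_000.0  # 1 kW power budget, as assumed in the Figure 1 caption

# GPT-3 values quoted in the Figure 2 caption
B, S = 512, 2048                        # batch size, sequence length
d_model, n_heads, d_head = 12288, 96, 128
d_ff, V, n_layers = 49152, 50257, 96


def energy_per_op_joules(ops_per_second: float, power_w: float = POWER_W) -> float:
    """Energy per operation at a fixed power budget: E = P / throughput."""
    return power_w / ops_per_second


def matmul_macs_per_layer() -> dict:
    """Approximate multiply-accumulate counts for the MatMuls in one layer."""
    tokens = B * S
    return {
        "qkv_proj": 3 * tokens * d_model * d_model,    # Q, K, V projections
        "attn_scores": B * n_heads * S * S * d_head,   # Q K^T
        "attn_values": B * n_heads * S * S * d_head,   # softmax(.) V
        "out_proj": tokens * d_model * d_model,        # attention output projection
        "ffn": 2 * tokens * d_model * d_ff,            # two feed-forward MatMuls
    }


if __name__ == "__main__":
    macs = matmul_macs_per_layer()
    total = sum(macs.values())
    for name, count in macs.items():
        print(f"{name:12s} {count:.3e} MACs ({100 * count / total:.1f}% of layer)")
    print(f"all layers   {n_layers * total:.3e} MACs per forward pass")
    # Hypothetical throughput of 1e15 op/s at 1 kW
    print(f"energy/op    {energy_per_op_joules(1e15):.1e} J")
```

Under these estimates the feed-forward block and the QKV projections dominate, which is consistent with the caption's statement that $d_{ff}$ is typically $4\times d_{model}$.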
