Supercomputers as a Continous Medium

Martin Karp, Niclas Jansson, Philipp Schlatter, Stefano Markidis

TL;DR

The paper presents the homogeneous computer model, a continuous, physically grounded abstraction that treats a large-scale computer as a single uniform medium with distributed compute, memory, and communication properties. Run time is expressed as $T(v)=\frac{W}{\pi v}+\frac{Q(sv)}{\beta v}+\frac{D(L(v))}{c}$, the sum of a compute term, a data-movement term, and a latency term, where $\pi$, $\beta$, and $s$ are the compute, bandwidth, and memory densities and $c$ is the speed of light. Optimizing this expression over the active volume $v$ yields first-principles insights into the scaling of conventional algorithms (CG, FFT, and matrix multiplication) and shows that even ultra-dense future systems will be fundamentally limited by data movement and propagation speed. The work recovers and extends classical models such as Amdahl's and Gustafson's laws within a physical framework, analyzes strong and weak scaling, and demonstrates that real-world systems (Frontier, Fugaku, GH200) are already approaching the classical limits imposed by $c$. To surpass these limits, algorithmic changes that reduce communication or exploit new architectures will be essential; the model can guide such co-design and parameter optimization for future exascale platforms.
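The trade-off in $T(v)$ can be explored numerically. The following is a minimal sketch, not the authors' implementation: it evaluates the three terms and finds the best active volume by a simple grid search. Several pieces are assumptions made only for illustration: the latency term is simplified to $D(v)/c$ with $D(v)$ taken as the diameter of a sphere of volume $v$, the I/O volume for matrix multiplication is taken as $n^3/\sqrt{S}$ (ignoring the regime where the whole problem fits in local memory), and all density values and function names (`run_time_terms`, `best_active_volume`, `mxm_example`) are hypothetical.

```python
import math

C_LIGHT = 3.0e8  # propagation speed c in m/s


def run_time_terms(v, W, Q, D, pi_d, beta_d, s_d, c=C_LIGHT):
    """Return (T_W, T_Q, T_L) for an active volume v."""
    t_w = W / (pi_d * v)             # compute term: W / (pi * v)
    t_q = Q(s_d * v) / (beta_d * v)  # data-movement term: Q(s*v) / (beta * v)
    t_l = D(v) / c                   # latency term, simplified here to D(v) / c
    return t_w, t_q, t_l


def best_active_volume(v_max, samples=2000, **model):
    """Log-spaced grid search for the v in (0, v_max] that minimizes T(v)."""
    v_min = v_max * 1e-12  # arbitrary lower cutoff for the search
    best_v, best_t = v_min, math.inf
    for i in range(samples):
        v = v_min * (v_max / v_min) ** (i / (samples - 1))
        t = sum(run_time_terms(v, **model))
        if t < best_t:
            best_v, best_t = v, t
    return best_v, best_t


def mxm_example(n, v_max):
    """Hypothetical n x n matrix multiplication with made-up densities."""
    return best_active_volume(
        v_max,
        W=2.0 * n**3,                                    # classical flop count
        Q=lambda S: n**3 / math.sqrt(S),                 # assumed I/O volume
        D=lambda v: (6.0 * v / math.pi) ** (1.0 / 3.0),  # assumed: sphere diameter
        pi_d=1e15,    # flop/s per m^3 (illustrative value)
        beta_d=1e12,  # words/s per m^3 (illustrative value)
        s_d=1e12,     # words per m^3 (illustrative value)
    )
```

Calling `mxm_example(n=1000, v_max=1e4)` returns the volume and run time at the grid minimum. Under these made-up parameters the compute and data-movement terms shrink as $v$ grows while the latency term grows, so the minimum can fall strictly inside the machine rather than at its full volume.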

Abstract

As supercomputers' complexity has grown, the traditional boundaries between processor, memory, network, and accelerators have blurred. This makes a homogeneous computer model, in which the overall computer system is modeled as a continuous medium with homogeneously distributed computational power, memory, and data-movement capabilities, an intriguing and powerful abstraction. By applying a homogeneous computer model to algorithms with a given I/O complexity, we recover from first principles other discrete computer models (such as the roofline model), parallel computing laws (such as Amdahl's and Gustafson's laws), and phenomenological observations (such as super-linear speedup). One of the homogeneous computer model's distinctive advantages is the capability of directly linking the performance limits of an application to the physical properties of a classical computer system. Applying the homogeneous computer model to supercomputers such as Frontier, Fugaku, and the Nvidia DGX GH200 shows that applications such as Conjugate Gradient (CG) and Fast Fourier Transforms (FFT) are rapidly approaching the fundamental classical computational limits, where the performance of even denser systems in terms of compute and memory is fundamentally limited by the speed of light.
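For reference, the classical results named above are usually stated as follows. These are the standard textbook forms, not the paper's derivation, and the symbols are assumptions introduced here: $P_{\text{peak}}$ is peak compute rate, $B$ is memory bandwidth, $I$ is arithmetic intensity, $p$ is the parallelizable fraction of the work, and $N$ is the number of processors.

$$
P_{\text{roofline}} = \min\left(P_{\text{peak}},\; I \cdot B\right)
$$

$$
S_{\text{Amdahl}}(N) = \frac{1}{(1-p) + \frac{p}{N}}, \qquad S_{\text{Gustafson}}(N) = (1-p) + pN
$$

Amdahl's law fixes the problem size (strong scaling), while Gustafson's law lets the problem grow with $N$ (weak scaling).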

Paper Structure (10 sections, 11 equations, 8 figures, 3 tables)

Figures (8)

  • Figure 1: A homogeneous computer with an active volume $v$ and distance function $D(v)$ within a spherical domain $V$. The volume has densities $\pi$, $\beta$, and $s$ for computational power, bandwidth, and local memory size, respectively.
  • Figure 2: The run time $T$ varies depending on the compute density $\pi$, bandwidth density $\beta$, memory density $s$, and the computer volume $V$. Each line corresponds to one value of $n$, the problem size.
  • Figure 3: Run time of CG, MxM, and FFT as the memory density $s$ and performance $\Pi$ increase. The performance is impacted primarily by the latency in the lowest corner, and as the performance density increases, the active volume $v$ decreases and so does the run time. For a smaller problem, increasing the performance is more important, as the effect of caches is much larger and we are not as limited by the bandwidth. A blue surface corresponds to $T_W$ taking the most time, yellow-brown to $T_Q$, and green to $T_L$.
  • Figure 4: Run time for CG as the volume of the computer $V$ and performance $\Pi$ increase. The performance is impacted primarily by the latency in the lower corner, and as the performance increases, the active volume $v$ decreases and so does the run time. For a smaller problem, increasing the performance is more important, as the effect of caches is much larger and we are not as limited by the bandwidth.
  • Figure 5: An increased volume (and different memory densities $S$) changes the parallel efficiency when $\pi \approx \beta$ for FFT.
  • ...and 3 more figures