
Data Gravity and the Energy Limits of Computation

Wonsuk Lee, Jehoshua Bruck

Abstract

Unlike the von Neumann architecture, which separates computation from memory, the brain tightly integrates them, an organization that large language models increasingly resemble. The crucial difference lies in the ratio of energy spent on computation versus data access: in the brain, most energy fuels compute, while in von Neumann architectures, data movement dominates. To capture this imbalance, we introduce the \emph{operation-operand disjunction constant} $G_d$, a dimensionless measure of the energy required for data transport relative to computation. As part of this framework, we propose the metaphor of \emph{data gravity}: just as mass exerts gravitational pull, large and frequently accessed data sets attract computation. We develop expressions for optimal computation placement and show that bringing the computation closer to the data can reduce energy consumption by a factor of $G_d^{(\beta - 1)/2}$, where $\beta \in (1, 3)$ captures the empirically observed distance-dependent energy scaling. We demonstrate that these findings are consistent with measurements across processors from 45\,nm to 7\,nm, as well as with results from processing-in-memory (PIM) architectures. High $G_d$ values are limiting; as $G_d$ increases, the energy required for data movement threatens to stall progress, slowing the scaling of large language models and pushing modern computing toward a plateau. Unless computation is realigned with data gravity, the growth of AI may be capped not by algorithms but by physics.
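To make the abstract's quantities concrete, one consistent reading is the following minimal energy model (a sketch under our own assumptions; the symbols $E_{\mathrm{op}}$ and $E_{\mathrm{move}}$ are our notation, not necessarily the paper's):

\[
E_{\mathrm{move}}(d) \propto d^{\beta}, \quad \beta \in (1, 3), \qquad G_d = \frac{E_{\mathrm{move}}(d)}{E_{\mathrm{op}}},
\]

so that one operation at operation-operand separation $d$ costs $E_{\mathrm{total}}(d) = E_{\mathrm{op}} + E_{\mathrm{move}}(d) = E_{\mathrm{op}}\,(1 + G_d)$ in total, and shrinking $d$ toward the co-located minimum $d_{\min}$ attacks the $G_d$ term directly.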

Key Result

Proposition 3.1 (Quantification of Energy Improvement)

Let $d_{\min}$ be the minimum achievable separation between an operation and its operand, namely, the separation when operation and operands are co-located. Consider a traditional architecture where the operation-operand separation is $d$, with disjunction constant $G_d \geq 1$. If $d \geq \sqrt{G_d}\, d_{\min}$ holds, co-locating computation with data reduces total energy consumption by at least $G_d^{(\beta - 1)/2}$ compared to the separated configuration.
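As a quick numerical illustration, the sketch below evaluates the minimal model from above for a few $(G_d, \beta)$ pairs, placing the separated design at $d = \sqrt{G_d}\, d_{\min}$ so that co-location shrinks the movement term from $G_d$ to $G_d^{1 - \beta/2}$ (in units of $E_{\mathrm{op}}$). The model, the placement condition, and every name in the code are our illustrative assumptions, not the paper's artifacts:

    # Sketch (our assumptions, not the paper's code): compare the energy
    # reduction from co-locating compute with data against the factor
    # G_d^((beta-1)/2), under E_move(d) ~ d**beta with d = sqrt(G_d) * d_min.

    def reduction_factor(g_d: float, beta: float) -> float:
        """Ratio E_total(d) / E_total(d_min), with E_op normalized to 1.

        Separated:  E_move(d)     = g_d                 (definition of G_d)
        Co-located: E_move(d_min) = g_d * (d_min / d)**beta
                                  = g_d**(1 - beta / 2) since d = sqrt(g_d) * d_min
        """
        separated = 1.0 + g_d
        co_located = 1.0 + g_d ** (1.0 - beta / 2.0)
        return separated / co_located

    if __name__ == "__main__":
        for g_d in (1e2, 1e4, 1e6):
            for beta in (1.2, 2.0, 2.8):
                achieved = reduction_factor(g_d, beta)
                bound = g_d ** ((beta - 1.0) / 2.0)
                print(f"G_d={g_d:.0e}  beta={beta:.1f}  "
                      f"reduction={achieved:.4g}  G_d^((beta-1)/2)={bound:.4g}")

For $G_d = 10^4$ and $\beta = 2$, for example, the achieved reduction is about $5 \times 10^3$ against a bound of $10^2$, comfortably exceeding $G_d^{(\beta - 1)/2}$ in this regime.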
