Increased Compute Efficiency and the Diffusion of AI Capabilities

Konstantin Pilz, Lennart Heim, Nicholas Brown

TL;DR

This paper formalizes how increasing compute efficiency (the combined effect of hardware price performance and algorithmic efficiency) drives both wider access to AI capabilities and higher performance for those who invest in training compute. It defines compute investment efficiency as $f_c(i)=f_a\left(f_h(i)\right)$, where $i$ is the training compute investment, $f_h$ is hardware price performance, and $f_a$ is algorithmic efficiency, so that model performance is $p=f_c(i)$. From this it derives two intertwined effects: an access effect that lowers the cost of reaching a given capability, and a performance effect that raises the performance achievable with the same spend. It argues that large compute investors are typically the first to discover novel capabilities, including dangerous ones, and that diffusion will gradually broaden access, though performance ceilings and diminishing returns can curb or delay this diffusion. The authors discuss governance implications, including oversight of compute infrastructure, sharing of information about risks, defense-oriented use of advanced models, and coordination on precautionary measures to mitigate harms from rapid diffusion. The work emphasizes the need for proactive policy and industry collaboration to manage proliferation, defend against misuse, and balance innovation with safety as compute efficiency continues to advance.

Abstract

Training advanced AI models requires large investments in computational resources, or compute. Yet, as hardware innovation reduces the price of compute and algorithmic advances make its use more efficient, the cost of training an AI model to a given performance falls over time - a concept we describe as increasing compute efficiency. We find that while an access effect increases the number of actors who can train models to a given performance over time, a performance effect simultaneously increases the performance available to each actor. This potentially enables large compute investors to pioneer new capabilities, maintaining a performance advantage even as capabilities diffuse. Since large compute investors tend to develop new capabilities first, it will be particularly important that they share information about their AI models, evaluate them for emerging risks, and, more generally, make responsible development and release decisions. Further, as compute efficiency increases, governments will need to prepare for a world where dangerous AI capabilities are widely available - for instance, by developing defenses against harmful AI models or by actively intervening in the diffusion of particularly dangerous capabilities.
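
The two effects described above can be illustrated with a small numerical sketch. The composition $p = f_a(f_h(i))$ comes from the paper; everything else here is an assumption made for illustration: the logarithmic form of algorithmic efficiency, the function and variable names, and all numeric values (operations per dollar, algorithmic multiplier, investment). None of these are empirical estimates.

```python
import math


def hardware_price_performance(investment_usd, ops_per_dollar):
    """f_h: converts a training compute investment (dollars) into a compute budget (operations)."""
    return investment_usd * ops_per_dollar


def algorithmic_efficiency(compute_ops, algo_multiplier):
    """f_a: converts a compute budget into performance.

    ASSUMPTION: performance is log10 of 'effective' compute. The paper does not
    commit to a specific functional form; this one is chosen only for illustration.
    """
    return math.log10(compute_ops * algo_multiplier)


def compute_efficiency(investment_usd, ops_per_dollar, algo_multiplier):
    """f_c(i) = f_a(f_h(i)): relates investment directly to performance."""
    budget = hardware_price_performance(investment_usd, ops_per_dollar)
    return algorithmic_efficiency(budget, algo_multiplier)


def investment_needed(target_performance, ops_per_dollar, algo_multiplier):
    """Inverse of f_c under the assumed log form: dollars needed to reach a target performance."""
    return 10 ** target_performance / (ops_per_dollar * algo_multiplier)


# Hypothetical parameters at two points in time (illustrative values only).
t0 = {"ops_per_dollar": 1e17, "algo_multiplier": 1.0}
t1 = {"ops_per_dollar": 2e17, "algo_multiplier": 4.0}

investment = 1e6  # the same dollar investment at both times

# Performance effect: the same spend buys higher performance at t = 1.
p_t0 = compute_efficiency(investment, **t0)
p_t1 = compute_efficiency(investment, **t1)

# Access effect: reaching the t = 0 performance level costs less at t = 1.
cost_t1 = investment_needed(p_t0, **t1)

print(f"performance at t=0: {p_t0:.3f}, at t=1: {p_t1:.3f}")
print(f"cost to reach t=0 performance at t=1: ${cost_t1:,.0f}")
```

In this sketch both effects come from the same shift in $f_c$: the curve moves, so a fixed investment maps to a higher performance, and a fixed performance maps to a lower investment.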

Paper Structure

This paper contains 47 sections, 12 equations, and 9 figures.

Figures (9)

  • Figure 1: Compute efficiency relates the compute investment to the performance of an AI model.
  • Figure 2: Compute efficiency improves between time $t = 0$ and $t = 1$, causing an access effect (red) and a performance effect (blue). Figures are merely conceptual and do not assert specific claims regarding the slopes of the curves.
  • Figure 3: Compute investment scaling increases the performance lead of large compute investors over time. The dashed arrows represent performance attainable without investment scaling.
  • Figure 4: Hardware price performance is the conversion function between the training compute investment in dollars and the training compute budget in operations. Algorithmic efficiency is the subsequent conversion function between the training compute budget and the performance of the resulting AI model. Compute (investment) efficiency combines hardware price performance and algorithmic efficiency, relating training compute investment to the performance of the resulting model.
  • Figure 5: Compute efficiency improves between time $t = 0$ and $t = 1$, causing an access effect (red) and a performance effect (blue). Figures are conceptual and do not make empirical claims about the slope of the curve.
  • ...and 4 more figures
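
The claim in the Figure 3 caption can be made concrete with a short worked example. This is a sketch under a purely illustrative assumption, not the paper's own derivation: suppose compute efficiency takes the logarithmic form $f_c(i) = a \ln(k_t\, i)$, where $a > 0$ is a constant and $k_t$ is a hypothetical efficiency multiplier that grows over time. The symbols $a$, $k_t$, $i_L$, $i_S$, and $g$ are introduced here and do not appear in the source. For a large investor spending $i_L$ and a small investor spending $i_S < i_L$, the performance lead is

$$
\Delta p_t = f_c(i_L) - f_c(i_S) = a \ln(k_t\, i_L) - a \ln(k_t\, i_S) = a \ln\frac{i_L}{i_S}.
$$

Under this assumed form, $k_t$ cancels, so efficiency improvements alone leave the lead unchanged. If the large investor additionally scales its investment by a factor $g > 1$ while the small investor stays at $i_S$, the lead becomes

$$
\Delta p_t' = a \ln\frac{g\, i_L}{i_S} = \Delta p_t + a \ln g > \Delta p_t .
$$

In this example the lead widens only through investment scaling, which matches the contrast the caption draws with the dashed arrows. Other functional forms would behave differently, and the captions themselves note that the figures are conceptual.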