Reducing the Barriers to Entry for Foundation Model Training

Paolo Faraboschi, Ellis Giles, Justin Hotard, Konstanty Owczarek, Andrew Wheeler

TL;DR

The paper addresses the unsustainable cost trajectory of training foundation models. It introduces an analytical framework that links model size, data, and hardware to training expenditure, formalized as $C = C_0 P T$, where $P$ is the parameter count and $T$ the number of training tokens. With $C_0 \approx 6$ and $T \approx 20P$, this yields $C \approx 120 P^2$. Mixture-of-Experts can reduce this to $C_{MoE} = 120 P^2 / K$, though gating introduces its own challenges. The paper outlines an evolutionary roadmap spanning supercomputing optimizations, data reduction, model partitioning, training algorithms, data formats, efficient hardware, and competitive accelerators to halve costs incrementally. It also proposes a revolutionary path that combines Analog In-Memory Computing and Energy-Based Models to cut training costs further and shift away from transformer-centric paradigms. The work argues for open, co-designed, and service-oriented strategies to democratize access to large-scale AI while aligning with sustainable energy and supply-chain constraints, thereby preventing excessive market concentration and stifled innovation.
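
As a rough illustration of the cost model summarized above, the sketch below evaluates $C = C_0 P T$ numerically. It assumes that the token count scales with model size as $T \approx 20P$ (the relation under which the product comes out to $120 P^2$) and that $C$ is measured in floating-point operations. The function names are hypothetical, and $K$ is treated simply as the divisor that appears in $C_{MoE}$, since its exact definition is not given in this summary.

```python
# Minimal sketch of the training-cost model C = C0 * P * T.
# Assumptions (not taken verbatim from the paper's text):
#   - T is about 20 * P, i.e. roughly 20 training tokens per parameter
#   - C is expressed in floating-point operations (FLOPs)
#   - K is the MoE reduction factor exactly as it appears in C_MoE = 120 P^2 / K

C0 = 6.0                 # approximate FLOPs per parameter per training token
TOKENS_PER_PARAM = 20.0  # assumed ratio T / P


def dense_training_flops(params: float) -> float:
    """Return C = C0 * P * T for a dense model, with T = 20 * P."""
    tokens = TOKENS_PER_PARAM * params
    return C0 * params * tokens


def moe_training_flops(params: float, k: float) -> float:
    """Return C_MoE = 120 * P^2 / K for a reduction factor K > 0."""
    if k <= 0:
        raise ValueError("k must be positive")
    return dense_training_flops(params) / k


if __name__ == "__main__":
    # Model sizes below are arbitrary illustrative values.
    for p in (1e9, 70e9, 1e12):
        dense = dense_training_flops(p)
        sparse = moe_training_flops(p, k=8)
        print(f"P = {p:.0e}: dense = {dense:.2e} FLOPs, MoE (K=8) = {sparse:.2e} FLOPs")
```

Because the cost grows with $P^2$, a 10x larger model costs about 100x more to train under these assumptions, which is the trend the paper describes as unsustainable.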

Abstract

The world has recently witnessed an unprecedented acceleration in demands for Machine Learning and Artificial Intelligence applications. This spike in demand has imposed tremendous strain on the underlying technology stack in supply chain, GPU-accelerated hardware, software, datacenter power density, and energy consumption. If left on the current technological trajectory, future demands show insurmountable spending trends, further limiting market players, stifling innovation, and widening the technology gap. To address these challenges, we propose a fundamental change in the AI training infrastructure throughout the technology ecosystem. The changes require advancements in supercomputing and novel AI training approaches, from high-end software to low-level hardware, microprocessor, and chip design, while advancing the energy efficiency required by a sustainable infrastructure. This paper presents the analytical framework that quantitatively highlights the challenges and points to the opportunities to reduce the barriers to entry for training large language models.

Paper Structure (20 sections, 9 figures, 3 tables)

This paper contains 20 sections, 9 figures, 3 tables.

Figures (9)

  • Figure 1: Technology trends and AI
  • Figure 2: Training of dense and sparse (MoE) models
  • Figure 3: Computational training cost trends in the deep learning and large-scale era [Epoch23]
  • Figure 4: GPU performance/$ trends, last two decades [Hobbhahn22]
  • Figure 5: Future projections for the cost of the final training run of a single LLM
  • ...and 4 more figures