Reducing the Barriers to Entry for Foundation Model Training

Paolo Faraboschi; Ellis Giles; Justin Hotard; Konstanty Owczarek; Andrew Wheeler

TL;DR

The paper addresses the unsustainable cost trajectory of training foundation models with an analytical framework that links model size, training data, and hardware to training expenditure. The compute cost is formalized as $C = C_0 P T$, where $P$ is the parameter count and $T$ the number of training tokens; with $C_0 \approx 6$ and $T \approx 20 P$, this yields $C \approx 120 P^2$. Mixture-of-Experts can reduce this to $C_{MoE} = 120 P^2 / K$, though it introduces gating challenges. The paper outlines an evolutionary roadmap spanning supercomputing optimizations, data reduction, model partitioning, training algorithms, data formats, efficient hardware, and competitive accelerators to halve costs incrementally. It also proposes a revolutionary path that combines Analog In-Memory Computing and Energy-Based Models to cut training costs further and shift away from transformer-centric paradigms. The work argues for open, co-designed, and service-oriented strategies to democratize access to large-scale AI while aligning with sustainable energy and supply-chain constraints, thereby preventing excessive market concentration and stifled innovation.
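To make the quadratic scaling concrete, the cost formula can be evaluated numerically. The following is a minimal sketch, not the authors' code. It assumes that $C$ is measured in floating-point operations, that $T$ scales as roughly 20 tokens per parameter (which is what makes $C \approx 120 P^2$ follow from $C = C_0 P T$), and that $K$ is simply the divisor in the Mixture-of-Experts formula. The function names and the sample model sizes are hypothetical.

```python
# Sketch of the cost model C = C0 * P * T from the TL;DR.
# Assumptions (not confirmed by the source text): C is in FLOPs,
# T is about 20 tokens per parameter, K is the MoE reduction factor.

C0 = 6.0                 # C_0 ≈ 6, as stated in the TL;DR
TOKENS_PER_PARAM = 20.0  # assumption: T ≈ 20 * P


def dense_training_cost(params: float) -> float:
    """Return C = C0 * P * T with T = 20 * P, i.e. about 120 * P**2."""
    tokens = TOKENS_PER_PARAM * params
    return C0 * params * tokens


def moe_training_cost(params: float, k: float) -> float:
    """Return C_MoE = 120 * P**2 / K for a reduction factor K > 0."""
    if k <= 0:
        raise ValueError("k must be positive")
    return dense_training_cost(params) / k


if __name__ == "__main__":
    # Hypothetical model sizes, chosen only to show the P**2 growth.
    for p in (1e9, 1e10, 1e11):
        dense = dense_training_cost(p)
        sparse = moe_training_cost(p, k=8)
        print(f"P={p:.0e}  dense={dense:.2e}  MoE(K=8)={sparse:.2e}")
```

Because the cost grows with $P^2$, a tenfold increase in parameters raises the estimated cost a hundredfold, while the MoE term reduces it only linearly in $K$.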

Abstract

The world has recently witnessed an unprecedented acceleration in demands for Machine Learning and Artificial Intelligence applications. This spike in demand has imposed tremendous strain on the underlying technology stack in supply chain, GPU-accelerated hardware, software, datacenter power density, and energy consumption. If left on the current technological trajectory, future demands show insurmountable spending trends, further limiting market players, stifling innovation, and widening the technology gap. To address these challenges, we propose a fundamental change in the AI training infrastructure throughout the technology ecosystem. The changes require advancements in supercomputing and novel AI training approaches, from high-end software to low-level hardware, microprocessor, and chip design, while advancing the energy efficiency required by a sustainable infrastructure. This paper presents the analytical framework that quantitatively highlights the challenges and points to the opportunities to reduce the barriers to entry for training large language models.

Paper Structure (20 sections, 9 figures, 3 tables)

Introduction
The Landscape of Foundation Models
LLM Growth Trends
Scaling LLM Training
Proposed Approach
Outline
Analytical Framework
Future Projections of LLM training cost
Evolutionary Roadmap
Supercomputing Technology
Data Reduction
Model Partitioning
Training Algorithms
Data Formats
Efficient Hardware
...and 5 more sections

Figures (9)

Figure 1: Technology trends and AI
Figure 2: Training of dense and sparse (MoE) models
Figure 3: Computational training cost trends in the deep learning and large-scale era (Epoch23)
Figure 4: GPU performance/$ trends, last two decades (Hobbhahn22)
Figure 5: Future projections for the cost of the final training run of a single LLM
...and 4 more figures
