Reducing the Barriers to Entry for Foundation Model Training
Paolo Faraboschi, Ellis Giles, Justin Hotard, Konstanty Owczarek, Andrew Wheeler
TL;DR
The paper addresses the unsustainable cost trajectory of training foundation models. It introduces an analytical framework that links model size, training data, and hardware to training cost, formalized as $C = C_0 P T$, where $P$ is the number of model parameters and $T$ is the number of training tokens. With $C_0 \approx 6$ and $T \approx 20P$, this yields $C \approx 120 P^2$, so cost grows quadratically with model size. Mixture-of-Experts can reduce this to $C_{MoE} = 120 P^2 / K$, although gating introduces its own challenges. The paper outlines an evolutionary roadmap (supercomputing optimizations, data reduction, model partitioning, training algorithms, data formats, efficient hardware, and competitive accelerators) to halve costs incrementally. It also proposes a revolutionary path that combines Analog In-Memory Computing with Energy-Based Models to cut training costs further and move away from transformer-centric paradigms. Finally, it argues for open, co-designed, and service-oriented strategies that democratize access to large-scale AI within sustainable energy and supply-chain constraints, preventing excessive market concentration and stifled innovation.
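The cost relation summarized above can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' code. It assumes that $C$ counts floating-point operations (the usual reading of the 6-FLOPs-per-parameter-per-token rule), that $T$ is set to 20 training tokens per parameter, and that $K$ acts as a plain divisor on dense cost. The function names are hypothetical.

```python
def dense_training_flops(params: float, tokens_per_param: float = 20.0, c0: float = 6.0) -> float:
    """Estimate training compute C = C0 * P * T, with T = tokens_per_param * P.

    Assumption: C is in FLOPs and T follows a fixed tokens-per-parameter ratio.
    """
    tokens = tokens_per_param * params  # T ~= 20P under the default ratio
    return c0 * params * tokens         # equals 120 * P**2 with the defaults


def moe_training_flops(params: float, k: float, tokens_per_param: float = 20.0, c0: float = 6.0) -> float:
    """Estimate Mixture-of-Experts compute as C_MoE = C / K.

    Assumption (hypothetical): K is treated simply as the reduction factor
    relative to a dense model with the same parameter count P.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    return dense_training_flops(params, tokens_per_param, c0) / k


if __name__ == "__main__":
    p = 70e9  # example model with 70 billion parameters
    c = dense_training_flops(p)
    print(f"dense: {c:.3e} FLOPs")  # dense: 5.880e+23 FLOPs
    # Doubling P quadruples C, because T scales with P as well.
    print(dense_training_flops(2 * p) / c)  # 4.0
    print(f"MoE (K=8): {moe_training_flops(p, 8):.3e} FLOPs")
```

The quadratic term comes from coupling $T$ to $P$. If the token budget were held fixed instead, cost would grow only linearly in $P$.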
Abstract
The world has recently witnessed an unprecedented acceleration in demand for Machine Learning and Artificial Intelligence applications. This spike in demand has placed tremendous strain on the underlying technology stack: the supply chain, GPU-accelerated hardware, software, datacenter power density, and energy consumption. If the current technological trajectory continues, projected demand implies insurmountable spending. That spending would further limit the number of market players, stifle innovation, and widen the technology gap. To address these challenges, we propose a fundamental change in AI training infrastructure across the technology ecosystem. These changes require advances in supercomputing and novel AI training approaches, spanning high-level software down to low-level hardware, microprocessor, and chip design, while improving the energy efficiency that a sustainable infrastructure requires. This paper presents an analytical framework that quantifies these challenges and identifies opportunities to reduce the barriers to entry for training large language models.
