The Mathematics of Artificial Intelligence
Gabriel Peyré
TL;DR
This overview analyzes how mathematics grounds AI, covering empirical risk minimization, gradient dynamics, mean-field limits, neural ODE representations, flow-based generative modeling, and attention-based transformers. It applies analytical and probabilistic tools to model architectures (MLPs, ResNets, Transformers) and training dynamics (Wasserstein gradient flows, McKean–Vlasov PDEs), yielding both universal approximation insights and flow-based optimization strategies. Key contributions include mean-field and PDE descriptions of learning, flow-matching approaches for diffusion-like generation, and a particle-system view of attention, all of which illustrate how rigorous mathematics informs architecture design and training behavior. The article also points to open questions on generalization, the reasoning capabilities of large models, resource efficiency, and privacy, and urges mathematicians to contribute to principled AI theory and design.
Abstract
This overview article highlights the critical role of mathematics in artificial intelligence (AI), emphasizing that mathematics provides tools to better understand and enhance AI systems. Conversely, AI raises new problems and drives the development of new mathematics at the intersection of various fields. This article focuses on the application of analytical and probabilistic tools to model neural network architectures and better understand their optimization. Statistical questions (particularly the generalization capacity of these networks) are intentionally set aside, though they are of crucial importance. We also shed light on the evolution of ideas that have enabled significant advances in AI through architectures tailored to specific tasks, each echoing distinct mathematical techniques. The goal is to encourage more mathematicians to take an interest in and contribute to this exciting field.
