Recent Advances in Optimal Transport for Machine Learning
Eduardo Fernandes Montesuma, Fred Ngolè Mboula, Antoine Souloumiac
TL;DR
Optimal transport (OT) provides a principled framework for comparing and transforming probability distributions, anchored by the Wasserstein distance $W_p$ and the Monge and Kantorovich formulations, the latter optimizing over transport plans in $\Gamma(P,Q)$. The paper surveys theory, computation, and ML applications from 2012 to 2023. It covers entropy-regularized and unbalanced variants, Gromov–Wasserstein (GW) and fused Gromov–Wasserstein (FGW) distances, and neural OT solvers, as well as practical uses across supervised, unsupervised, transfer, and reinforcement learning. It presents OT both as a loss function and as a toolkit for manipulating distributions, with applications ranging from OT-based losses and fairness in supervised learning to generative modeling, dictionary learning, clustering, domain adaptation, and distributional RL. Major challenges include the curse of dimensionality and the high computational cost of solving OT problems. Promising directions involve learned ground costs, sliced and generalized OT, and scalable neural OT architectures that integrate OT into end-to-end ML pipelines.
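For reference, the standard definitions behind this notation are given below. They are written for probability measures $P, Q$ on a metric space $(\mathcal{X}, d)$ with finite $p$-th moments and a ground cost $c$. This setting is an assumption made here, since the summary does not fix it.

```latex
% Transport plans (couplings) whose marginals are P and Q
\[ \Gamma(P,Q) = \{ \gamma \in \mathcal{P}(\mathcal{X}\times\mathcal{X}) : \gamma(A\times\mathcal{X}) = P(A),\ \gamma(\mathcal{X}\times B) = Q(B) \text{ for all measurable } A, B \} \]

% Monge formulation: optimize over maps T that push P forward onto Q
\[ \inf_{T:\, T_\sharp P = Q} \int_{\mathcal{X}} c(x, T(x)) \, dP(x) \]

% Kantorovich relaxation: optimize over plans instead of maps
\[ \inf_{\gamma \in \Gamma(P,Q)} \int_{\mathcal{X}\times\mathcal{X}} c(x,y) \, d\gamma(x,y) \]

% Wasserstein distance: the Kantorovich problem with cost c = d^p
\[ W_p(P,Q) = \left( \inf_{\gamma \in \Gamma(P,Q)} \int_{\mathcal{X}\times\mathcal{X}} d(x,y)^p \, d\gamma(x,y) \right)^{1/p} \]

% Entropy-regularized variant, with regularization strength \varepsilon > 0
\[ \inf_{\gamma \in \Gamma(P,Q)} \int_{\mathcal{X}\times\mathcal{X}} c(x,y) \, d\gamma(x,y) + \varepsilon \, \mathrm{KL}(\gamma \,\|\, P \otimes Q) \]
```

The Kantorovich problem is always feasible because the product coupling $P \otimes Q$ lies in $\Gamma(P,Q)$. A Monge map, by contrast, may not exist, for example when $P$ is a single Dirac mass and $Q$ is not.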
Abstract
Recently, Optimal Transport has been proposed as a probabilistic framework in Machine Learning for comparing and manipulating probability distributions. Rooted in a rich history and theory, it has offered new solutions to problems in machine learning such as generative modeling and transfer learning. In this survey we explore contributions of Optimal Transport to Machine Learning over the period 2012–2023, focusing on four sub-fields of Machine Learning: supervised, unsupervised, transfer, and reinforcement learning. We further highlight recent developments in computational Optimal Transport and its extensions, such as partial, unbalanced, Gromov, and Neural Optimal Transport, and their interplay with Machine Learning practice.
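As a concrete instance of computational OT, here is a minimal pure-Python sketch of the Sinkhorn matrix-scaling iterations commonly used to solve entropy-regularized OT between two discrete histograms. It assumes that `a` and `b` are strictly positive weights summing to 1 and that `C` is a ground-cost matrix. The function name `sinkhorn` and its parameters (`eps`, `n_iters`, `tol`) are illustrative choices made here. They are not the API of any particular library discussed in the survey.

```python
import math


def sinkhorn(a, b, C, eps=0.1, n_iters=1000, tol=1e-9):
    """Entropy-regularized OT between discrete histograms `a` and `b`.

    Illustrative sketch: `a` (length n) and `b` (length m) are strictly
    positive weights summing to 1, and `C` is an n x m ground-cost matrix.
    Returns (gamma, cost), where gamma is the entropic transport plan and
    cost = sum_ij gamma_ij * C_ij (the transport term, without the entropy).
    """
    n, m = len(a), len(b)
    # Gibbs kernel K_ij = exp(-C_ij / eps); smaller eps -> closer to unregularized OT
    K = [[math.exp(-C[i][j] / eps) for j in range(m)] for i in range(n)]
    u = [1.0] * n
    v = [1.0] * m
    for _ in range(n_iters):
        # Rescale rows so that gamma = diag(u) K diag(v) matches marginal a
        Kv = [sum(K[i][j] * v[j] for j in range(m)) for i in range(n)]
        u = [a[i] / Kv[i] for i in range(n)]
        # Rescale columns so that gamma matches marginal b
        Ktu = [sum(K[i][j] * u[i] for i in range(n)) for j in range(m)]
        v = [b[j] / Ktu[j] for j in range(m)]
        # Column marginals are now exact; stop once the row marginals are too
        row_err = sum(
            abs(u[i] * sum(K[i][j] * v[j] for j in range(m)) - a[i])
            for i in range(n)
        )
        if row_err < tol:
            break
    gamma = [[u[i] * K[i][j] * v[j] for j in range(m)] for i in range(n)]
    cost = sum(gamma[i][j] * C[i][j] for i in range(n) for j in range(m))
    return gamma, cost


# Usage: two points on a line, uniform weights, squared-distance cost
x, y = [0.0, 1.0], [0.0, 1.0]
C = [[(xi - yj) ** 2 for yj in y] for xi in x]
plan, transport_cost = sinkhorn([0.5, 0.5], [0.5, 0.5], C, eps=0.05)
print(plan, transport_cost)
```

With `eps=0.05`, the plan is close to the diagonal coupling `[[0.5, 0], [0, 0.5]]` and the transport cost is near zero. Larger `eps` blurs the plan toward the product coupling. For very small `eps`, `exp(-C / eps)` can underflow, which is one reason practical solvers often run these updates in the log domain.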
