Quantitative Convergence of Quadratically Regularized Linear Programs
Alberto González-Sanz, Marcel Nutz
TL;DR
The paper provides a quantitative analysis of quadratically regularized linear programs over polytopes. It shows that the regularized solution $x^{\eta}$ converges stationarily to the minimal-norm LP solution $x^*$, with an explicit threshold $\eta^*$ beyond which $x^{\eta}=x^*$ exactly. It derives an exact formula for $\eta^*$, bounds the slope of the suboptimality $\mathcal{E}(\eta)$ as $\eta$ approaches $\eta^*$, and establishes the linear rate $\|x^{\eta}-x^0\|\le \tfrac{1}{2}\|c\|\eta$ as $\eta\to0$, the regime of large regularization. The framework is applied to optimal transport, where the problem reduces to a quadratically regularized LP on the Birkhoff polytope. This yields explicit expressions for $\eta^*$ and slope bounds, showing how both depend on the number $N$ of data points, along with corollaries for separated-cost structures and symmetric costs. The results clarify when quadratic regularization recovers exact LP solutions and quantify the suboptimality and convergence rates, with practical implications for sparse OT couplings and data-driven transport problems.
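To make the threshold and the large-regularization rate concrete, here is a minimal sketch on the simplest polytope, the probability simplex; it is a toy illustration, not an example taken from the paper. It assumes the regularized objective is $\langle c,x\rangle + \|x\|^2/\eta$, a parametrization that is consistent with the $\tfrac{1}{2}\|c\|\eta$ bound but is not stated in this summary. Under that assumption, completing the square shows that $x^{\eta}$ is the Euclidean projection of $-\eta c/2$ onto the polytope. For the simplex, this projection has a sort-based closed form, and the threshold can be worked out by hand as $\eta^* = 2/(k\cdot\mathrm{gap})$, where $k$ is the number of minimizers of $c$ and gap is the distance from $\min c$ to the next-smallest cost. That toy formula and the helper names `project_simplex`, `regularized_solution`, `min_norm_lp_solution` and `simplex_threshold` are hypothetical and specific to this example.

```python
import numpy as np


def project_simplex(y):
    """Euclidean projection of y onto the simplex {x >= 0, sum(x) = 1} (sort-based)."""
    y = np.asarray(y, dtype=float)
    u = np.sort(y)[::-1]                      # entries in decreasing order
    css = np.cumsum(u)
    j = np.arange(1, y.size + 1)
    rho = j[u - (css - 1.0) / j > 0][-1]      # size of the support of the projection
    theta = (css[rho - 1] - 1.0) / rho        # shift that makes the entries sum to 1
    return np.maximum(y - theta, 0.0)


def regularized_solution(c, eta):
    """Minimizer of <c, x> + ||x||^2 / eta over the simplex (ASSUMED parametrization).

    Completing the square, the objective equals ||x + eta*c/2||^2 / eta minus a
    constant, so the minimizer is the projection of -eta*c/2 onto the simplex.
    eta = 0 is read as the limit: the minimal-norm point of the simplex.
    """
    return project_simplex(-0.5 * eta * np.asarray(c, dtype=float))


def min_norm_lp_solution(c, tol=1e-12):
    """Minimal-norm minimizer of <c, x> over the simplex: uniform weights on argmin(c)."""
    c = np.asarray(c, dtype=float)
    mask = c <= c.min() + tol
    return mask / mask.sum()


def simplex_threshold(c, tol=1e-12):
    """Toy threshold for the simplex only (hypothetical helper, not the paper's formula).

    With k minimizers of c and gap = (second-smallest value) - min(c), the
    projection of -eta*c/2 equals the uniform vector on argmin(c) iff
    eta * gap / 2 >= 1 / k. Assumes c is not constant.
    """
    c = np.asarray(c, dtype=float)
    is_min = c <= c.min() + tol
    gap = c[~is_min].min() - c.min()
    return 2.0 / (is_min.sum() * gap)


# Usage: a cost vector with two tied minimizers.
c = np.array([0.3, 0.1, 0.1, 0.7, 0.4])
x_star = min_norm_lp_solution(c)      # uniform on the two tied minimizers
eta_star = simplex_threshold(c)       # 2 / (2 * 0.2), up to rounding
x0 = regularized_solution(c, 0.0)     # large-regularization limit: uniform vector
for eta in [0.5 * eta_star, 0.9 * eta_star, eta_star, 2 * eta_star]:
    x = regularized_solution(c, eta)
    print(f"eta={eta:.3f}  equals x*: {np.allclose(x, x_star)}  "
          f"||x - x0|| = {np.linalg.norm(x - x0):.4f} <= {0.5 * np.linalg.norm(c) * eta:.4f}")
```

In this toy case, $x^{\eta}$ differs from $x^*$ for every $\eta$ below the computed threshold and coincides with it from the threshold onward. The bound $\|x^{\eta}-x^0\|\le\tfrac{1}{2}\|c\|\eta$ follows directly from the projection view, because Euclidean projection onto a convex set is 1-Lipschitz.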
Abstract
Linear programs with quadratic regularization are attracting renewed interest due to their applications in optimal transport: unlike entropic regularization, the squared-norm penalty gives rise to sparse approximations of optimal transport couplings. It is well known that the solution of a quadratically regularized linear program over any polytope converges stationarily to the minimal-norm solution of the linear program when the regularization parameter tends to zero. However, that result is merely qualitative. Our main result quantifies the convergence by specifying the exact threshold for the regularization parameter, after which the regularized solution also solves the linear program. Moreover, we bound the suboptimality of the regularized solution before the threshold. These results are complemented by a convergence rate for the regime of large regularization. We apply our general results to the setting of optimal transport, where we shed light on how the threshold and suboptimality depend on the number of data points.
