Mean-Field Langevin Dynamics for Signed Measures via a Bilevel Approach
Guillaume Wang, Alireza Mousavi-Hosseini, Lénaïc Chizat
TL;DR
This paper extends Mean-Field Langevin Dynamics (MFLD) to convex optimization over signed measures by comparing two reductions to probability measures: lifting and bilevel. It shows that the lifting approach generally fails to satisfy joint displacement smoothness and a uniform log-Sobolev inequality (LSI), while the bilevel reduction preserves these properties under mild assumptions, enabling global convergence through annealing. The authors derive faster, bilevel-specific annealing schedules that reach a fixed multiplicative accuracy with a time complexity that scales more favorably in parameters such as the regularization strength $\lambda$ and the dimension $d$ than classical annealing. They also establish that, for learning a single neuron, the local LSI constant along MFLD-Bilevel can be independent of $\beta$, $\lambda$, and $d$, implying strong local convergence and shedding light on the practical efficiency of the bilevel approach in high-dimensional settings.
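To make the contrast concrete, here is one schematic way to write the two reductions; the notation below is ours, not fixed by the summary above ($F$ is the convex objective over signed measures on a parameter space $\Theta$, and $\mathrm{Ent}$ is the entropic term that MFLD adds at inverse temperature $\beta$):

$$\text{lifting:}\quad \min_{\mu \in \mathcal{P}(\mathbb{R}\times\Theta)} F\Big(\int w\,\delta_\theta\,\mathrm{d}\mu(w,\theta)\Big) + \beta^{-1}\,\mathrm{Ent}(\mu), \qquad\qquad \text{bilevel:}\quad \min_{\mu \in \mathcal{P}(\Theta)} \Big[\min_{h} F(h\,\mathrm{d}\mu)\Big] + \beta^{-1}\,\mathrm{Ent}(\mu).$$

In the lifting formulation, each particle carries an extra weight coordinate $w$; in the bilevel formulation, the inner minimization over the density $h$ is solved at every step, which is the source of the higher per-iteration complexity mentioned above.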
Abstract
Mean-field Langevin dynamics (MFLD) is a class of interacting particle methods for convex optimization over probability measures on a manifold; these methods are scalable, versatile, and enjoy computational guarantees. However, some important problems -- such as risk minimization for infinite-width two-layer neural networks, or sparse deconvolution -- are originally defined over the set of signed, rather than probability, measures. In this paper, we investigate how to extend the MFLD framework to convex optimization problems over signed measures. Among two known reductions from signed to probability measures -- the lifting and the bilevel approaches -- we show that the bilevel reduction leads to stronger guarantees and faster rates (at the price of a higher per-iteration complexity). In particular, we investigate the convergence rate of MFLD applied to the bilevel reduction in the low-noise regime and obtain two results. First, this dynamics is amenable to an annealing schedule, adapted from Suzuki et al. (2023), that results in improved convergence rates to a fixed multiplicative accuracy. Second, we investigate the problem of learning a single neuron with the bilevel approach and obtain local exponential convergence rates that depend polynomially on the dimension and noise level (in contrast to the exponential dependence that would result from prior analyses).
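As a minimal illustration of the particle picture (a sketch under our own assumptions, not the authors' algorithm), the loop below implements a standard Euler--Maruyama discretization of annealed MFLD: each particle follows a gradient drift plus Gaussian noise whose magnitude shrinks as the inverse temperature $\beta$ grows. The helpers `grad_first_variation` and `beta_schedule` are hypothetical placeholders; in the bilevel variant, evaluating the gradient would itself involve solving the inner weight problem, which is the extra per-iteration cost mentioned in the abstract.

```python
import numpy as np

def annealed_mfld(grad_first_variation, X0, steps, eta, beta_schedule, rng=None):
    """Schematic noisy-particle discretization of annealed MFLD.

    grad_first_variation(X) -> array with the same shape as X: per-particle
        gradient of the objective's first variation at the empirical measure
        of X (problem-specific; for MFLD-Bilevel this would internally solve
        the inner weight optimization). Assumed to be supplied by the user.
    X0: (n, d) array of initial particle positions.
    eta: step size.
    beta_schedule(k) -> inverse temperature at step k (increasing = annealing).
    """
    rng = np.random.default_rng() if rng is None else rng
    X = X0.copy()
    for k in range(steps):
        beta = beta_schedule(k)
        noise = rng.standard_normal(X.shape)
        # Euler--Maruyama step: drift along the gradient, plus Gaussian
        # noise at temperature 1/beta (noise vanishes as beta grows).
        X = X - eta * grad_first_variation(X) + np.sqrt(2.0 * eta / beta) * noise
    return X
```

For instance, a geometric schedule such as `beta_schedule = lambda k: beta0 * growth**k` mimics the annealing idea discussed above, in which the temperature is gradually lowered as the dynamics approaches a minimizer.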
