A Probabilistic Basis for Low-Rank Matrix Learning
Simon Segert, Nathan Wycoff
TL;DR
This work provides a rigorous probabilistic foundation for low-rank matrix learning by analyzing the nuclear-norm distribution with density $f(X)\propto e^{-\lambda\|X\|_*}$. It derives the exact normalizing constant, an exact SVD-based stochastic representation, and a tractable approximate surrogate via the Normal Product Distribution, then uses these results to build an efficient proximal-Langevin MCMC sampler and a Gibbs-type sampler for Gaussian likelihoods. The authors also develop a Bayesian scheme that infers the penalty $\lambda$ without grid search, and show in matrix denoising and completion experiments that the adaptive $\lambda$ attains performance comparable to the optimal fixed value. Together, these results link the underlying distribution to practical Monte Carlo methods and automatic hyperparameter learning for Bayesian low-rank inference.
Abstract
Low-rank inference on matrices is widely conducted by optimizing a cost function augmented with a penalty proportional to the nuclear norm $\Vert \cdot \Vert_*$. However, despite the assortment of computational methods for such problems, there is a surprising lack of understanding of the probability distributions underlying these penalties. In this article, we study the distribution with density $f(X)\propto e^{-\lambda\Vert X\Vert_*}$, finding many of its fundamental attributes to be analytically tractable via differential geometry. We use these facts to design an improved MCMC algorithm for low-rank Bayesian inference and to learn the penalty parameter $\lambda$, obviating the need for hyperparameter tuning when this is difficult or impossible. Finally, we deploy these tools to improve the accuracy and efficiency of low-rank Bayesian matrix denoising and completion algorithms in numerical experiments.
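As a concrete illustration of the objects named above, the following minimal sketch evaluates the unnormalized log-density $-\lambda\Vert X\Vert_*$ and the standard proximal operator of the nuclear norm (soft-thresholding of singular values), the usual building block of proximal methods for this penalty. It is not the authors' implementation: the function names `nuclear_norm`, `log_density_unnormalized`, and `prox_nuclear` and the step parameter `tau` are hypothetical, NumPy is assumed, and the normalizing constant derived in the paper is deliberately omitted.

```python
import numpy as np


def nuclear_norm(X):
    """Nuclear norm of X: the sum of its singular values."""
    return np.linalg.svd(X, compute_uv=False).sum()


def log_density_unnormalized(X, lam):
    """log f(X) up to an additive constant, for f(X) proportional to exp(-lam * ||X||_*)."""
    return -lam * nuclear_norm(X)


def prox_nuclear(X, tau):
    """Proximal operator of tau * ||.||_*: soft-threshold the singular values by tau."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    s_shrunk = np.maximum(s - tau, 0.0)  # singular values below tau become exactly zero
    return (U * s_shrunk) @ Vt           # scale the columns of U, then recombine


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.standard_normal((5, 4))
    Y = prox_nuclear(X, tau=1.0)
    print("nuclear norm before:", nuclear_norm(X))
    print("nuclear norm after: ", nuclear_norm(Y))
    print("rank after shrinkage:", np.linalg.matrix_rank(Y))
```

Because the shrinkage sets small singular values exactly to zero, the output of `prox_nuclear` typically has lower rank than its input, which is why the nuclear-norm penalty is associated with low-rank estimates.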
