Proximal Interacting Particle Langevin Algorithms

Paula Cordero Encinar; Francesca R. Crucinio; O. Deniz Akyildiz

TL;DR

This paper tackles parameter learning in latent variable models whose joint density is non-differentiable. It introduces the proximal interacting particle Langevin algorithm (PIPLA) family for maximum marginal likelihood estimation (MMLE) in this setting: MYIPLA, which is built on Moreau–Yosida envelopes; PIPGLA, a proximal gradient Langevin variant; and PPGD, a proximal particle gradient descent variant. The authors establish nonasymptotic convergence bounds in the strongly log-concave regime, derive McKean–Vlasov (MKV) limits, and report experiments on sparse Bayesian logistic regression, sparse Bayesian neural networks, image deblurring, and matrix completion. In the Bayesian neural network experiment, the proposed methods produce sparser weights than IPLA, which ignores the non-differentiabilities.

Abstract

We introduce a class of algorithms, termed proximal interacting particle Langevin algorithms (PIPLA), for inference and learning in latent variable models whose joint probability density is non-differentiable. Leveraging proximal Markov chain Monte Carlo techniques and interacting particle Langevin algorithms, we propose three algorithms tailored to the problem of estimating parameters in a non-differentiable statistical model. We prove nonasymptotic bounds for the parameter estimates produced by the different algorithms in the strongly log-concave setting and provide comprehensive numerical experiments on various models to demonstrate the effectiveness of the proposed methods. In particular, we demonstrate the utility of our family of algorithms for sparse Bayesian logistic regression, training of sparse Bayesian neural networks or neural networks with non-differentiable activation functions, image deblurring, and sparse matrix completion. Our theory and experiments together show that the PIPLA family can be the de facto choice for parameter estimation problems in non-differentiable latent variable models.
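To make the idea in the abstract concrete, the following is a minimal sketch and not the authors' implementation. It assumes that a Moreau–Yosida interacting particle scheme takes the usual interacting particle Langevin form (one shared parameter, N latent particles, parameter noise scaled by 1/N), with the non-differentiable part of the potential replaced by its Moreau–Yosida envelope. The toy model (one observation y ~ N(x, 1) with latent x ~ Laplace(theta, 1)), the function names, and all numerical settings are illustrative assumptions. In this toy model the marginal likelihood is symmetric in y - theta, so its maximiser is theta = y.

```python
import numpy as np


def soft_threshold(d, t):
    """Proximal operator of t * |.| (soft-thresholding)."""
    return np.sign(d) * np.maximum(np.abs(d) - t, 0.0)


def grad_my_abs_diff(theta, x, lam):
    """Gradient of the lam-Moreau-Yosida envelope of g2(theta, x) = |x - theta|.

    g2 is |.| composed with the linear map A = [-1, 1], and A A^T = 2, so the
    envelope gradient is A^T (d - soft_threshold(d, 2*lam)) / (2*lam), d = x - theta.
    Returns the partial derivatives with respect to theta and x.
    """
    d = x - theta
    r = (d - soft_threshold(d, 2.0 * lam)) / (2.0 * lam)
    return -r, r


def myipla_toy(y, n_particles=100, n_steps=2000, gamma=0.05, lam=0.1, seed=0):
    """Hypothetical MYIPLA-style loop for the toy model (illustrative only).

    Potential: U(theta, x) = (y - x)^2 / 2 + |x - theta|, with the second term
    smoothed by its Moreau-Yosida envelope.
    """
    rng = np.random.default_rng(seed)
    theta = 0.0
    x = rng.standard_normal(n_particles)
    trace = np.empty(n_steps)
    for n in range(n_steps):
        g_theta, g_x = grad_my_abs_diff(theta, x, lam)
        grad_x = (x - y) + g_x  # smooth Gaussian term plus smoothed Laplace term
        # Parameter update: gradient averaged over particles, noise scaled by 1/N.
        theta_new = (theta - gamma * g_theta.mean()
                     + np.sqrt(2.0 * gamma / n_particles) * rng.standard_normal())
        # Particle update: one Langevin step per particle at the current theta.
        x = (x - gamma * grad_x
             + np.sqrt(2.0 * gamma) * rng.standard_normal(n_particles))
        theta = theta_new
        trace[n] = theta
    return trace


if __name__ == "__main__":
    trace = myipla_toy(y=2.0)
    print("average of late theta iterates:", trace[1000:].mean())
```

The step size here is chosen below 1 / (1 + 1/lam), mirroring the shape of the step-size restriction quoted in the key result; whether that restriction transfers exactly to this toy setup is an assumption.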

Paper Structure (68 sections, 21 theorems, 178 equations, 14 figures, 11 tables, 1 algorithm)

Introduction
Contribution.
Notation.
Background
Langevin Dynamics
MMLE with Langevin Dynamics
Proximal Methods
Proximal Langevin methods
Proximal gradient MCMC methods
Proximal Interacting Particle Methods for MMLE
Proximal Interacting Particle Algorithms
Moreau-Yosida Interacting Particle Langevin Algorithm (MYIPLA)
Proximal Interacting Particle Gradient Langevin Algorithm (PIPGLA)
Proximal Particle Gradient Descent Methods (PPGD)
Nonasymptotic analysis
...and 53 more sections

Key Result

Theorem 4.1

Let Assumptions A1–A4 hold. Let $\theta_n^N$ denote the MYIPLA $\theta$-iterate and $\bar{\theta}_{\star}$ be the maximiser of $p_\theta(y)$. Fix $\gamma_0\in(0, \min\{(L_{g_1}+\lambda^{-1})^{-1}, 2\mu^{-1}\})$. Then for every $\lambda > 0$ and $\gamma\in(0,\gamma_0]$, one has for all $n\in\mathbb{N}$, where $z_\star = (\theta_\star, N^{-1/2}x_\star,\dots,N^{-1/2}x_\star)$ and $(\th

Figures (14)

Figure 1: Laplace prior. Left: convergence rate of the variance of the parameter estimates against $N$ produced by MYIPLA and PIPGLA over 100 runs. We see that the $\mathcal{O}(1/N)$ convergence rate holds for the second moments. Right: evolution of the normalised MSE for 50 particles over 100 runs.
Figure 2: Histogram (blue) and density estimation (red) of the BNN weights for a randomly chosen particle. Our methods (top) produce sparser weights, which is crucial for compressibility, compared to IPLA (bottom), which ignores the non-differentiabilities.
Figure 3: Image deblurring experiment.
Figure 4: Bayesian logistic regression with isotropic Laplace priors on the regression weights $\prod_i \text{Laplace}(x_i|\theta,1)$, with true $\theta=-4$. Each plot shows the $\theta$-iterates for 7 different starting points.
Figure 5: Bayesian logistic regression with isotropic uniform priors on the regression weights $\prod_i \mathcal{U}(x_i|-\theta,\theta)$, with true $\theta=1.5$. The plot displays the $\theta$-iterates for 7 randomly chosen starting points.
...and 9 more figures

Theorems & Definitions (46)

Definition 1: Proximity mappings
Definition 2: $\lambda$-Moreau-Yosida approximation
Remark 1
Remark 2
Remark 3
Theorem 4.1: MYIPLA
Theorem 4.2: PIPGLA
Proposition A.1: Convergence of minimisers
proof
Proposition A.2
...and 36 more
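
For orientation, Definitions 1 and 2 refer to standard objects from convex analysis. The block below states their textbook forms for a proper, convex, lower semicontinuous function $g$ and parameter $\lambda > 0$; the paper's exact notation and conventions may differ, so the symbols here are assumptions.

```latex
% Proximity mapping of g with parameter lambda > 0
\operatorname{prox}_{g}^{\lambda}(x)
  = \operatorname*{arg\,min}_{z}
    \Big\{ g(z) + \tfrac{1}{2\lambda}\,\lVert z - x \rVert^{2} \Big\}

% lambda-Moreau-Yosida approximation (envelope) of g
g^{\lambda}(x)
  = \min_{z}
    \Big\{ g(z) + \tfrac{1}{2\lambda}\,\lVert z - x \rVert^{2} \Big\}

% The envelope is differentiable even when g is not, with gradient
\nabla g^{\lambda}(x)
  = \tfrac{1}{\lambda}\,\big( x - \operatorname{prox}_{g}^{\lambda}(x) \big)

% Example: g(x) = |x| gives soft-thresholding
\operatorname{prox}_{|\cdot|}^{\lambda}(x)
  = \operatorname{sign}(x)\,\max\{ |x| - \lambda,\, 0 \}
```

The gradient identity is what allows a non-differentiable term to be replaced by a smooth surrogate inside a gradient-based update.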
