Proximal Interacting Particle Langevin Algorithms

Paula Cordero Encinar, Francesca R. Crucinio, O. Deniz Akyildiz

TL;DR

This paper tackles parameter learning in latent variable models whose joint density is non-differentiable. It introduces the proximal interacting particle Langevin (PIPLA) family, comprising MYIPLA and PIPGLA (both built on proximal operators and Moreau–Yosida envelopes) together with a proximal gradient descent variant, to perform maximum marginal likelihood estimation (MMLE) in non-differentiable settings. The authors establish nonasymptotic convergence bounds in the strongly log-concave regime, derive McKean–Vlasov (MKV) limits, and report extensive experiments on sparse Bayesian logistic regression, sparse Bayesian neural networks, image deblurring, and matrix completion that illustrate both the theoretical guarantees and the practical advantages. Overall, PIPLA offers a principled, scalable approach to parameter estimation in non-differentiable latent variable models, with better handling of sparsity and robust performance across tasks.

Abstract

We introduce a class of algorithms, termed proximal interacting particle Langevin algorithms (PIPLA), for inference and learning in latent variable models whose joint probability density is non-differentiable. Leveraging proximal Markov chain Monte Carlo techniques and interacting particle Langevin algorithms, we propose three algorithms tailored to the problem of estimating parameters in a non-differentiable statistical model. We prove nonasymptotic bounds for the parameter estimates produced by the different algorithms in the strongly log-concave setting and provide comprehensive numerical experiments on various models to demonstrate the effectiveness of the proposed methods. In particular, we demonstrate the utility of our family of algorithms for sparse Bayesian logistic regression, training of sparse Bayesian neural networks or neural networks with non-differentiable activation functions, image deblurring, and sparse matrix completion. Our theory and experiments together show that the PIPLA family can be the de facto choice for parameter estimation problems in non-differentiable latent variable models.
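
To make the idea concrete, here is a minimal sketch of an interacting particle Langevin update in which the non-differentiable part of the potential is handled through the gradient of its Moreau–Yosida envelope, in the spirit of MYIPLA. It is not the authors' implementation. The toy model (latent $x_j \sim \mathcal{N}(\theta,1)$, observations $y_j \mid x_j \sim \mathrm{Laplace}(x_j,1)$), the split of the potential into a smooth part $g_1$ and a non-smooth part $g_2$, the function name `myipla_toy`, and all step-size, smoothing and particle-count values are illustrative assumptions.

```python
import math
import random


def soft_threshold(v, t):
    """Proximal map of t*|.|: shrink v towards 0 by t."""
    return math.copysign(max(abs(v) - t, 0.0), v)


def myipla_toy(y, n_particles=10, gamma=0.01, lam=0.1, n_steps=2000,
               theta0=0.0, seed=0):
    """MYIPLA-style sketch on a hypothetical toy model.

    U(theta, x) = g1(theta, x) + g2(x), with
      g1 = sum_j (x_j - theta)^2 / 2   smooth part (Gaussian prior on x)
      g2 = sum_j |y_j - x_j|           non-smooth part (Laplace likelihood)
    g2 is replaced by its lam-Moreau-Yosida envelope, whose gradient is
    (x - prox_{lam g2}(x)) / lam.
    """
    rng = random.Random(seed)
    d = len(y)
    theta = theta0
    particles = [[0.0] * d for _ in range(n_particles)]
    for _ in range(n_steps):
        # theta drift: d g1 / d theta, averaged over the current particles
        grad_theta = sum(sum(theta - xj for xj in x) for x in particles) / n_particles
        new_particles = []
        for x in particles:
            new_x = []
            for xj, yj in zip(x, y):
                grad_g1 = xj - theta                       # d g1 / d x_j
                prox = yj + soft_threshold(xj - yj, lam)   # prox of lam*|y_j - .|
                grad_env = (xj - prox) / lam               # Moreau-Yosida gradient
                noise = math.sqrt(2.0 * gamma) * rng.gauss(0.0, 1.0)
                new_x.append(xj - gamma * (grad_g1 + grad_env) + noise)
            new_particles.append(new_x)
        # theta noise is scaled by 1/N, as in interacting particle Langevin schemes
        theta = (theta - gamma * grad_theta
                 + math.sqrt(2.0 * gamma / n_particles) * rng.gauss(0.0, 1.0))
        particles = new_particles
    return theta


# Synthetic data (assumed): x_j ~ N(2, 1), y_j = x_j + Laplace(0, 1) noise,
# drawn as the difference of two Exp(1) variables.
data_rng = random.Random(1)
y_obs = [data_rng.gauss(2.0, 1.0) + data_rng.expovariate(1.0) - data_rng.expovariate(1.0)
         for _ in range(40)]
theta_hat = myipla_toy(y_obs)
print(f"theta estimate: {theta_hat:.3f}, sample mean of y: {sum(y_obs) / len(y_obs):.3f}")
```

Comparing with the sample mean is only a sanity check: in this toy model $\mathbb{E}[y_j]=\theta$, but the marginal maximum likelihood estimate need not coincide exactly with the sample mean.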

Paper Structure

This paper contains 68 sections, 21 theorems, 178 equations, 14 figures, 11 tables, 1 algorithm.

Key Result

Theorem 4.1

Let Assumptions A1–A4 hold. Let $\theta_n^N$ denote the MYIPLA parameter iterate and let $\bar{\theta}_{\star}$ be the maximiser of $p_\theta(y)$. Fix $\gamma_0\in(0, \min\{(L_{g_1}+\lambda^{-1})^{-1}, 2\mu^{-1}\})$. Then for every $\lambda > 0$ and $\gamma\in(0,\gamma_0]$, one has, for all $n\in\mathbb{N}$, … where $z_\star = (\theta_\star, N^{-1/2}x_\star,\dots,N^{-1/2}x_\star)$ and …
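
The admissible range for $\gamma_0$ in Theorem 4.1 can be evaluated directly, with $L_{g_1}$, $\lambda$ and $\mu$ as in the theorem. The helper below is a hypothetical illustration (its name and the example constants are assumptions, not values from the paper). It shows that a smaller smoothing parameter $\lambda$ shrinks the admissible step size, since the gradient of a $\lambda$-Moreau–Yosida envelope is $\lambda^{-1}$-Lipschitz and this adds $\lambda^{-1}$ to $L_{g_1}$.

```python
def gamma0_upper_bound(L_g1: float, lam: float, mu: float) -> float:
    """Supremum of the open interval for gamma_0 in Theorem 4.1:
    min{(L_g1 + 1/lam)^(-1), 2/mu}. Any gamma_0 strictly below it is admissible."""
    if L_g1 < 0 or lam <= 0 or mu <= 0:
        raise ValueError("need L_g1 >= 0, lam > 0, mu > 0")
    return min(1.0 / (L_g1 + 1.0 / lam), 2.0 / mu)


# Illustrative (assumed) constants: the bound tightens as lam decreases.
for lam in (1.0, 0.1, 0.01):
    print(f"lam={lam}: gamma_0 must lie below {gamma0_upper_bound(10.0, lam, 1.0):.5f}")
```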

Figures (14)

  • Figure 1: Laplace prior. Left: convergence rate of the variance of the parameter estimates against $N$ produced by MYIPLA and PIPGLA over 100 runs. We see that the $\mathcal{O}(1/N)$ convergence rate holds for the second moments. Right: evolution of the normalised MSE for 50 particles over 100 runs.
  • Figure 2: Histogram (blue) and density estimation (red) of the BNN weights for a randomly chosen particle. Our methods (top) produce sparser weights, which is crucial for compressibility, compared to IPLA (bottom), which ignores the non-differentiabilities.
  • Figure 3: Image deblurring experiment.
  • Figure 4: Bayesian logistic regression with isotropic Laplace priors on the regression weights $\prod_i \text{Laplace}(x_i|\theta,1)$, with true $\theta=-4$. Each plot shows the $\theta$-iterates for 7 different starting points.
  • Figure 5: Bayesian logistic regression with isotropic uniform priors on the regression weights $\prod_i \mathcal{U}(x_i|-\theta,\theta)$, with true $\theta=1.5$. The plot displays the $\theta$-iterates for 7 randomly chosen starting points.
  • ...and 9 more figures
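
The non-smooth priors in Figures 4 and 5 have simple proximal maps in the latent variables. As a minimal sketch, assuming the standard definition $\mathrm{prox}_{f}^{\lambda}(v)=\arg\min_u\{f(u)+\|u-v\|^2/(2\lambda)\}$ and holding $\theta$ fixed: the negative log of $\prod_i \text{Laplace}(x_i\mid\theta,1)$ is $\sum_i |x_i-\theta|$ up to a term not depending on $x$, whose proximal map is soft thresholding around $\theta$; the negative log of $\prod_i \mathcal{U}(x_i\mid-\theta,\theta)$ is the indicator of $[-\theta,\theta]^d$ up to such a term, whose proximal map is clipping. The function names are hypothetical, and the joint treatment of $\theta$ used in the paper is not shown.

```python
import math
from typing import List


def soft_threshold(v: float, t: float) -> float:
    """Proximal map of t*|.|: shrink v towards 0 by t."""
    return math.copysign(max(abs(v) - t, 0.0), v)


def prox_laplace_prior(x: List[float], theta: float, lam: float) -> List[float]:
    """prox of lam * sum_i |x_i - theta|: coordinate-wise soft thresholding around theta."""
    return [theta + soft_threshold(xi - theta, lam) for xi in x]


def prox_uniform_prior(x: List[float], theta: float) -> List[float]:
    """prox of the indicator of [-theta, theta]^d: Euclidean projection (clipping).
    It does not depend on lam, since scaling an indicator leaves it unchanged."""
    if theta <= 0:
        raise ValueError("theta must be positive")
    return [min(max(xi, -theta), theta) for xi in x]


print(prox_laplace_prior([-3.0, -4.2, 1.0], theta=-4.0, lam=0.5))
print(prox_uniform_prior([2.0, -0.3, -1.7], theta=1.5))
```

Because both priors factorise over coordinates, their proximal maps act coordinate-wise, which keeps each proximal step linear in the latent dimension.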

Theorems & Definitions (46)

  • Definition 1: Proximity mappings
  • Definition 2: $\lambda$-Moreau–Yosida approximation
  • Remark 1
  • Remark 2
  • Remark 3
  • Theorem 4.1: MYIPLA
  • Theorem 4.2: PIPGLA
  • Proposition A.1: Convergence of minimisers
  • Proof
  • Proposition A.2
  • ...and 36 more
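
Definitions 1 and 2 can be checked numerically on the simplest non-smooth function, $g(u)=|u|$. This sketch assumes the standard forms $\mathrm{prox}_g^{\lambda}(x)=\arg\min_u\{g(u)+(u-x)^2/(2\lambda)\}$ and $g^{\lambda}(x)=\min_u\{g(u)+(u-x)^2/(2\lambda)\}$; the paper's exact notation is not reproduced here. For $|u|$ the envelope is the Huber function, and its gradient satisfies $\nabla g^{\lambda}(x)=(x-\mathrm{prox}_g^{\lambda}(x))/\lambda$, which is what lets the envelope replace a non-differentiable term inside a Langevin step. All function names are hypothetical.

```python
import math


def soft_threshold(x: float, lam: float) -> float:
    """prox of lam*|.| (the minimiser in the envelope)."""
    return math.copysign(max(abs(x) - lam, 0.0), x)


def huber_envelope(x: float, lam: float) -> float:
    """Closed-form lam-Moreau-Yosida envelope of |.|."""
    if abs(x) <= lam:
        return x * x / (2.0 * lam)
    return abs(x) - lam / 2.0


def envelope_grad(x: float, lam: float) -> float:
    """Gradient via the identity (x - prox(x)) / lam."""
    return (x - soft_threshold(x, lam)) / lam


def envelope_bruteforce(x: float, lam: float, lo: float = -10.0,
                        hi: float = 10.0, n: int = 20001) -> float:
    """The envelope evaluated by minimising over a uniform grid of u."""
    best = math.inf
    for k in range(n):
        u = lo + (hi - lo) * k / (n - 1)
        best = min(best, abs(u) + (u - x) ** 2 / (2.0 * lam))
    return best


lam = 0.5
for x in (-3.0, -0.2, 0.05, 2.5):
    h = 1e-6
    fd = (huber_envelope(x + h, lam) - huber_envelope(x - h, lam)) / (2 * h)
    print(x, huber_envelope(x, lam), envelope_bruteforce(x, lam), envelope_grad(x, lam), fd)
```

The brute-force column checks the closed form against the definition, and the finite-difference column checks the gradient identity.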