Subspace Langevin Monte Carlo

Tyler Maunu, Jiayi Yao

TL;DR

Subspace Langevin Monte Carlo (SLMC) targets high-dimensional sampling by projecting each Langevin update onto low-rank eigenblocks of a time-varying preconditioner. It generalizes random-coordinate Langevin Monte Carlo (RCLMC) and block-coordinate approaches while reducing memory usage. The authors formulate subspace gradient flows in Wasserstein space, derive a discrete SLMC algorithm, and establish coupling-based convergence guarantees that can improve on standard Langevin Monte Carlo (LMC) and preconditioned Langevin Monte Carlo (PLMC) in ill-conditioned settings. The theory relies on relative strong convexity and smoothness with respect to evolving preconditioners, and is complemented by experiments on ill-conditioned Gaussians, Bayesian logistic regression, and funnel distributions sampled with adaptive preconditioners. By combining a geometric framing with subspace-driven updates, the work points toward memory-efficient adaptive Langevin methods and possible extensions to latent-space diffusion and subspace-aware score-based modeling.

Abstract

Sampling from high-dimensional distributions has wide applications in data science and machine learning but poses significant computational challenges. We introduce Subspace Langevin Monte Carlo (SLMC), a novel and efficient sampling method that generalizes random-coordinate Langevin Monte Carlo and preconditioned Langevin Monte Carlo by projecting the Langevin update onto subsampled eigenblocks of a time-varying preconditioner at each iteration. The advantage of SLMC is its superior adaptability and computational efficiency compared to traditional Langevin Monte Carlo and preconditioned Langevin Monte Carlo. Using coupling arguments, we establish error guarantees for SLMC and demonstrate its practical effectiveness through a few experiments on sampling from ill-conditioned distributions.
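
The projection idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' algorithm as stated in the paper: the function name `slmc_step`, the uniform subsampling of eigenvectors, and the absence of any rescaling of the step size by the subsampling rate are all assumptions made here for illustration.

```python
import numpy as np

def slmc_step(x, grad_V, A, h, r, rng):
    """One illustrative subspace Langevin step (hypothetical helper, not from the paper).

    x      : current state, shape (d,)
    grad_V : callable returning the gradient of the potential V at x
    A      : symmetric positive semidefinite preconditioner, shape (d, d)
    h      : step size
    r      : number of eigen-directions to update in this step
    rng    : numpy.random.Generator
    """
    d = x.shape[0]
    # Eigendecomposition A = U diag(lam) U^T; clip tiny negative eigenvalues.
    lam, U = np.linalg.eigh(A)
    lam = np.clip(lam, 0.0, None)
    # Assumption: eigen-directions are subsampled uniformly without replacement.
    idx = rng.choice(d, size=r, replace=False)
    U_r, lam_r = U[:, idx], lam[idx]
    # Project the gradient onto the chosen subspace.
    g_r = U_r.T @ grad_V(x)
    # Preconditioned drift and noise, restricted to the r chosen directions.
    noise = rng.standard_normal(r)
    step = -h * lam_r * g_r + np.sqrt(2.0 * h * lam_r) * noise
    return x + U_r @ step
```

With a diagonal preconditioner and `r = 1`, the sketch moves a single coordinate per step, which is the random-coordinate behaviour the abstract says SLMC generalizes. With `r` equal to the dimension, every eigen-direction is updated and the step reduces to a preconditioned Langevin step.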

Paper Structure

This paper contains 28 sections, 8 theorems, 88 equations, 5 figures, 1 table.

Key Result

Theorem 1

If $\boldsymbol{A}_t = \boldsymbol{A}(X_t) \succeq \boldsymbol{0}$ is a function of the spatial coordinate alone, then the preconditioned Langevin diffusion (eq:pld) has $\pi \propto \exp(-V)$ as a stationary distribution.
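
A short stationarity check shows why a statement of this kind can hold. The form of eq:pld is not shown on this page, so the diffusion below is an assumption: it is the drift-corrected form from the framework of ma2015complete with zero skew-symmetric part. The symbols $B_t$ (Brownian motion) and $\rho_t$ (law of $X_t$) are introduced here for illustration.

```latex
% Assumed form of the preconditioned Langevin diffusion (not taken from this page):
dX_t = \bigl[-\boldsymbol{A}(X_t)\nabla V(X_t) + \nabla\cdot\boldsymbol{A}(X_t)\bigr]\,dt
       + \sqrt{2\boldsymbol{A}(X_t)}\,dB_t,
\qquad (\nabla\cdot\boldsymbol{A})_i = \sum_j \partial_j \boldsymbol{A}_{ij}.

% Fokker-Planck equation for the law \rho_t of X_t:
\partial_t \rho_t
  = \nabla\cdot\bigl(\rho_t\,\boldsymbol{A}\nabla V - \rho_t\,\nabla\cdot\boldsymbol{A}\bigr)
    + \sum_{i,j}\partial_i\partial_j\bigl(\boldsymbol{A}_{ij}\,\rho_t\bigr)
  = \nabla\cdot\bigl(\boldsymbol{A}\,(\rho_t\nabla V + \nabla\rho_t)\bigr).

% At \rho_t = \pi \propto e^{-V} we have \nabla\pi = -\pi\nabla V, hence
\pi\nabla V + \nabla\pi = 0
\quad\Longrightarrow\quad
\partial_t \rho_t\big|_{\rho_t=\pi} = 0.
```

The divergence term cancels exactly against the extra term produced by differentiating $\boldsymbol{A}_{ij}\rho_t$. When $\boldsymbol{A}$ is constant, the correction vanishes and the same calculation applies without it.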

Figures (5)

  • Figure 1: Experiments demonstrating the convergence of SLMC and PLMC compared to LMC and RCLMC for a diagonal preconditioner. Top Left: We set $h=0.01$ and use $\boldsymbol{A}_k = \boldsymbol{I}$. Setting the subspace dimension equal to the size of the upper left block allows SLMC to converge as fast as LMC. Top Right: We again set $h=0.01$ and let SLMC and PLMC use a diagonal preconditioner that is $\boldsymbol{I}$ on the upper left $10 \times 10$ block and $10 \boldsymbol{I}$ on the lower $10 \times 10$ block. This nonuniform step size gives faster initial convergence because it is adapted to the covariance structure, but it incurs a larger final bias. Bottom Left: SLMC blocks are taken from the eigenvalue decomposition of the covariance. SLMC uses a larger step size, $h=0.5$, while LMC and RCLMC use $h=0.01$. The adaptation lets SLMC converge rapidly at the onset, at the cost of a larger bias due to the larger effective step size. Bottom Right: SLMC now uses blocks from the rotation such that the top left block is $5 \times 5$. SLMC with $r=5$ can now adapt to the blocks and converges faster than with $r=10$.
  • Figure 2: Samples and contours for the Bayesian logistic regression experiment. Left: samples and contours when $\boldsymbol{A}_k = \boldsymbol{I}$ and $h=0.01$. Middle: samples and contours when $\boldsymbol{A}_k = \boldsymbol{I}$ and $h=0.1$. Right: samples and contours when $\boldsymbol{A}_k = [\frac{1}{\ell} \sum_{j=1}^\ell \nabla^2 V(\theta_j^k)]^{-1}$ and $h=0.5$.
  • Figure 3: Kernelized Stein Discrepancy versus iterations for samples from the Bayesian logistic regression experiment. The discrepancy values are averaged over 20 randomly generated examples.
  • Figure 4: Experiment from yu2024scalable demonstrating SLMC on the funnel distribution. Left: SLMC with identity preconditioner. Middle: SLMC with adaptive preconditioner based on RMSProp. Right: SLMC with preconditioner based on Adagrad. The adaptive preconditioners allow the algorithm to explore the funnel better. Forcing the RMSProp preconditioner to be diagonal, and hence aligned with the axes, also lets the method explore the funnel better.
  • Figure 5: Experiment from yu2024scalable demonstrating SLMC on a rotated funnel distribution. Left: SLMC with identity preconditioner. Middle: SLMC with adaptive preconditioner based on RMSProp. Right: SLMC with preconditioner based on Adagrad. Here, a non-diagonal adaptation lets the method adapt to the rotated funnel, so the Adagrad-based SLMC method explores both parts of the funnel better.
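
The preconditioner in the right panel of Figure 2, $\boldsymbol{A}_k = [\frac{1}{\ell} \sum_{j=1}^\ell \nabla^2 V(\theta_j^k)]^{-1}$, can be sketched as follows. The caption does not specify the model, so the logistic-regression potential with an isotropic Gaussian prior is an assumption, and the helper names `logistic_hessian` and `averaged_inverse_hessian` are hypothetical.

```python
import numpy as np

def logistic_hessian(theta, X, prior_precision=1.0):
    """Hessian of an assumed potential: logistic loss plus a Gaussian prior.

    For logistic regression the Hessian does not depend on the labels:
    it equals X^T diag(p * (1 - p)) X plus the prior precision.
    """
    p = 1.0 / (1.0 + np.exp(-(X @ theta)))
    w = p * (1.0 - p)
    return X.T @ (w[:, None] * X) + prior_precision * np.eye(X.shape[1])

def averaged_inverse_hessian(thetas, hess_V):
    """Inverse of the Hessian averaged over the current particles theta_j."""
    H_bar = np.mean([hess_V(th) for th in thetas], axis=0)
    return np.linalg.inv(H_bar)
```

In high dimensions one would typically solve linear systems with the averaged Hessian rather than form the inverse explicitly. The explicit inverse is kept here only to mirror the formula in the caption.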

Theorems & Definitions (19)

  • Theorem 1: Theorem 1 of ma2015complete
  • Remark 1
  • Proposition 1
  • Lemma 1
  • Definition 1
  • Definition 2
  • Remark 2
  • Lemma 2
  • Remark 3
  • Theorem 2
  • ...and 9 more