Exploring Non-Convex Discrete Energy Landscapes: An Efficient Langevin-Like Sampler with Replica Exchange

Haoyang Zheng, Hengrong Du, Ruqi Zhang, Guang Lin

TL;DR

This work introduces the Discrete Replica EXchangE Langevin (DREXEL) sampler and its variant with Adjusted Metropolis (DREAM), and proves that the proposed samplers satisfy detailed balance and converge to the target distribution under mild conditions.

Abstract

Gradient-based Discrete Samplers (GDSs) are effective for sampling discrete energy landscapes. However, they often stagnate in complex, non-convex settings. To improve exploration, we introduce the Discrete Replica EXchangE Langevin (DREXEL) sampler and its variant with Adjusted Metropolis (DREAM). These samplers use two GDSs at different temperatures and step sizes: one focuses on local exploitation, while the other explores broader energy landscapes. When energy differences are significant, sample swaps occur, which are determined by a mechanism tailored for discrete sampling to ensure detailed balance. Theoretically, we prove that the proposed samplers satisfy detailed balance and converge to the target distribution under mild conditions. Experiments across 2d synthetic simulations, sampling from Ising models and restricted Boltzmann machines, and training deep energy-based models further confirm their efficiency in exploring non-convex discrete energy landscapes.
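The two-chain scheme described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the target is a toy binary model on {-1, +1}^d with unnormalized log-density f(θ) = ½ θᵀWθ + bᵀθ; each chain uses a discrete Langevin-style flip proposal with a Metropolis-Hastings correction; and the swap step uses the standard parallel-tempering acceptance rule, since this summary does not spell out the swap mechanism the paper tailors for discrete sampling. All function names, parameter names, and default values are hypothetical.

```python
import numpy as np

def log_density(theta, W, b):
    """Unnormalized log-density of the toy target (an assumption of this sketch)."""
    return 0.5 * theta @ W @ theta + b @ theta

def flip_logits(theta, W, b, alpha, tau):
    """Per-coordinate logits for flipping theta_i -> -theta_i.

    Langevin-style proposal for +/-1 variables at temperature tau:
    logit_i = -theta_i * grad_i / tau - 2 / alpha, with grad = W @ theta + b
    (W is assumed symmetric).
    """
    grad = W @ theta + b
    return -theta * grad / tau - 2.0 / alpha

def log_proposal(flips, logits):
    """Log-probability of a flip pattern under independent Bernoulli flips."""
    log_p_flip = -np.logaddexp(0.0, -logits)  # log sigmoid(logits)
    log_p_stay = -np.logaddexp(0.0, logits)   # log (1 - sigmoid(logits))
    return np.sum(np.where(flips, log_p_flip, log_p_stay))

def langevin_step(theta, W, b, alpha, tau, rng):
    """One Metropolis-adjusted discrete Langevin-style step at temperature tau."""
    fwd = flip_logits(theta, W, b, alpha, tau)
    flips = rng.random(theta.shape) < 1.0 / (1.0 + np.exp(-fwd))
    proposal = np.where(flips, -theta, theta)
    rev = flip_logits(proposal, W, b, alpha, tau)  # same flip set undoes the move
    log_accept = ((log_density(proposal, W, b) - log_density(theta, W, b)) / tau
                  + log_proposal(flips, rev) - log_proposal(flips, fwd))
    return proposal if np.log(rng.random()) < log_accept else theta

def swap_log_ratio(f_cold, f_hot, tau_cold, tau_hot):
    """Standard parallel-tempering swap ratio (assumed; not the paper's rule)."""
    return (1.0 / tau_cold - 1.0 / tau_hot) * (f_hot - f_cold)

def replica_exchange_sampler(W, b, n_iters, alpha_cold=0.5, alpha_hot=2.0,
                             tau_cold=1.0, tau_hot=4.0, seed=0):
    """Run a cold and a hot chain, attempting a swap after every iteration."""
    rng = np.random.default_rng(seed)
    d = b.shape[0]
    cold = rng.choice([-1.0, 1.0], size=d)
    hot = rng.choice([-1.0, 1.0], size=d)
    samples = np.empty((n_iters, d))
    n_swaps = 0
    for t in range(n_iters):
        cold = langevin_step(cold, W, b, alpha_cold, tau_cold, rng)  # local exploitation
        hot = langevin_step(hot, W, b, alpha_hot, tau_hot, rng)      # broad exploration
        log_r = swap_log_ratio(log_density(cold, W, b), log_density(hot, W, b),
                               tau_cold, tau_hot)
        if np.log(rng.random()) < log_r:
            cold, hot = hot, cold
            n_swaps += 1
        samples[t] = cold  # only the cold chain targets the distribution of interest
    return samples, n_swaps
```

Calling `replica_exchange_sampler(W, b, n_iters=2000)` with a symmetric coupling matrix `W` and a bias vector `b` returns the cold-chain samples and the number of accepted swaps. Because the cold chain has the smaller temperature, a swap is always accepted when the hot chain currently sits at a higher log-density, which is how states found by broad exploration reach the exploiting chain.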

Paper Structure

This paper contains 28 sections, 2 theorems, 66 equations, 8 figures, 6 tables, 2 algorithms.

Key Result

Theorem 1

Let $\alpha_1$ and $\alpha_2$ be the step sizes for the low- and high-temperature samplers, and let $q(\cdot|{\bm{\theta}})$ be the Markov chain transition kernel. Suppose the target $\pi({\bm{\theta}})$ is log-quadratic. Then:

Figures (8)

  • Figure 1: DREXEL & DREAM sample trajectories in discrete domains. Blue denotes the low-temperature sampler and red the high-temperature sampler. They exchange samples following a swap mechanism.
  • Figure 2: Qualitative performance of discrete samplers on high-dimensional synthetic tasks. Top: Target energy landscapes (wave, 8 Gaussians, 16 Gaussians, moon, two moons, twist, and flower) illustrate non-convex and multimodal structures with metastable regions. Middle and Bottom: Empirical samples from DMALA (middle) and DREAM (bottom) after 100,000 iterations. DREAM consistently captures all modes across tasks, whereas DMALA fails to escape local minima and misses significant regions of the target distribution.
  • Figure 3: Comparative analysis of DMALA and DREAM on the 16-Gaussian task. (a) Trace plots over 100,000 samples. (b)–(e) Ablation studies: (b) KL divergence across increasing energy barrier strength $C$; (c) sensitivity to low-temperature step size; (d) sensitivity to high-temperature step size; (e) sensitivity to the high-temperature chain's temperature. Results highlight that DREAM is robust across settings but sensitive to high-temperature parameters affecting swap rates.
  • Figure 4: Performance of discrete samplers on 2D Ising models. Left: RMSE (log scale) vs. iteration and runtime shows that DREXEL and DREAM significantly outperform DULA and DMALA, respectively, both with and without Metropolis–Hastings (MH) corrections. Middle: Summary table reports mean ± std of log RMSE over 10 runs, showing consistent performance gains from replica exchange. Right: Ablation across connectivity $w \in [0.1, 0.8]$ shows that DREAM achieves peak gains near critical coupling, validating its improved exploration in strongly correlated, high-barrier regimes.
  • Figure 5: RBM sampling performance across six datasets, evaluated by log MMD. Left: Log MMD comparisons show DREAM achieves the lowest values across all datasets, indicating superior convergence to the RBM equilibrium distribution. Right: Summary table reports mean $\pm$ std of log MMD over 10 runs. DREAM consistently outperforms other methods, achieving relative MMD reductions of 8.2–12.3% over DMALA, with low variance confirming robust performance.
  • ...and 3 more figures

Theorems & Definitions (4)

  • Theorem 1
  • Theorem 2
  • proof
  • proof