Convergence Acceleration of Markov Chain Monte Carlo-based Gradient Descent by Deep Unfolding

Ryo Hagiwara, Satoshi Takabe

TL;DR

This study proposes a trainable sampling-based solver for combinatorial optimization problems (COPs) using a deep-learning technique called deep unfolding. The solver's step sizes are trained with a sampling-based gradient estimate that replaces auto-differentiation with a variance estimate, thereby circumventing the failure of backpropagation caused by the non-differentiability of MCMC.

Abstract

This study proposes a trainable sampling-based solver for combinatorial optimization problems (COPs) using a deep-learning technique called deep unfolding. The proposed solver is based on the Ohzeki method, which combines Markov chain Monte Carlo (MCMC) with gradient descent, and its step sizes are trained by minimizing a loss function. For the training process, we propose a sampling-based gradient estimation that replaces auto-differentiation with a variance estimate, thereby circumventing the failure of backpropagation caused by the non-differentiability of MCMC. Numerical results for a few COPs demonstrate that the proposed solver converges significantly faster than the original Ohzeki method.
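One way to see why a variance can stand in for auto-differentiation is the following minimal sketch. It assumes (our assumption; the paper's exact parameterization may differ) that the sampler targets $p_{\bm{v}}(\bm{x}) \propto \exp(-\beta[E(\bm{x}) + \sum_k v_k f_k(\bm{x})])$. Differentiating $\langle f_k \rangle$ under this distribution gives $\partial \langle f_k \rangle / \partial v_k = -\beta\,\mathrm{Var}(f_k)$, so the derivative can be estimated from the same samples that give the mean, even though the samples themselves are not differentiable in $\bm{v}$. The Python below checks this on a hypothetical six-variable toy problem with one constraint ($m = 1$): it compares a finite-difference derivative computed by exact enumeration with $-\beta V$, and estimates the same moments with a single-variable-flip Metropolis sampler. The toy energy, constraint, $\beta$, $v$ and the choice of Metropolis as the MCMC sampler are all illustrative.

```python
import itertools
import math
import random

# Hypothetical toy instance (not from the paper): N binary variables with
# objective E(x) = sum_i H[i] x[i] + J sum_i x[i] x[i+1] (negative J favours
# adjacent ones) and a single constraint function f(x) = sum_i x[i].
N = 6
H = [0.3, -0.2, 0.5, -0.4, 0.1, 0.0]
J = -0.5
BETA = 1.0


def energy(x):
    return sum(h * xi for h, xi in zip(H, x)) + J * sum(x[i] * x[i + 1] for i in range(N - 1))


def constraint(x):
    return sum(x)


def exact_moments(v):
    """Exact <f> and V = Var(f) under p_v(x) ∝ exp(-BETA * (E(x) + v * f(x)))."""
    states = list(itertools.product((0, 1), repeat=N))
    weights = [math.exp(-BETA * (energy(x) + v * constraint(x))) for x in states]
    z = sum(weights)
    mean = sum(w * constraint(x) for w, x in zip(weights, states)) / z
    second = sum(w * constraint(x) ** 2 for w, x in zip(weights, states)) / z
    return mean, second - mean ** 2


def metropolis_moments(v, n_sweeps=10000, burn_in=1000, seed=0):
    """Estimate <f> and Var(f) from single-variable-flip Metropolis samples."""
    rng = random.Random(seed)
    x = [rng.randint(0, 1) for _ in range(N)]
    samples = []
    for sweep in range(n_sweeps):
        for i in range(N):
            y = x.copy()
            y[i] = 1 - y[i]
            delta = energy(y) + v * constraint(y) - energy(x) - v * constraint(x)
            # Accept with probability min(1, exp(-BETA * delta)).
            if delta <= 0 or rng.random() < math.exp(-BETA * delta):
                x = y
        if sweep >= burn_in:
            samples.append(constraint(x))
    mean = sum(samples) / len(samples)
    var = sum((s - mean) ** 2 for s in samples) / (len(samples) - 1)
    return mean, var


v0, eps = 0.2, 1e-5
mean0, var0 = exact_moments(v0)
# Central finite difference of the exact mean with respect to v.
fd = (exact_moments(v0 + eps)[0] - exact_moments(v0 - eps)[0]) / (2 * eps)
mc_mean, mc_var = metropolis_moments(v0)
print(f"finite difference d<f>/dv: {fd:.6f}")
print(f"-BETA * V (exact):         {-BETA * var0:.6f}")
print(f"MCMC: <f> = {mc_mean:.3f} (exact {mean0:.3f}), V = {mc_var:.3f} (exact {var0:.3f})")
```

For $m > 1$ constraints, the same calculation gives the cross terms $\partial \langle f_k \rangle / \partial v_l = -\beta\,\mathrm{Cov}(f_k, f_l)$; whether the method uses the full covariance or only the per-constraint variances is not stated in this summary.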

Paper Structure (1 section, 8 equations, 5 figures, 1 algorithm)

Figures (5)

  • Figure 1: The architecture and training process of DUOM. The upper part shows the forward pass, and the lower part shows the backward pass. DUOM comprises an MCMC sampler and a gradient-descent update of $\bm{v}$. In the forward pass, the MCMC sampler estimates the expected value and the variance $V$ of $\{f_k(\bm{x})\}$. In the backward pass, the sampling-based gradient estimation uses this variance to estimate the gradient, and the trainable step sizes $\{\eta_t\}_{t=0}^{T-1}$ are updated by backpropagation. Note that the process is executed simultaneously for $k=1,\dots,m$.
  • Figure 2: Residual loss as a function of the number of iterations for the Ohzeki method with a fixed step size $\eta_t = \eta$ and for DUOM on the $K$-minimum set problem.
  • Figure 3: MSE of the Ohzeki method with $\eta = 1.0 \times 10^{-4}$ and of DUOM as a function of the number of iterations.
  • Figure 4: MSE as a function of the number of iterations for the image reconstruction problem. Red symbols represent the results of DUOM, and blue symbols show the performance of the Ohzeki method with a fixed step size $\eta = 1.0 \times 10^{-2}$.
  • Figure 5: Examples of images reconstructed by DUOM (top) and by the Ohzeki method with $\eta = 1.0 \times 10^{-2}$ (bottom). The images are $15\times 15$ pixels; yellow and purple pixels represent $x_i=1$ and $x_i=0$, respectively. At each iteration, the displayed image is the sample that minimizes the loss function $L$. DUOM and the Ohzeki method with a fixed step size reconstructed the original image in $11$ and $32$ iterations, respectively.
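The forward and backward passes described for Figure 1 can be sketched as follows for a single constraint ($m = 1$). Everything here is an illustrative assumption rather than the paper's algorithm: the update $v_{t+1} = v_t + \eta_t(\langle f \rangle_{v_t} - C)$ with a hypothetical target value $C$, the Boltzmann form $p_v(\bm{x}) \propto \exp(-\beta[E(\bm{x}) + v f(\bm{x})])$ of the sampling distribution, the residual loss $L = (\langle f \rangle_{v_T} - C)^2$, and a separable toy objective whose moments are computed in closed form so the sketch is deterministic (in DUOM they are MCMC estimates). The backward pass never differentiates through the sampler; it only reuses $\partial \langle f \rangle / \partial v = -\beta V$ from the forward pass.

```python
import math

# Hypothetical separable toy problem (not from the paper):
# E(x) = sum_i H[i] * x[i] over x in {0,1}^n, one constraint f(x) = sum_i x[i],
# target value C, inverse temperature BETA, and T unfolded iterations.
H = [0.3, -0.2, 0.5, -0.4, 0.1, 0.0, 0.8, -0.6]
C = 5.0
BETA = 1.0
T = 5


def moments(v):
    """Return <f> and V = Var(f) under p_v(x) ∝ exp(-BETA * (E(x) + v * f(x))).

    Closed form because the toy E is separable; DUOM would estimate these by MCMC.
    """
    p = [1.0 / (1.0 + math.exp(BETA * (h + v))) for h in H]  # P(x_i = 1)
    return sum(p), sum(q * (1.0 - q) for q in p)


def forward(etas, v0=0.0):
    """Unrolled updates v_{t+1} = v_t + eta_t * (<f>_t - C); keeps moments for backward."""
    v, means, variances = v0, [], []
    for eta in etas:
        mean, var = moments(v)
        means.append(mean)
        variances.append(var)
        v = v + eta * (mean - C)
    mean_T, var_T = moments(v)
    return means + [mean_T], variances + [var_T]


def loss_and_grad(etas, v0=0.0):
    """Residual loss L = (<f>_T - C)^2 and dL/d(eta_t) via d<f>/dv = -BETA * V."""
    means, variances = forward(etas, v0)
    loss = (means[-1] - C) ** 2
    g = 2.0 * (means[-1] - C) * (-BETA * variances[-1])  # dL/dv_T
    grads = [0.0] * len(etas)
    for t in reversed(range(len(etas))):
        grads[t] = g * (means[t] - C)  # dv_{t+1}/d(eta_t) = <f>_t - C
        g *= 1.0 - etas[t] * BETA * variances[t]  # dv_{t+1}/dv_t
    return loss, grads


# Train the step sizes by plain gradient descent on L.
etas = [0.1] * T
initial_loss, _ = loss_and_grad(etas)
for epoch in range(200):
    _, grads = loss_and_grad(etas)
    etas = [eta - 0.2 * g for eta, g in zip(etas, grads)]
final_loss, _ = loss_and_grad(etas)
print(f"residual loss: {initial_loss:.3e} -> {final_loss:.3e}")
print("trained step sizes:", [round(eta, 3) for eta in etas])
```

In this toy model each factor $1 - \eta_t \beta V_t$ of $\partial v_{t+1}/\partial v_t$ sets how much of the residual survives one iteration, so training pushes the step sizes toward values that make these factors small; this is one intuition, under the stated assumptions, for how learned step sizes can speed up convergence.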