Reinforcement-learning-based Algorithms for Optimization Problems and Applications to Inverse Problems

Chen Xu, Yun-Bin Zhao, Zhipeng Lu, Ye Zhang

TL;DR

This work develops REINFORCE-OPT, an RL-based framework that solves general continuous optimization problems by learning a search rule in the form of a parameterized policy. It shows that the method solves a stochastic reformulation of the original optimization problem, proves that the policy parameter converges almost surely to a locally optimal value under standard assumptions, and reports experiments in which it outperforms gradient descent, the genetic algorithm, and particle swarm optimization. The paper further connects RL to inverse problems, showing how specific policy choices recover Tikhonov regularization and iterative regularization, and applies the approach to nonlinear integral equations and nonlinear PDE-based parameter identification, including uncertainty quantification and multi-solution detection. Overall, REINFORCE-OPT offers a robust, scalable, and uncertainty-aware alternative for challenging optimization and inverse problems, with potential extensions to actor-critic methods and discrete settings.

Abstract

We design a new iterative algorithm, called REINFORCE-OPT, for solving a general class of optimization problems. This algorithm parameterizes the solution search rule and iteratively updates the parameter using a reinforcement learning (RL) algorithm resembling REINFORCE. To gain a deeper understanding of RL-based methods, we show that REINFORCE-OPT essentially solves a stochastic version of the given optimization problem, and that under standard assumptions, the search-rule parameter almost surely converges to a locally optimal value. Experiments show that REINFORCE-OPT outperforms other optimization methods such as gradient descent, the genetic algorithm, and particle swarm optimization, owing to its ability to escape from locally optimal solutions and its robustness to the choice of initial values. With rigorous derivations, we formally introduce the use of reinforcement learning to deal with inverse problems. By choosing specific probability models for the action-selection rule, we can also connect our approach to the conventional methods of Tikhonov regularization and iterative regularization. We take non-linear integral equations and parameter-identification problems in partial differential equations as examples to show how reinforcement learning can be applied to solve non-linear inverse problems. The numerical experiments highlight the strong performance of REINFORCE-OPT, as well as its ability to quantify uncertainty in error estimates and identify multiple solutions for ill-posed inverse problems that lack solution stability and uniqueness.
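
To make the idea of a parameterized search rule updated by a REINFORCE-like algorithm concrete, here is a minimal sketch in Python. It is not the authors' implementation: the objective `objective`, the tanh-bounded linear Gaussian policy, the batch-mean baseline, and all hyper-parameter values are assumptions chosen only for illustration.

```python
import numpy as np

def objective(x):
    # Hypothetical objective to maximize (not from the paper): one peak at the origin.
    return -float(np.sum(x ** 2))

def run_episode(W, b, rng, T=5, sigma=0.1, step_scale=0.5):
    """Roll out one episode; return its total reward and the summed score function."""
    x = rng.uniform(-1.0, 1.0, size=b.shape)      # random initial guess
    total_reward = 0.0
    grad_W = np.zeros_like(W)
    grad_b = np.zeros_like(b)
    for _ in range(T):
        z = W @ x + b
        mean = step_scale * np.tanh(z)            # bounded mean of the search step
        action = mean + sigma * rng.standard_normal(b.shape)
        # Gradient of log N(action; mean, sigma^2 I) w.r.t. z, via the chain rule.
        dz = (action - mean) / sigma**2 * step_scale * (1.0 - np.tanh(z) ** 2)
        grad_W += np.outer(dz, x)
        grad_b += dz
        x = x + action                            # apply the sampled search step
        total_reward += objective(x)
    return total_reward, grad_W, grad_b

def train(dim=2, n_updates=200, batch=16, lr=1e-3, seed=0):
    """REINFORCE-style ascent on the policy parameters (W, b)."""
    rng = np.random.default_rng(seed)
    W = np.zeros((dim, dim))
    b = np.zeros(dim)
    history = []
    for _ in range(n_updates):
        episodes = [run_episode(W, b, rng) for _ in range(batch)]
        returns = np.array([e[0] for e in episodes])
        baseline = returns.mean()                 # assumed variance-reduction baseline
        gW = sum((G - baseline) * gw for G, gw, _ in episodes) / batch
        gb = sum((G - baseline) * g for G, _, g in episodes) / batch
        W += lr * gW                              # stochastic gradient ascent step
        b += lr * gb
        history.append(baseline)                  # mean episode return per update
    return W, b, history
```

Calling `W, b, history = train()` returns the learned policy parameters and the mean episode return recorded at each update. The sampled actions are what lets such a rule explore away from a poor starting point, which is the property the abstract credits for escaping locally optimal solutions.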

Paper Structure

This paper contains 23 sections, 10 theorems, 106 equations, 12 figures, 5 tables, 2 algorithms.

Key Result

Proposition 1

For any $\bm{\theta}\in\mathbb{R}^d$, under the condition that the Markov chain $\{\bm{x}_t\}$ generated following the policy $\pi_{\bm{\theta}}$ has a unique invariant probability measure [footnote: conditions that ensure the existence of an invariant probability measure of $\{\bm{x}_t\}$ can be found in Appen...]

Figures (12)

  • Figure 1: Illustration: REINFORCE-OPT escapes from the local minimum. The gradient ascent agent appears to take fewer steps than REINFORCE-OPT, but in fact it keeps moving back and forth around the local minimum.
  • Figure 2: The fitness trajectory of the two methods in Figure \ref{escape-local}, where the fitness at step $t$ is $\mathcal{L}(x_t)$.
  • Figure 3: Illustration: REINFORCE-OPT escapes from the local minimum.
  • Figure 4: The graph of $\mathcal{L}(\bm{x}):=-\ln((\bm{x}-\bm{m}_1)^2+0.00001)-\ln((\bm{x}-\bm{m}_2)^2+0.01)$ for the 2D case.
  • Figure 5: Performance of the Evolving Policy for Simulation One. After every 100 updates of $\bm{\theta}$, we calculate the performance of $\pi_{\bm{\theta}}$. For model hyper-parameters, we set $T=10$, $\alpha=0.2$, $\beta=0.125$, and $\text{H}_0=4.0$ (the performance threshold for stopping training). The hidden layers of $\mathcal{N}_{\bm{\theta}}$ have widths $(64,64,64)$ with ReLU activation. The learning rate is set to $a_n:=\frac{0.001}{50{,}000+n}$.
  • ...and 7 more figures
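
The objective in the Figure 4 caption and the learning-rate schedule in the Figure 5 caption can be written out directly. The sketch below is illustrative only: it assumes that $(\bm{x}-\bm{m}_i)^2$ denotes the squared Euclidean distance, and the centers passed in are hypothetical, since the captions do not give $\bm{m}_1$ and $\bm{m}_2$.

```python
import numpy as np

def figure4_objective(x, m1, m2):
    """L(x) = -ln(|x - m1|^2 + 0.00001) - ln(|x - m2|^2 + 0.01).

    Assumption: (x - m)^2 is read as the squared Euclidean distance.
    """
    x, m1, m2 = (np.asarray(v, dtype=float) for v in (x, m1, m2))
    d1 = np.sum((x - m1) ** 2)
    d2 = np.sum((x - m2) ** 2)
    return -np.log(d1 + 0.00001) - np.log(d2 + 0.01)

def learning_rate(n):
    """Step size a_n = 0.001 / (50,000 + n) from the Figure 5 caption."""
    return 0.001 / (50_000 + n)
```

With hypothetical centers such as `m1 = [1.0, 1.0]` and `m2 = [-1.0, -1.0]`, the function has a peak at each center, and the smaller additive constant makes the peak at `m1` the higher of the two. The schedule `learning_rate(n)` decreases slowly with the update index `n`.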

Theorems & Definitions (17)

  • Proposition 1
  • proof
  • Lemma 1
  • proof
  • Lemma 2
  • Remark 1
  • proof: Proof of Lemma \ref{Lemma2.1}
  • Theorem 1
  • Theorem 2
  • proof
  • ...and 7 more