Reinforcement Learning of Adaptive Acquisition Policies for Inverse Problems

Gianluigi Silvestri, Fabio Valerio Massoli, Tribhuvanesh Orekondy, Afshin Abdi, Arash Behboodi

TL;DR

The paper aims to reduce measurement costs in high-dimensional inverse problems by learning adaptive acquisition policies with reinforcement learning. It introduces an end-to-end framework that jointly trains a reconstruction network and a measurement policy, applies to continuous action spaces, and is extended with a probabilistic belief-state formulation based on variational autoencoders. In experiments on MNIST and MAYO with Gaussian and Radon sensing, adaptive strategies improve reconstruction under low acquisition budgets, with AE-E2E often outperforming the baselines, although random measurements can be competitive or superior in long-horizon, high-dimensional settings. The work also provides design insights, analyzes theoretical bounds on adaptive sensing, and identifies the conditions under which probabilistic adaptive sensing yields the largest gains, offering practical guidance for deploying adaptive acquisition in real-world inverse problems.

Abstract

A promising way to mitigate the expensive process of obtaining a high-dimensional signal is to acquire a limited number of low-dimensional measurements and solve an under-determined inverse problem by exploiting a structural prior on the signal. In this paper, we focus on adaptive acquisition schemes to further reduce the number of measurements. To this end, we propose a reinforcement learning-based approach that collects measurements sequentially so that the underlying signal can be recovered from fewer measurements. Our approach applies to general inverse problems with continuous action spaces and jointly learns the recovery algorithm. Using insights obtained from theoretical analysis, we also provide a probabilistic design for our methods using a variational formulation. We evaluate our approach on multiple datasets and with two measurement spaces (Gaussian, Radon). Our results confirm the benefits of adaptive strategies in low-acquisition horizon settings.

Paper Structure

This paper contains 33 sections, 2 theorems, 39 equations, 10 figures, 9 tables.

Key Result

Theorem 1.4

If $K$ is a subset of a normed space $X$ that is symmetric ($K=-K$) and satisfies $K+K\subset aK$ for a constant $a>0$, then we have:

Figures (10)

  • Figure 1: Schematic representation of our method. A reconstruction network is trained to reconstruct the signal $x$ given a sequence of actions $a_{1:t}$ and corresponding observations $y_{1:t}$. The role of the acquisition network is to select the next action $a_{t+1}$ based on the reconstruction quality of the signal $\hat{x}_t$. The improvement in reconstruction quality between consecutive steps $t$ and $t-1$ is used as reward $r_t$ to train the acquisition network with Reinforcement Learning. After a new action $a_{t+1}$ is selected, a new observation $y_{t+1}$ is collected based on a function $F(a_{t+1},x)$, specific to the inverse problem at hand. Note that in real-world scenarios, there might be no knowledge of $F$ and $x$, and the observation $y$ can be obtained only through measurements $a$ of the environment.
  • Figure 2: Network architecture used in our experiments. A recurrent encoder (orange) maps the action ${\bm{a}}_t$ and observation ${\bm{y}}_t$ at time step $t$ to a latent representation ${\bm{z}}_t$, using the hidden state ${\bm{h}}_t$ to summarize the past actions and observations ${\bm{a}}_{1:t-1}$ and ${\bm{y}}_{1:t-1}$. A convolutional decoder is then used to reconstruct the signal $\hat{{\bm{x}}}_t$ from ${\bm{z}}_t$. The acquisition network is used to select actions ${\bm{a}}_{t+1}$ from the latent representation ${\bm{z}}_t$. The acquisition network is only used for the adaptive acquisition strategies, while the random baseline samples actions at random from a predefined probability distribution. Note how the encoder only receives gradients from the decoder $f_\phi$, and gradients from the acquisition network are never backpropagated through the encoder.
  • Figure 3: Results on the MNIST test dataset with Gaussian measurements (top) and Radon measurements (bottom). We report the mean and standard error of the mean in SSIM (left) and the worst-case error in SSIM (right) for AE-R (yellow), AE-P (blue), and AE-E2E (green) at each acquisition step in the trajectory. Each model is trained to optimize over the whole trajectory length (100 for Gaussian, 20 for Radon).
  • Figure 4: Comparison of AE-E2E (yellow) and VAE-E2E for different $\beta$ ($\beta=1\rightarrow$ blue, $\beta=0.1\rightarrow$ green, $\beta=0.01\rightarrow$ red). We show the mean and standard error of the mean in SSIM for the MNIST test set, at different stages of the acquisition trajectory. We test on models trained on different acquisition horizons: 100 for Gaussian and 20 for Radon.
  • Figure 5: Results on the MAYO test dataset, for AE-R (yellow), VAE-R ($\beta=1$, green), AE-E2E (blue) and VAE-E2E ($\beta=1$, red), as the mean and standard error of the mean in SSIM. Left: Gaussian measurements with trajectory length 50. Right: Radon measurements with trajectory length 10.
  • ...and 5 more figures
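
The acquisition loop described in the Figure 1 caption can be sketched in a few lines of plain Python. This is only an illustrative sketch under stated assumptions, not the authors' implementation: the measurement function $F(a,x)$ is assumed to be the inner product $\langle a, x\rangle$, the learned reconstruction network is replaced by a simple Kaczmarz projection step, the learned acquisition network is replaced by the random baseline policy, and negative squared error stands in for SSIM. All function names are hypothetical.

```python
import math
import random


def measure(action, signal):
    """F(a, x): assumed here to be the linear measurement <a, x>."""
    return sum(a_i * x_i for a_i, x_i in zip(action, signal))


def random_action(dim, rng):
    """Random baseline policy: a Gaussian direction scaled to unit length."""
    direction = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(d * d for d in direction))
    return [d / norm for d in direction]


def reconstruct(estimate, action, observation):
    """Stand-in for the reconstruction network: one Kaczmarz projection step.

    Projects the estimate onto the hyperplane {v : <a, v> = y}.
    Assumes `action` has unit norm.
    """
    residual = observation - measure(action, estimate)
    return [e_i + residual * a_i for e_i, a_i in zip(estimate, action)]


def quality(estimate, signal):
    """Stand-in quality score: negative squared error (the paper reports SSIM)."""
    return -sum((e_i - x_i) ** 2 for e_i, x_i in zip(estimate, signal))


def run_episode(signal, horizon, seed=0):
    """Collect `horizon` measurements; return per-step rewards and the final estimate."""
    rng = random.Random(seed)
    estimate = [0.0] * len(signal)
    previous_quality = quality(estimate, signal)
    rewards = []
    for _ in range(horizon):
        action = random_action(len(signal), rng)       # next action a_{t+1}
        observation = measure(action, signal)          # y_{t+1} = F(a_{t+1}, x)
        estimate = reconstruct(estimate, action, observation)
        current_quality = quality(estimate, signal)
        # Reward: improvement in reconstruction quality between consecutive steps.
        rewards.append(current_quality - previous_quality)
        previous_quality = current_quality
    return rewards, estimate


signal = [1.0, -2.0, 0.5, 3.0]
rewards, estimate = run_episode(signal, horizon=3)
```

Because each reward is a difference of consecutive quality scores, the rewards telescope: their sum over an episode equals the total improvement from the initial estimate to the final one. In this sketch the quality score needs the true signal, so rewards of this form can only be computed where ground truth is available, for example during training.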

Theorems & Definitions (8)

  • Definition 1.1
  • Definition 1.2
  • Definition 1.3
  • Theorem 1.4 (Theorem 10.4 in foucart_mathematical_2013)
  • Definition 1.5
  • Theorem 1.6
  • Proof
  • Remark 1.7