Reinforced Inverse Scattering

Hanyang Jiang, Yuehaw Khoo, Haizhao Yang

TL;DR

Inverse scattering from limited data is ill-posed, and reconstruction quality depends strongly on sensor placement and incident frequency. The authors formulate a reinforcement learning (RL) framework that treats the selection of sensor angles and frequencies as sequential decisions in a Markov decision process (MDP), using a GRU-based policy trained with proximal policy optimization (PPO) to learn scatterer-dependent sensing strategies. At each step, the scatterer is reconstructed by minimizing a sparsity-regularized data-fit objective with warm-started L-BFGS, and the reward is the resulting gain in PSNR. The approach is reported to improve significantly on fixed sensing strategies across multiple scatterers and resolutions, which suggests practical potential for precision imaging under resource constraints. The framework lays groundwork for extending adaptive sensing to more challenging scattering regimes and modalities.
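
The PSNR-gain reward mentioned above can be illustrated with a minimal sketch. It assumes the reward at a step is the PSNR of the current reconstruction minus the PSNR of the previous one, both measured against the true scatterer, and that the peak value is taken from the true image. The function names `psnr` and `psnr_gain_reward` are hypothetical and do not come from the paper.

```python
import numpy as np

def psnr(reference, estimate, peak=None):
    """Peak signal-to-noise ratio (dB) of an estimate against a reference image."""
    reference = np.asarray(reference, dtype=float)
    estimate = np.asarray(estimate, dtype=float)
    if peak is None:
        # Assumption: use the largest magnitude in the reference as the peak.
        peak = np.max(np.abs(reference))
    mse = np.mean((reference - estimate) ** 2)
    if mse == 0.0:
        return np.inf  # perfect reconstruction
    return 10.0 * np.log10(peak ** 2 / mse)

def psnr_gain_reward(truth, previous_recon, current_recon):
    """Hypothetical per-step reward: improvement in PSNR after a new measurement."""
    return psnr(truth, current_recon) - psnr(truth, previous_recon)
```

With this choice, the rewards over an episode telescope to the difference between the final and initial PSNR, so maximizing the undiscounted return amounts to maximizing the final reconstruction quality.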

Abstract

Inverse wave scattering aims to determine the properties of an object from data on how the object scatters incoming waves. To collect this data, sensors are placed at different locations to send and receive waves. The choice of sensor positions and incident wave frequencies determines the reconstruction quality of the scatterer properties. This paper introduces reinforcement learning to develop precision imaging that chooses sensor positions and wave frequencies adaptively for each scatterer, thereby obtaining a significant improvement in reconstruction quality with limited imaging resources. Extensive numerical results are provided to demonstrate the superiority of the proposed method over existing methods.

Paper Structure

This paper contains 17 sections, 18 equations, 8 figures, 6 tables, and 1 algorithm.

Figures (8)

  • Figure 1: Data generation process for the far-field pattern problem.
  • Figure 2: Structure of the recurrent neural network. Each GRU represents one layer, and $I_t$ is the input of layer $t$. Each layer outputs a hidden state $h_t$, which is also an input of the next layer.
  • Figure 3: Structure of the policy network $\pi_{\theta_p}$. $h_t^{p}$ represents the hidden state of the policy net, which is the output of the GRU at layer $t$, with input $I_t=(d_t,u_t,T+1-t)$. We use $h_t^p$ as the input of a perceptron and generate a 360-dimensional categorical distribution over angles through a softmax with a mask that removes angles already chosen. This distribution is the angle policy $\pi_{\theta_p}^{\sigma}$. We then sample an angle $\sigma^{a_t}$ from this distribution and concatenate its one-hot encoding with $h_t^{p}$ as the input of another MLP, which produces a categorical distribution over frequencies. This distribution is the frequency policy conditioned on the angle, $\pi_{\theta_p}^{\omega|\sigma}$. Finally, we sample a frequency $\omega^{a_t}$ from it.
  • Figure 4: Structure of the value network $\gamma_{\theta_v}$. $h_t^{v}$ represents the hidden state of the value net, which is the output of the GRU at layer $t$, with input $I_t=(d_t,u_t,T+1-t)$. We use $h_t^{v}$ as the input of a perceptron and generate $\hat{v}_{\pi_{\theta_p}}(s_t)$, the estimated value of the current state $s_t$ under the policy $\pi_{\theta_p}$ parameterized by the policy network.
  • Figure 5: We compare the reconstruction results of the initial strategy and the trained strategy on two specific types of scatterers of size $32 \times 32$. The true scatterers are shown in subplots (a) and (d). Each plot is tagged with the respective method, and the PSNR of each reconstruction indicates the difference in resolution.
  • ...and 3 more figures
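
The two-stage sampling described in the Figure 3 caption (a masked softmax over 360 angles, then a frequency distribution conditioned on the sampled angle) can be sketched in NumPy as follows. This is an illustration under stated assumptions, not the authors' implementation: single linear maps `W_angle` and `W_freq` stand in for the perceptron and MLP heads, the hidden state `h` is taken as given rather than produced by a GRU, and `N_FREQS`, `HIDDEN`, and all function names are hypothetical.

```python
import numpy as np

N_ANGLES = 360  # from the Figure 3 caption
N_FREQS = 8     # hypothetical: the number of candidate frequencies is not stated here
HIDDEN = 16     # hypothetical hidden-state size

def masked_softmax(logits, mask):
    """Softmax restricted to entries where mask is True; others get probability 0."""
    masked = np.where(mask, logits, -np.inf)
    masked = masked - masked[mask].max()  # shift for numerical stability
    weights = np.exp(masked)              # exp(-inf) is exactly 0
    return weights / weights.sum()

def sample_action(h, chosen_angles, W_angle, W_freq, rng):
    """Sample (angle, frequency) given hidden state h and already-chosen angles."""
    # Angle policy: mask out angles that were selected at earlier steps.
    available = np.ones(N_ANGLES, dtype=bool)
    available[np.asarray(sorted(chosen_angles), dtype=int)] = False
    p_angle = masked_softmax(W_angle @ h, available)
    angle = int(rng.choice(N_ANGLES, p=p_angle))

    # Frequency policy given the angle: concatenate h with the angle's one-hot code.
    one_hot = np.zeros(N_ANGLES)
    one_hot[angle] = 1.0
    freq_logits = W_freq @ np.concatenate([h, one_hot])
    p_freq = masked_softmax(freq_logits, np.ones(N_FREQS, dtype=bool))
    freq = int(rng.choice(N_FREQS, p=p_freq))
    return angle, freq, p_angle, p_freq
```

Because masked angles receive probability exactly zero, repeated calls that add each sampled angle to `chosen_angles` never select the same angle twice. The joint log-probability needed by a policy-gradient method such as PPO would be `log p_angle[angle] + log p_freq[freq]`.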