Reinforcement Learning for SAR View Angle Inversion with Differentiable SAR Renderer

Yanni Wang; Hecheng Jia; Shilei Fu; Huiping Lin; Feng Xu

TL;DR

This work tackles the electromagnetic inverse problem of recovering SAR view angles by framing it as a reinforcement learning task in which an agent learns to predict the angles $[\alpha,\beta]$ through interaction with an embedded differentiable SAR renderer (DSR). The framework uses SARNet to construct a state from the sequential and semantic differences between rendered SAR images, and employs a Rainbow-based DRL agent with a discrete action space to progressively refine the angle estimates. A composite reward function that combines memory-difference, smoothing, auxiliary-condition, and boundary-penalty terms stabilizes learning and accelerates convergence. Experiments on simulated DSR-rendered data and real MSTAR data demonstrate accurate inversion and robust cross-domain generalization, and ablations confirm that the state and reward design matter for both performance and stability.

Abstract

The electromagnetic inverse problem has long been a research hotspot. This study aims to recover the radar view angles of synthetic aperture radar (SAR) images given a target model. However, the scarcity of SAR data, combined with intricate background interference and imaging mechanisms, limits the applicability of existing learning-based approaches. To address these challenges, we propose an interactive deep reinforcement learning (DRL) framework in which an electromagnetic simulator, the differentiable SAR renderer (DSR), is embedded to mediate the interaction between the agent and the environment, simulating a human-like process of angle prediction. Specifically, DSR generates SAR images at arbitrary view angles in real time, and the sequential and semantic differences between the images corresponding to different view angles are used to construct the DRL state space. This construction suppresses complex background interference, increases sensitivity to temporal variations, and improves the ability to capture fine-grained information. In addition, to maintain the stability and convergence of the method, several reward mechanisms, such as memory difference, smoothing, and boundary penalty, are combined to form the final reward function. Extensive experiments on both simulated and real datasets demonstrate the effectiveness and robustness of the proposed method. In the cross-domain setting, the proposed method greatly mitigates the inconsistency between the simulated and real domains and significantly outperforms reference methods.

Paper Structure (34 sections, 18 equations, 10 figures, 6 tables)

Introduction
DSR Imaging
Radar Coordinate System Definition
Imaging Mechanism
Methodology
Overview
SAR Environment and Agent
DSR Embedded SAR Environment
SAR Agent
DRL Interactive Components
Action Space
State Space
Reward Function
Training Scheme of the DRL Algorithm
Experiments
...and 19 more sections

Figures (10)

Figure 1: The SAR agent learns inverse policies and takes actions to recover the view angles of a SAR image from an arbitrary initial view angle, mirroring a human-like decision-making process. Because the range of view angles is wide, the SAR agent must apply adjustments of varying size depending on the disparity between the reference SAR image and the current inversion result.
Figure 2: SAR imaging geometric model. The world coordinate system $O$-$XYZ$ and the radar coordinate system $O^{\prime}$-$X^{\prime}Y^{\prime}Z^{\prime}$ are defined. Our objective is to recover the incidence angle $\alpha$ and the azimuth angle $\beta$ of the radar in $O^{\prime}$-$X^{\prime}Y^{\prime}Z^{\prime}$.
Figure 3: The mapping and projection algorithm.
Figure 4: The workflow of our proposed view angle inversion algorithm. The DSR-embedded SAR environment generates a simulated image from a given view angle $(\alpha_t, \beta_t)$ at time $t$. The state construction module builds a state representation $s_t$ from the input SAR image and the simulated image. Upon receiving the state $s_t$, the SAR agent outputs two incremental angles $(\Delta\alpha_t, \Delta\beta_t)$ as an action, which update the currently predicted view angles to $(\alpha_{t+1}, \beta_{t+1}) = (\alpha_t+\Delta\alpha_t, \beta_t+\Delta\beta_t)$. Meanwhile, the reward construction module returns $R_t$ to the agent in proportion to the accuracy of the inversion after the action is executed. The new angles $(\alpha_{t+1}, \beta_{t+1})$ are then fed back into the SAR environment for the next cycle.
Figure 5: The overall architecture of our proposed framework and the details of the different DRL modules including the SAR environment and agent, action space, state space and reward function. The reward $R_t$ is obtained to evaluate the inversion effect at time $t$. The option chosen by the agent depends on the currently learned policy.
...and 5 more figures
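
The interaction cycle described in the Figure 4 caption can be sketched in a few lines of Python. This is only an illustrative toy under stated assumptions, not the authors' implementation: `toy_render` is a hypothetical stand-in for DSR that returns a four-value feature vector instead of a SAR image, the action grid and angle ranges are assumed values, the greedy action choice replaces the learned Rainbow policy, and the reward is a simple error-decrease term rather than the paper's composite reward.

```python
import math

# Hypothetical discrete action set: (delta_alpha, delta_beta) in degrees.
# The paper's actual action grid is not given in this summary.
STEPS = (-5.0, -1.0, 0.0, 1.0, 5.0)
ACTIONS = [(da, db) for da in STEPS for db in STEPS]

ALPHA_MIN, ALPHA_MAX = 0.0, 90.0  # assumed incidence-angle range


def toy_render(alpha, beta):
    """Stand-in for DSR: maps view angles to a tiny feature vector."""
    a, b = math.radians(alpha), math.radians(beta)
    return (math.sin(a), math.cos(a), math.sin(b), math.cos(b))


def image_distance(img_a, img_b):
    """Squared difference between two toy 'images'."""
    return sum((x - y) ** 2 for x, y in zip(img_a, img_b))


def apply_action(angles, action):
    """Update (alpha, beta) by (delta_alpha, delta_beta).

    Alpha is clipped to its assumed range; beta wraps around 360 degrees.
    """
    alpha = min(max(angles[0] + action[0], ALPHA_MIN), ALPHA_MAX)
    beta = (angles[1] + action[1]) % 360.0
    return alpha, beta


def invert_view_angle(reference_image, start, max_steps=200, tol=1e-12):
    """Run the render -> compare -> act -> update cycle until the error vanishes."""
    angles = start
    dist = image_distance(toy_render(*angles), reference_image)
    rewards = []
    for _ in range(max_steps):
        if dist < tol:
            break

        # Greedy stand-in for the learned policy: try every discrete action
        # and keep the one whose rendered image is closest to the reference.
        def error_after(action):
            candidate = toy_render(*apply_action(angles, action))
            return image_distance(candidate, reference_image)

        best = min(ACTIONS, key=error_after)
        angles = apply_action(angles, best)
        new_dist = image_distance(toy_render(*angles), reference_image)
        rewards.append(dist - new_dist)  # assumed reward: decrease in error
        dist = new_dist
    return angles, rewards


if __name__ == "__main__":
    reference = toy_render(30.0, 123.0)
    estimate, rewards = invert_view_angle(reference, start=(45.0, 0.0))
    print("estimated (alpha, beta):", estimate, "after", len(rewards), "steps")
```

Mixing coarse and fine increments in the action set reflects the Figure 1 remark that the agent needs adjustments of varying size. In the actual framework the state would come from SARNet features of the image differences rather than a raw distance.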
