Reinforcement Learning for SAR View Angle Inversion with Differentiable SAR Renderer
Yanni Wang, Hecheng Jia, Shilei Fu, Huiping Lin, Feng Xu
TL;DR
This work tackles the electromagnetic inverse problem of inverting SAR view angles by framing it as a reinforcement learning task in which an agent learns to predict the angles $[\alpha,\beta]$ through interaction with an embedded differentiable SAR renderer (DSR). The framework constructs the state from sequential and semantic differences between rendered SAR images using SARNet, and employs a discrete action space with a Rainbow-based DRL agent to progressively refine the angle estimates. A composite reward function combining memory-difference, smoothing, auxiliary-condition, and boundary-penalty terms stabilizes learning and accelerates convergence. Experiments on simulated DSR-rendered data and real MSTAR data demonstrate accurate inversion and robust cross-domain generalization, with ablations confirming the importance of the state and reward design for performance and stability.
Abstract
The electromagnetic inverse problem has long been a research hotspot. This study aims to invert radar view angles from synthetic aperture radar (SAR) images given a target model. However, the scarcity of SAR data, combined with intricate background interference and imaging mechanisms, limits the applicability of existing learning-based approaches. To address these challenges, we propose an interactive deep reinforcement learning (DRL) framework in which an electromagnetic simulator, the differentiable SAR renderer (DSR), is embedded to facilitate interaction between the agent and the environment, simulating a human-like process of angle prediction. Specifically, DSR generates SAR images at arbitrary view angles in real time. The sequential and semantic differences between the images corresponding to different view angles are used to construct the state space in DRL, which effectively suppresses complex background interference, enhances sensitivity to temporal variations, and improves the ability to capture fine-grained information. Additionally, to maintain the stability and convergence of our method, a series of reward mechanisms, such as memory difference, smoothing, and boundary penalty, are combined to form the final reward function. Extensive experiments on both simulated and real datasets demonstrate the effectiveness and robustness of the proposed method. In the cross-domain setting, the proposed method greatly mitigates the inconsistency between the simulated and real domains, significantly outperforming reference methods.
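
As a rough illustration of the interaction loop described above, the following Python sketch shows what a single environment step might look like: a discrete action nudges the angle estimate, a renderer produces the corresponding image, the state is built from image differences, and the reward combines an error-reduction term with a boundary penalty. Everything named here is an assumption made for illustration: `toy_render` is a synthetic stand-in for DSR, the angle bounds, step size, five-action set, and penalty magnitude are arbitrary, and the semantic (SARNet) features, smoothing term, auxiliary conditions, and the Rainbow agent itself are omitted.

```python
import numpy as np

# Hypothetical bounds and step size for the two view angles, in degrees.
ANGLE_LOW = np.array([0.0, 0.0])
ANGLE_HIGH = np.array([90.0, 360.0])
STEP = 5.0

# Hypothetical discrete action set: +/- alpha, +/- beta, or stay.
ACTIONS = np.array(
    [[STEP, 0.0], [-STEP, 0.0], [0.0, STEP], [0.0, -STEP], [0.0, 0.0]]
)


def toy_render(angles, size=16):
    """Synthetic stand-in for the DSR: maps (alpha, beta) to a smooth image."""
    alpha, beta = np.deg2rad(angles)
    ys, xs = np.mgrid[0:size, 0:size] / size
    return np.cos(2 * np.pi * xs + beta) * np.sin(np.pi * ys + alpha)


def make_state(prev_img, curr_img, target_img):
    """Stack the sequential difference (current vs. previous render) and the
    difference between the target image and the current render."""
    return np.stack([curr_img - prev_img, target_img - curr_img])


def step(angles, action_idx, prev_img, target_img):
    """Apply one discrete action; return (new_angles, new_img, state, reward)."""
    proposed = angles + ACTIONS[action_idx]
    clipped = np.clip(proposed, ANGLE_LOW, ANGLE_HIGH)
    out_of_bounds = not np.allclose(proposed, clipped)
    new_img = toy_render(clipped)

    # Simplified stand-in for a memory-difference term: positive when the
    # new render is closer to the target than the previous render was.
    prev_err = np.mean((target_img - prev_img) ** 2)
    new_err = np.mean((target_img - new_img) ** 2)
    reward = prev_err - new_err

    if out_of_bounds:
        reward -= 1.0  # boundary penalty; the magnitude is an arbitrary choice

    state = make_state(prev_img, new_img, target_img)
    return clipped, new_img, state, reward


# Usage: one step from an initial guess toward a hypothetical target view.
target = toy_render(np.array([45.0, 120.0]))
angles = np.array([30.0, 100.0])
img = toy_render(angles)
angles, img, state, reward = step(angles, 0, img, target)
print(state.shape, angles, reward)
```

In a full agent, the action index would come from the learned policy rather than being fixed, and the state would be passed through a feature extractor before the value network. The sketch only fixes the shape of the interaction.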
