Low-Rank Adaptation of Pre-Trained Stable Diffusion for Rigid-Body Target ISAR Imaging

Boan Zhang, Hang Dong, Jiongge Zhang, Long Tian, Rongrong Wang, Zhenhua Wu, Xiyang Liu, Hongwei Liu

TL;DR

The paper tackles the low-resolution challenge of RID-based ISAR imaging by introducing LoRA-SD, a texture-aware super-resolution approach that fine-tunes a pre-trained Stable Diffusion Turbo model with low-rank adapters and adversarial training to enhance time-frequency representations. The method maps low-resolution TFRs to high-resolution counterparts via a constrained, parameter-efficient fine-tuning scheme, enabling sharper, denoised ISAR images and improved frequency estimation, without being limited by the classical uncertainty principle. Experiments on simulated and measured radar data show that LoRA-SD outperforms STFT, SBL, and MF baselines in RMSE across a range of SNRs, while maintaining feasible runtime and memory, and demonstrating strong generalization to real-world data. The approach holds promise for improved 3D pose estimation and robust ISAR imaging of rigid-body targets under complex motions such as spin and precession.

Abstract

Traditional range-instantaneous Doppler (RID) methods for rigid-body target imaging often suffer from low resolution due to the limitations of time-frequency analysis (TFA). To address this challenge, our primary focus is on obtaining high-resolution time-frequency representations (TFRs) from their low-resolution counterparts. Recognizing that the curve features of TFRs are a specific type of texture feature, we argue that pre-trained generative models such as Stable Diffusion (SD) are well suited for enhancing TFRs, thanks to their powerful capability in capturing texture representations. Building on this insight, we propose a novel inverse synthetic aperture radar (ISAR) imaging method for rigid-body targets, leveraging the low-rank adaptation (LoRA) of a pre-trained SD model. Our approach adopts the basic structure and pre-trained parameters of SD Turbo while incorporating additional linear operations for LoRA and adversarial training to achieve super-resolution and noise suppression. We then integrate LoRA-SD into RID-based ISAR imaging, enabling sharply focused and denoised imaging with super-resolution capabilities. We evaluate our method using both simulated and real radar data. The experimental results demonstrate the superiority of our approach in frequency estimation and ISAR imaging compared to traditional methods. Notably, the generalization capability is verified by training on simulated radar data and testing on measured radar data.
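The "additional linear operations for LoRA" mentioned in the abstract can be illustrated with a minimal NumPy sketch of a single adapted linear layer. This is not the authors' implementation: the class name `LoRALinear`, the rank, the scaling `alpha / rank`, and the layer sizes are all assumptions chosen for illustration. The sketch only shows the general LoRA idea of keeping a pre-trained weight frozen and learning a low-rank update.

```python
import numpy as np


class LoRALinear:
    """Frozen linear map W plus a trainable low-rank update (alpha / rank) * B @ A.

    Hypothetical illustration; names and sizes are not taken from the paper.
    """

    def __init__(self, weight, rank=4, alpha=4.0, seed=0):
        rng = np.random.default_rng(seed)
        out_dim, in_dim = weight.shape
        self.weight = weight                                  # pre-trained, kept frozen
        self.A = rng.normal(0.0, 0.01, size=(rank, in_dim))   # trainable
        self.B = np.zeros((out_dim, rank))                    # trainable, zero-initialised
        self.scale = alpha / rank

    def __call__(self, x):
        # x: (batch, in_dim) -> (batch, out_dim)
        frozen = x @ self.weight.T
        update = self.scale * (x @ self.A.T) @ self.B.T
        return frozen + update

    def num_trainable(self):
        return self.A.size + self.B.size


rng = np.random.default_rng(1)
W = rng.normal(size=(64, 128))   # stand-in for one pre-trained weight matrix
layer = LoRALinear(W, rank=4)
x = rng.normal(size=(2, 128))

# With B = 0 the adapted layer reproduces the pre-trained layer exactly,
# so fine-tuning starts from the pre-trained behaviour.
assert np.allclose(layer(x), x @ W.T)
print(layer.num_trainable(), W.size)
```

In this toy setting the adapter adds far fewer trainable values than the frozen matrix holds, which is the property that makes this kind of fine-tuning parameter-efficient.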

Paper Structure

This paper contains 10 sections, 4 equations, 6 figures, 1 table.

Figures (6)

  • Figure 1: Flowchart of our proposed TFA-based ISAR imaging. In order to obtain high-resolution TFRs, we replace the original time sampling in RID with our LoRA-SD for spectrum super-resolution.
  • Figure 2: Overview of our proposed LoRA-SD for TFR super-resolution. We first employ the pre-trained SD-Turbo [sauer2024fast] as the base mapping function from the low-resolution input TFR to the high-resolution output TFR. We then freeze the parameters of SD-Turbo and apply LoRA [hu2021lora] in each of its modules, including the encoder, U-Net, and decoder. Additionally, we introduce skip connections using zero-convs [Zhang_2023_ICCV] between the encoder and decoder. Within LoRA-SD, the parameters marked by blue boxes are trained with adversarial optimization.
  • Figure 3: Left: The model of the rigid-body target. $\{S_i\}_{i=1}^3$ are common scattering centers whose positions are fixed on the surface of the cone. $\{P_j\}_{j=1}^3$ are scattering centers related to the geometric structure. Among them, $P_1$ is a common scattering center. $P_2$ and $P_3$ are equivalent scattering centers, and their positions are jointly determined by the radar's line of sight (LOS) and the cone's central axis. Right: Scenario for recording the measured radar echoes.
  • Figure 4: Left: The frequency estimation error evaluated on the simulated radar data. Top right: The average inference time per sample. Bottom right: Trainable parameters and memory usage of LoRA-SD.
  • Figure 5: Left: TFRs under spin and precession on the measured radar data when SNR = 5 dB. Right: TFRs of our model without adversarial training.
  • ...and 1 more figure
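The frequency estimation error reported for the STFT baseline can be made concrete with a small sketch. The code below builds a toy echo with sinusoidally varying Doppler, computes an STFT-based TFR for two window lengths, and measures the RMSE of the ridge (per-frame peak) frequency against the true instantaneous frequency. All signal parameters (`fs`, `f0`, `df`, `fm`), the window lengths, and the helper names `stft_mag` and `ridge_rmse` are assumptions for illustration. They do not reproduce the paper's simulation setup or its evaluation protocol.

```python
import numpy as np


def stft_mag(signal, win_len, hop, nfft):
    """Magnitude STFT with a Hann window.

    Returns an (n_frames, nfft) array and the frame-centre sample indices.
    """
    window = np.hanning(win_len)
    starts = np.arange(0, len(signal) - win_len + 1, hop)
    frames = np.stack([signal[s:s + win_len] * window for s in starts])
    centres = starts + (win_len - 1) / 2.0
    return np.abs(np.fft.fft(frames, n=nfft, axis=1)), centres


def ridge_rmse(signal, true_freq, fs, win_len, hop=8, nfft=1024):
    """RMSE (Hz) between the per-frame spectral peak and the true frequency."""
    tfr, centres = stft_mag(signal, win_len, hop, nfft)
    freqs = np.fft.fftfreq(nfft, d=1.0 / fs)
    estimate = freqs[np.argmax(tfr, axis=1)]
    reference = true_freq(centres / fs)
    return float(np.sqrt(np.mean((estimate - reference) ** 2)))


# Toy echo: instantaneous frequency f0 + df * sin(2*pi*fm*t) (assumed values).
fs = 1000.0
t = np.arange(1000) / fs
f0, df, fm = 200.0, 100.0, 2.0


def inst_freq(time):
    return f0 + df * np.sin(2 * np.pi * fm * time)


# Phase is the integral of 2*pi*inst_freq, so its derivative matches inst_freq.
phase = 2 * np.pi * (f0 * t - df / (2 * np.pi * fm) * np.cos(2 * np.pi * fm * t))
echo = np.exp(1j * phase)

rmse_short = ridge_rmse(echo, inst_freq, fs, win_len=64)
rmse_long = ridge_rmse(echo, inst_freq, fs, win_len=256)
print(f"RMSE, 64-sample window:  {rmse_short:.2f} Hz")
print(f"RMSE, 256-sample window: {rmse_long:.2f} Hz")
```

Changing `win_len` trades time localisation against frequency resolution, which is the TFA limitation that motivates learning a mapping from low-resolution to high-resolution TFRs.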