IncSAR: A Dual Fusion Incremental Learning Framework for SAR Target Recognition

George Karantaidis, Athanasios Pantsios, Ioannis Kompatsiaris, Symeon Papadopoulos

TL;DR

This work tackles catastrophic forgetting in SAR automatic target recognition (SAR-ATR) under class-incremental learning by introducing IncSAR, a dual-branch framework that fuses a pre-trained Vision Transformer with a custom SAR-CNN. The pipeline incorporates RPCA-based denoising, a fixed random projection layer to boost feature separability, and a decorrelated prototype classifier within a late-fusion architecture; a lightweight variant (IncSAR_Lite) and an attention-enhanced fusion variant (IncSAR_LAtt) are also explored. Across MSTAR, SAR-AIRcraft-1.0, and OpenSARShip, IncSAR achieves state-of-the-art incremental performance with minimal forgetting, demonstrating strong cross-domain generalization and robustness under data-limited conditions. The approach offers a scalable, exemplar-free solution for real-world SAR-ATR in dynamic environments, with practical implications for defense and remote sensing.

Abstract

Deep learning techniques have achieved significant success in Synthetic Aperture Radar (SAR) target recognition using predefined datasets in static scenarios. However, real-world applications demand that models incrementally learn new information without forgetting previously acquired knowledge. The challenge of catastrophic forgetting, where models lose past knowledge when adapting to new tasks, remains a critical issue. In this paper, we introduce IncSAR, an incremental learning framework designed to tackle catastrophic forgetting in SAR target recognition. IncSAR combines a Vision Transformer (ViT) and a custom-designed Convolutional Neural Network (CNN) in a dual-branch architecture, integrated via a late-fusion strategy. Additionally, we explore the use of TinyViT to reduce computational complexity and propose an attention mechanism to dynamically enhance feature representation. To mitigate the speckle noise inherent in SAR images, we employ a denoising module based on a simple neural network approximation of Robust Principal Component Analysis (RPCA). Moreover, a random projection layer improves the linear separability of features, and a variant of Linear Discriminant Analysis (LDA) decorrelates extracted class prototypes for better generalization. Extensive experiments on the MSTAR, SAR-AIRcraft-1.0, and OpenSARShip benchmark datasets demonstrate that IncSAR significantly outperforms state-of-the-art approaches, achieving a 99.63% average accuracy and a 0.33% performance drop, representing an 89% improvement in retention compared to existing techniques. The source code is available at https://github.com/geokarant/IncSAR.
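
The random projection and decorrelated prototype steps mentioned in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the ReLU activation, the ridge-regularized Gram inverse used for decorrelation, the class name `PrototypeClassifier`, and the synthetic features are all assumptions made for illustration, and the paper's LDA variant may differ in detail.

```python
import numpy as np

rng = np.random.default_rng(0)

def project(features, W):
    # Frozen random projection followed by a nonlinearity (ReLU is an assumption here).
    return np.maximum(features @ W, 0.0)

class PrototypeClassifier:
    """Hypothetical exemplar-free classifier: stores only running statistics, never samples."""

    def __init__(self, dim, num_classes, ridge=1.0):
        self.gram = np.zeros((dim, dim))           # accumulated H^T H
        self.proto = np.zeros((dim, num_classes))  # accumulated per-class feature sums
        self.ridge = ridge                         # assumed regularization strength

    def update(self, H, labels):
        # Called once per incremental task with that task's projected features.
        onehot = np.eye(self.proto.shape[1])[labels]
        self.gram += H.T @ H
        self.proto += H.T @ onehot

    def logits(self, H):
        # Decorrelate the class prototypes with the regularized Gram matrix.
        reg = self.gram + self.ridge * np.eye(self.gram.shape[0])
        weights = np.linalg.solve(reg, self.proto)
        return H @ weights

# Synthetic stand-in for frozen backbone features: four well-separated classes.
num_classes, feat_dim, proj_dim = 4, 8, 64
W = rng.standard_normal((feat_dim, proj_dim))  # drawn once, then kept fixed
labels = np.repeat(np.arange(num_classes), 50)
features = 5.0 * np.eye(feat_dim)[labels] + 0.3 * rng.standard_normal((200, feat_dim))
H = project(features, W)

# Two incremental tasks with two new classes each.
incremental = PrototypeClassifier(proj_dim, num_classes)
for task_classes in ([0, 1], [2, 3]):
    mask = np.isin(labels, task_classes)
    incremental.update(H[mask], labels[mask])

# Reference: the same statistics computed from all data at once.
joint = PrototypeClassifier(proj_dim, num_classes)
joint.update(H, labels)

accuracy = (incremental.logits(H).argmax(axis=1) == labels).mean()
print(f"accuracy after two tasks: {accuracy:.3f}")
```

Because the classifier only accumulates sums, updating it task by task yields the same statistics as fitting on all classes jointly, which is one way to see why a frozen backbone with this kind of head does not overwrite earlier classes.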

Paper Structure (16 sections, 11 equations, 9 figures, 13 tables)

This paper contains 16 sections, 11 equations, 9 figures, 13 tables.

Figures (9)

  • Figure 1: Illustration of IncSAR, which employs a late-fusion approach. The input image is fed to a ViT network to extract features $\mathbf{F}_1$. In parallel, the input image is passed through the RPCA filtering module, and the filtered output is fed to the proposed CNN to extract features $\mathbf{F}_2$. The backbone networks are trained only on the base task of class-incremental learning (CIL), after which their weights are frozen. The extracted features $\mathbf{F}_1$, $\mathbf{F}_2$ are projected into a higher-dimensional space using a random projection layer with frozen weights $\mathbf{W}$ and an activation function $\phi$, giving features $\mathbf{H}_1$, $\mathbf{H}_2$ respectively. During incremental training, the matrices of decorrelated class prototypes $\mathbf{P}_1$, $\mathbf{P}_2$ are continually updated for each task. The logits $\mathbf{L}_1$, $\mathbf{L}_2$ are passed to a softmax layer $S$ and an element-wise addition layer to derive the final prediction $\hat{y}$.
  • Figure 2: Architecture of the proposed SAR-CNN model.
  • Figure 3: Robust PCA procedure resulting in a low-rank and a sparse component.
  • Figure 4: Illustration of the proposed feature fusion attention module, demonstrating the integration of features from the ViT-Ti and SAR-CNN branches to produce an enhanced unified representation.
  • Figure 5: An example of RPCA filtering applied to the MSTAR dataset. Left: original SAR image; right: output of the filtering module.
  • ...and 4 more figures
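
The final step in the Figure 1 caption, where the logits $\mathbf{L}_1$, $\mathbf{L}_2$ pass through a softmax and an element-wise addition to give $\hat{y}$, can be sketched as follows. The function names and the toy logit values are hypothetical; only the softmax-then-add structure comes from the caption.

```python
import numpy as np

def softmax(logits):
    # Subtract the row-wise maximum for numerical stability.
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

def late_fusion(logits_vit, logits_cnn):
    # Normalize each branch separately, then add the class probabilities element-wise.
    fused = softmax(logits_vit) + softmax(logits_cnn)
    return fused, fused.argmax(axis=1)

# Toy logits for two samples and three classes (illustrative values only).
logits_vit = np.array([[2.0, 1.0, 0.1],
                       [0.2, 0.3, 0.1]])
logits_cnn = np.array([[1.5, 1.4, 0.0],
                       [0.0, 0.1, 3.0]])

fused, y_hat = late_fusion(logits_vit, logits_cnn)
print(y_hat)  # -> [0 2]
```

In the second sample the ViT branch alone would weakly favor class 1, but the confident CNN branch dominates the sum, so the fused prediction is class 2. Applying the softmax before the addition puts both branches on the same scale, so neither can dominate merely by producing larger raw logits.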