PBVS 2024 Solution: Self-Supervised Learning and Sampling Strategies for SAR Classification in Extreme Long-Tail Distribution

Yuhyun Kim, Minwoo Kim, Hyobin Park, Jinwook Jung, Dong-Geol Choi

TL;DR

The paper tackles automatic target recognition in Synthetic Aperture Radar (SAR) imagery under an extreme long-tail class distribution and a SAR-EO domain disparity. It proposes a two-stage, multimodal learning pipeline that combines a three-channel input (the original SAR image, the SAR image denoised with a Lee filter, and an EO image translated from SAR) with self-supervised learning (DINOv2) and cluster-based balancing (Tomek Links, NearMiss-3) to produce robust features and balanced classifiers. Classification is performed by an ensemble of seven $K$-NN classifiers ($K=3$), each trained on a balanced subset, with the aim of improving generalization to rare classes. In the PBVS 2024 Multi-modal Aerial View Image Challenge (SAR Classification), the approach achieved 21.45% accuracy, an AUC of 0.56, and a total score of 0.30, ranking 9th. The work highlights the potential of integrating denoising, translation, and self-supervised learning for multimodal SAR recognition under severe data imbalance.
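The ensemble step described above can be illustrated with a minimal sketch. It is not the authors' implementation: plain random undersampling stands in for the Tomek Links and NearMiss-3 balancing, 2-D toy points stand in for DINOv2 features, and majority voting across members is an assumption, since the summary does not say how the seven classifiers are combined. All function names and parameters (`balanced_subset`, `per_class`, `make_toy_data`, and so on) are hypothetical.

```python
import random
from collections import Counter, defaultdict


def balanced_subset(features, labels, per_class, rng):
    """Randomly undersample so that no class contributes more than per_class samples.

    Assumption: random undersampling is a stand-in for the NearMiss-3 /
    Tomek Links balancing named in the paper.
    """
    by_class = defaultdict(list)
    for x, y in zip(features, labels):
        by_class[y].append(x)
    subset = []
    for y, xs in by_class.items():
        chosen = xs if len(xs) <= per_class else rng.sample(xs, per_class)
        subset.extend((x, y) for x in chosen)
    return subset


def knn_predict(subset, query, k=3):
    """Return the majority label among the k nearest samples (squared Euclidean distance)."""
    def dist(x):
        return sum((a - b) ** 2 for a, b in zip(x, query))

    nearest = sorted(subset, key=lambda item: dist(item[0]))[:k]
    return Counter(y for _, y in nearest).most_common(1)[0][0]


def ensemble_predict(features, labels, query, n_members=7, k=3, per_class=3, seed=0):
    """Majority vote over n_members k-NN classifiers, each built on its own balanced subset."""
    rng = random.Random(seed)
    votes = [
        knn_predict(balanced_subset(features, labels, per_class, rng), query, k)
        for _ in range(n_members)
    ]
    return Counter(votes).most_common(1)[0][0]


def make_toy_data(seed=0):
    """Toy long-tail data: 300 head-class points near the origin, 3 tail-class points near (5, 5)."""
    rng = random.Random(seed)
    head = [(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(300)]
    tail = [(5.0, 5.0), (5.2, 4.9), (4.9, 5.3)]
    return head + tail, [0] * len(head) + [1] * len(tail)


if __name__ == "__main__":
    features, labels = make_toy_data()
    print(ensemble_predict(features, labels, (5.0, 5.1)))
    print(ensemble_predict(features, labels, (0.1, -0.2)))
```

Each member sees a different random draw from the head class but the same three tail samples, so the tail class carries equal weight in every vote. With only two toy classes and $K=3$, ties cannot occur; a multi-class setting would need an explicit tie-breaking rule.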

Abstract

The Multimodal Learning Workshop (PBVS 2024) aims to improve the performance of automatic target recognition (ATR) systems by leveraging both Synthetic Aperture Radar (SAR) data, which is difficult to interpret but remains unaffected by weather conditions and visible light, and Electro-Optical (EO) data for simultaneous learning. The subtask, known as the Multi-modal Aerial View Imagery Challenge - Classification, focuses on predicting the class label of a low-resolution aerial image based on a set of SAR-EO image pairs and their respective class labels. The provided dataset consists of SAR-EO pairs, characterized by a severe long-tail distribution with over a 1000-fold difference between the largest and smallest classes, making typical long-tail methods difficult to apply. Additionally, the domain disparity between the SAR and EO datasets complicates the effectiveness of standard multimodal methods. To address these significant challenges, we propose a two-stage learning approach that utilizes self-supervised techniques, combined with multimodal learning and inference through SAR-to-EO translation for effective EO utilization. In the final testing phase of the PBVS 2024 Multi-modal Aerial View Image Challenge - Classification (SAR Classification) task, our model achieved an accuracy of 21.45%, an AUC of 0.56, and a total score of 0.30, placing us 9th in the competition.

Paper Structure

This paper contains 6 sections, 3 figures, 1 table.

Figures (3)

  • Figure 1: Provided SAR-EO Pair Dataset.
  • Figure 2: Overview of Our Proposed Pipeline.
  • Figure 3: Types of Data Used in Training. (a) The original SAR image. (b) The SAR image with a Lee filter applied. (c) The SAR image translated to EO using the Pix2PixHD model. (d) The original EO image, for comparison with (c).
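
The three data types in Figure 3 (a) to (c) form the three-channel input mentioned in the TL;DR. The sketch below shows one way such an input could be assembled, under stated assumptions: it uses the common simplified (additive-noise) form of the Lee filter, estimates the noise variance as the mean of the local variances when none is given, and takes the SAR-to-EO translated image as an already computed argument rather than running Pix2PixHD. The window size, the noise estimate, and the function names `lee_filter` and `stack_channels` are illustrative choices, not details taken from the paper.

```python
def lee_filter(image, radius=1, noise_var=None):
    """Simplified Lee filter on a 2-D list of floats.

    Each output pixel is local_mean + w * (pixel - local_mean), with
    w = local_var / (local_var + noise_var). Windows are clipped at the borders.
    """
    h, w = len(image), len(image[0])
    means = [[0.0] * w for _ in range(h)]
    variances = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            window = [
                image[r][c]
                for r in range(max(0, i - radius), min(h, i + radius + 1))
                for c in range(max(0, j - radius), min(w, j + radius + 1))
            ]
            m = sum(window) / len(window)
            means[i][j] = m
            variances[i][j] = sum((p - m) ** 2 for p in window) / len(window)

    if noise_var is None:
        # Assumption: use the average local variance as the noise estimate.
        noise_var = sum(sum(row) for row in variances) / (h * w)

    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            denom = variances[i][j] + noise_var
            weight = variances[i][j] / denom if denom > 0 else 0.0
            out[i][j] = means[i][j] + weight * (image[i][j] - means[i][j])
    return out


def stack_channels(sar, denoised, translated_eo):
    """Stack three single-channel images into an H x W grid of 3-tuples."""
    h, w = len(sar), len(sar[0])
    return [
        [(sar[i][j], denoised[i][j], translated_eo[i][j]) for j in range(w)]
        for i in range(h)
    ]


if __name__ == "__main__":
    # A 3x3 toy "SAR" patch with one bright speckle in the centre.
    sar = [[0.0, 0.0, 0.0], [0.0, 9.0, 0.0], [0.0, 0.0, 0.0]]
    denoised = lee_filter(sar)
    # Placeholder for the Pix2PixHD output; here simply a constant image.
    translated_eo = [[0.5] * 3 for _ in range(3)]
    stacked = stack_channels(sar, denoised, translated_eo)
    print(denoised[1][1], stacked[1][1])
```

In flat regions the local variance is small, so the weight approaches zero and the output falls back to the local mean; near strong structure the weight approaches one and the original pixel is preserved. That behaviour is why the filter suppresses speckle without fully blurring edges.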