Guiding the underwater acoustic target recognition with interpretable contrastive learning

Yuan Xie, Jiawei Ren, Ji Xu

TL;DR

This work tackles underwater acoustic target recognition with a focus on interpretability. Using class activation mapping (CAM), the authors find that recognition models rely on low-frequency line-spectrum cues and high-frequency periodic modulation cues. They build on this observation with an interpretable contrastive learning (ICL) framework that uses two encoders, one for Mel features and one for CQT features. The method enforces cross-view consistency through a cosine-similarity contrastive loss and fuses the two views for final classification, achieving superior accuracy on Shipsear, DeepShip, and DTIL, especially when training data are scarce. The constraint between encoders acts as a regularizer that improves generalization while keeping training efficient, which suggests practical value for recognition in varied ocean environments.
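The cross-view consistency term mentioned above can be illustrated with a small sketch. This page does not give the exact form of the loss, so the code below assumes the simplest cosine-based variant, the mean of `1 - cos(z_mel, z_cqt)` over paired embeddings. The function name `cosine_consistency_loss` and the embedding names `z_mel` and `z_cqt` are hypothetical and not taken from the paper.

```python
import numpy as np

def cosine_consistency_loss(z_mel, z_cqt, eps=1e-8):
    """Mean (1 - cosine similarity) over paired embeddings.

    z_mel, z_cqt: arrays of shape (batch, dim), one row per audio clip,
    assumed to come from the Mel encoder and the CQT encoder respectively.
    NOTE: the 1 - cos form is an assumption, not the paper's stated loss.
    """
    z_mel = np.asarray(z_mel, dtype=np.float64)
    z_cqt = np.asarray(z_cqt, dtype=np.float64)
    # Row-wise dot product and product of row norms.
    dot = np.sum(z_mel * z_cqt, axis=1)
    norms = np.linalg.norm(z_mel, axis=1) * np.linalg.norm(z_cqt, axis=1)
    # Guard against division by zero for all-zero embeddings.
    cos = dot / np.maximum(norms, eps)
    return float(np.mean(1.0 - cos))
```

Under this assumed form, the loss is 0 when the two encoders produce embeddings pointing in the same direction for every clip and grows toward 2 as they point in opposite directions, so minimizing it pulls the two views of the same signal together.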

Abstract

Recognizing underwater targets from acoustic signals is a challenging task owing to the intricate ocean environments and variable underwater channels. While deep learning-based systems have become the mainstream approach for underwater acoustic target recognition, they have faced criticism for their lack of interpretability and weak generalization performance in practical applications. In this work, we apply class activation mapping (CAM) to generate visual explanations for the predictions of a spectrogram-based recognition system. CAM helps explain the behavior of recognition models by highlighting the regions of the input features that contribute the most to the prediction. Our explorations reveal that recognition models tend to focus on the low-frequency line spectrum and high-frequency periodic modulation information of underwater signals. Based on this observation, we propose an interpretable contrastive learning (ICL) strategy that employs two encoders to learn from acoustic features with different emphases (line spectrum and modulation information). By imposing constraints between encoders, the proposed strategy can enhance the generalization performance of the recognition system. Our experiments demonstrate that the proposed contrastive learning approach brings significant improvements in recognition accuracy across various underwater databases.
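To make the CAM step concrete, here is a minimal sketch of the classic formulation, in which the heat map for a class is the weighted sum of the last convolutional feature maps, using that class's weights from the linear layer that follows global average pooling. Whether the paper uses this exact variant is an assumption. The names `class_activation_map`, `feature_maps`, and `fc_weights` are hypothetical, and the clipping and peak normalization are display choices rather than part of the method.

```python
import numpy as np

def class_activation_map(feature_maps, fc_weights, class_idx):
    """Classic CAM heat map for one class (assumed variant).

    feature_maps: (K, F, T) activations of the last conv layer for one
                  spectrogram (K channels, F frequency bins, T frames).
    fc_weights:   (num_classes, K) weights of the linear classifier that
                  follows global average pooling.
    """
    feature_maps = np.asarray(feature_maps, dtype=np.float64)
    w = np.asarray(fc_weights, dtype=np.float64)[class_idx]  # shape (K,)
    # Weighted sum over channels gives a (F, T) map.
    cam = np.tensordot(w, feature_maps, axes=([0], [0]))
    # Keep positive evidence only and scale to [0, 1] for plotting.
    cam = np.maximum(cam, 0.0)
    peak = cam.max()
    return cam / peak if peak > 0 else cam
```

Overlaying the resulting map on the input spectrogram (after resizing it to the input resolution) shows which time-frequency regions drive the prediction, which is how regions such as the line spectrum and modulation patterns can be identified.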

Paper Structure

This paper contains 17 sections, 4 equations, 3 figures, and 5 tables.

Figures (3)

  • Figure 1: The class activation mapping (CAM) heat maps for two samples in Shipsear. The heat maps highlight the line spectrum (see red box) in the low-frequency component and the periodic modulation information (see yellow box) in the high-frequency component.
  • Figure 2: The preprocessing and training pipeline of our recognition system. Gray boxes represent operations and black boxes represent neural network modules. The detailed flow of feature extraction is omitted for simplicity.
  • Figure 3: Confusion matrix heat maps.