A Multi-task Learning Balanced Attention Convolutional Neural Network Model for Few-shot Underwater Acoustic Target Recognition

Wei Huang, Shumeng Sun, Junpeng Lu, Zhenpeng Xu, Zhengyang Xiu, Hao Zhang

TL;DR

This work tackles few-shot underwater acoustic target recognition (UATR) with MT-BCA-CNN, a model that combines a channel attention mechanism with a multi-task learning objective. The architecture uses a shared, attention-guided feature extractor and two task heads (classification and spectrogram reconstruction), with a dynamic weighting scheme to balance the tasks and Gaussian-smoothed attention to stabilize channel emphasis. On the Watkins Marine Mammal Sound Database, MT-BCA-CNN achieves 97% accuracy and a 95% F1-score with roughly 0.11–0.13M parameters, outperforming the CNN and ACNN baselines while keeping model complexity low. Ablation studies indicate that both channel attention and multi-task learning contribute to this result, and visualizations show improved intra-class compactness and inter-class separation, which suggests practical potential for robust, data-efficient UATR in challenging marine environments.

Abstract

Underwater acoustic target recognition (UATR) is of great significance for the protection of marine diversity and national defense security. The development of deep learning provides new opportunities for UATR, but the field still faces challenges from the scarcity of reference samples and complex environmental interference. To address these issues, we propose a multi-task balanced channel attention convolutional neural network (MT-BCA-CNN). The method integrates a channel attention mechanism with a multi-task learning strategy, constructing a shared feature extractor and multi-task classifiers to jointly optimize target classification and feature reconstruction tasks. The channel attention mechanism dynamically enhances discriminative acoustic features such as harmonic structures while suppressing noise. Experiments on the Watkins Marine Life Dataset demonstrate that MT-BCA-CNN achieves 97\% classification accuracy and a 95\% $F1$-score in 27-class few-shot scenarios, significantly outperforming traditional CNN and ACNN models, as well as popular state-of-the-art UATR methods. Ablation studies confirm the synergistic benefits of multi-task learning and attention mechanisms, while a dynamic weighting adjustment strategy effectively balances task contributions. This work provides an efficient solution for few-shot underwater acoustic recognition, advancing research in marine bioacoustics and sonar signal processing.
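The channel attention idea described in the abstract can be illustrated with a minimal squeeze-and-excitation style sketch. This is not the paper's implementation: the function name `channel_attention`, the single linear gating layer, and the parameters `gate_weights` and `gate_bias` are assumptions made for illustration, and the Gaussian smoothing mentioned in the TL;DR is omitted. It only shows how one descriptor per channel can be turned into a gate that amplifies or suppresses that channel.

```python
import math


def channel_attention(feature_maps, gate_weights, gate_bias):
    """Reweight the channels of a (C, H, W) feature map stored as nested lists.

    gate_weights (C x C) and gate_bias (length C) stand in for learned
    parameters; they are hypothetical and not taken from the paper.
    """
    # Squeeze: global average pooling gives one descriptor per channel.
    squeezed = [
        sum(sum(row) for row in channel) / (len(channel) * len(channel[0]))
        for channel in feature_maps
    ]
    # Excite: one linear layer plus a sigmoid yields a gate in (0, 1) per channel.
    gates = []
    for w_row, b in zip(gate_weights, gate_bias):
        z = sum(w * s for w, s in zip(w_row, squeezed)) + b
        gates.append(1.0 / (1.0 + math.exp(-z)))
    # Scale: multiply every value in a channel by that channel's gate.
    scaled = [
        [[g * v for v in row] for row in channel]
        for g, channel in zip(gates, feature_maps)
    ]
    return gates, scaled


# Toy input: channel 0 has strong positive activations, channel 1 negative ones.
toy_maps = [
    [[2.0, 2.0], [2.0, 2.0]],
    [[-2.0, -2.0], [-2.0, -2.0]],
]
identity = [[1.0, 0.0], [0.0, 1.0]]
gates, scaled = channel_attention(toy_maps, identity, [0.0, 0.0])
```

With the identity gating weights used here, the channel with the larger average activation receives the larger gate, which is the qualitative behaviour the abstract attributes to the attention mechanism.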

Paper Structure

This paper contains 13 sections, 21 equations, 7 figures, 3 tables.

Figures (7)

  • Figure 1: Target Recognition Task Process
  • Figure 2: MT-BCA-CNN Model Architecture
  • Figure 3: Flowchart of multi-task learning implementation, where $\lambda_0$ and $\lambda_1$ denote the weights of the task-specific classifiers, $L_1$ and $L_2$ represent the task losses, and $L_{total}$ is the joint loss function
  • Figure 4: Examples of raw signal waveforms and their corresponding Mel-spectrograms. (a) Clymene Dolphin-wave. (b) Clymene Dolphin-Mel. (c) Common Dolphin-wave. (d) Common Dolphin-Mel. (e) Beluga White Whale-wave. (f) Beluga White Whale-Mel.
  • Figure 5: (a) CAM++ (Acc: 0.62). (b) ERes2Net (Acc: 0.63). (c) ResNetSE (Acc: 0.78). (d) MT-BCA-CNN (Acc: 0.97).
  • ...and 2 more figures
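
The joint loss named in the Figure 3 caption can be sketched as a weighted sum of the two task losses. The pairing of $\lambda_0$ with $L_1$ (classification) and $\lambda_1$ with $L_2$ (reconstruction) is an assumption, since the caption does not spell it out, and the `rebalance` rule below is a hypothetical stand-in for the paper's dynamic weighting strategy, which this page does not specify.

```python
def joint_loss(loss_1, loss_2, lambda_0, lambda_1):
    """Weighted sum of the two task losses: L_total = lambda_0 * L_1 + lambda_1 * L_2.

    The lambda-to-loss pairing is assumed, not confirmed by the source.
    """
    return lambda_0 * loss_1 + lambda_1 * loss_2


def rebalance(loss_1, loss_2, eps=1e-8):
    """Hypothetical dynamic weighting rule (not the paper's).

    Each task's weight is its share of the summed loss, so the task that is
    currently lagging gets more emphasis. eps avoids division by zero.
    """
    total = loss_1 + loss_2 + eps
    return loss_1 / total, loss_2 / total


# Example: the classification loss is three times the reconstruction loss.
lambda_0, lambda_1 = rebalance(3.0, 1.0)
total = joint_loss(3.0, 1.0, lambda_0, lambda_1)
```

In a training loop, a rule like this would be re-evaluated every step or epoch on detached loss values, so that the weights steer the optimization without being optimized themselves.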