A Multi-task Learning Balanced Attention Convolutional Neural Network Model for Few-shot Underwater Acoustic Target Recognition
Wei Huang, Shumeng Sun, Junpeng Lu, Zhenpeng Xu, Zhengyang Xiu, Hao Zhang
TL;DR
This work tackles few-shot underwater acoustic target recognition (UATR) with MT-BCA-CNN, a model that combines a channel attention mechanism with a multi-task learning objective. The architecture uses a shared, attention-guided feature extractor and two task heads (classification and spectrogram reconstruction), a dynamic weighting scheme that balances the two tasks, and Gaussian-smoothed attention that stabilizes channel emphasis. On the Watkins Marine Mammal Sound Database, MT-BCA-CNN achieves 97% accuracy and a 95% F1-score with roughly 0.11–0.13M parameters, significantly outperforming the baselines while keeping model complexity low. Ablation studies confirm that both channel attention and multi-task learning are needed, and visualizations show improved intra-class compactness and inter-class separation, which points to practical potential for robust, data-efficient UATR in challenging marine environments.
Abstract
Underwater acoustic target recognition (UATR) is of great significance for the protection of marine diversity and national defense security. The development of deep learning provides new opportunities for UATR, but the task still faces challenges from the scarcity of reference samples and complex environmental interference. To address these issues, we propose a multi-task balanced channel attention convolutional neural network (MT-BCA-CNN). The method integrates a channel attention mechanism with a multi-task learning strategy, constructing a shared feature extractor and multi-task classifiers to jointly optimize target classification and feature reconstruction tasks. The channel attention mechanism dynamically enhances discriminative acoustic features such as harmonic structures while suppressing noise. Experiments on the Watkins Marine Life Dataset demonstrate that MT-BCA-CNN achieves 97\% classification accuracy and a 95\% $F1$-score in 27-class few-shot scenarios, significantly outperforming traditional CNN and ACNN models as well as popular state-of-the-art UATR methods. Ablation studies confirm the synergistic benefits of multi-task learning and attention mechanisms, while a dynamic weighting adjustment strategy effectively balances task contributions. This work provides an efficient solution for few-shot underwater acoustic recognition, advancing research in marine bioacoustics and sonar signal processing.
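
To make the two ideas named above more concrete, the following is a minimal sketch of Gaussian-smoothed channel gating and a dynamically weighted two-task loss. It is an illustration under stated assumptions, not the authors' implementation: the function names, the sigmoid gate, the kernel width `sigma`, and the loss-share weighting rule are all hypothetical choices, since the abstract does not specify them.

```python
import math


def gaussian_smooth(weights, sigma=1.0, radius=2):
    """Smooth a 1-D list of channel weights with a Gaussian kernel.

    Assumption: smoothing is applied across neighbouring channel indices.
    """
    offsets = range(-radius, radius + 1)
    kernel = [math.exp(-(k * k) / (2 * sigma * sigma)) for k in offsets]
    n = len(weights)
    smoothed = []
    for i in range(n):
        num, den = 0.0, 0.0
        for k, g in zip(offsets, kernel):
            j = i + k
            if 0 <= j < n:  # renormalize at the edges instead of zero-padding
                num += g * weights[j]
                den += g
        smoothed.append(num / den)
    return smoothed


def channel_attention(channel_means):
    """Gate each channel from its globally averaged activation, then smooth.

    Assumption: a plain sigmoid stands in for the learned gating layers.
    """
    gates = [1.0 / (1.0 + math.exp(-m)) for m in channel_means]
    return gaussian_smooth(gates)


def dynamic_task_weights(cls_loss, rec_loss, eps=1e-8):
    """Hypothetical balancing rule: each task's weight is its share of the total loss."""
    total = cls_loss + rec_loss + eps
    return cls_loss / total, rec_loss / total


def multitask_loss(cls_loss, rec_loss):
    """Combine classification and reconstruction losses with the dynamic weights."""
    w_cls, w_rec = dynamic_task_weights(cls_loss, rec_loss)
    return w_cls * cls_loss + w_rec * rec_loss
```

In this sketch, a single channel with an unusually large gate has its emphasis spread to its neighbours, which is one plausible reading of "stabilizing" channel emphasis; the loss-share rule simply gives more weight to whichever task currently has the larger loss.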
