Unraveling Complex Data Diversity in Underwater Acoustic Target Recognition through Convolution-based Mixture of Experts

Yuan Xie; Jiawei Ren; Ji Xu

TL;DR

This work tackles underwater acoustic target recognition under strong intra-class diversity and inter-class similarity by introducing a convolution-based mixture of experts (CMoE). The model uses multiple independent expert layers with a routing layer to adaptively assign inputs, complemented by balancing regularization and an optional residual module (RCMoE) to mitigate overfitting. It leverages diverse acoustic features (STFT, Mel, Bark, CQT) and a ResNet-AP backbone, and evaluates on Shipsear, DTIL, and DeepShip with careful train-test splits. Experimental results show consistent accuracy gains over baselines, supported by visualization analyses that reveal interpretable routing behavior linked to target characteristics such as size.
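To make the feature side concrete, here is a minimal sketch of an STFT log-power spectrogram, the simplest of the four features listed above. The frame shift follows the default stated in the Figure 4 caption (half the frame length); the frame length of 1024 samples, the Hann window, and the dB scaling are illustrative assumptions rather than the paper's settings, and `stft_log_spectrogram` is a hypothetical helper name.

```python
import numpy as np

def stft_log_spectrogram(signal, frame_len=1024, eps=1e-10):
    """Log-power STFT spectrogram, shape (n_frames, frame_len // 2 + 1).

    Assumptions (not from the paper): frame_len=1024, Hann window, dB scale.
    The frame shift is half the frame length.
    """
    hop = frame_len // 2
    n_frames = 1 + (len(signal) - frame_len) // hop
    window = np.hanning(frame_len)
    # Slice the signal into overlapping, windowed frames.
    frames = np.stack([
        signal[i * hop: i * hop + frame_len] * window
        for i in range(n_frames)
    ])
    # One-sided FFT per frame, then power in decibels.
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    return 10.0 * np.log10(power + eps)

# Usage: a 1 kHz tone sampled at 8 kHz for one second.
sample_rate = 8000
t = np.arange(sample_rate) / sample_rate
tone = np.sin(2 * np.pi * 1000 * t)
spec = stft_log_spectrogram(tone)
```

Mel, Bark, and CQT features differ mainly in how the frequency axis is warped or filtered; they are not covered by this sketch.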

Abstract

Underwater acoustic target recognition is a difficult task owing to the intricate nature of underwater acoustic signals. Complex underwater environments, unpredictable transmission channels, and dynamic motion states strongly affect real-world underwater acoustic signals and may even obscure the intrinsic characteristics of the targets. Consequently, the data distribution of underwater acoustic signals exhibits high intra-class diversity, which compromises the accuracy and robustness of recognition systems. To address these issues, this work proposes a convolution-based mixture of experts (CMoE) that recognizes underwater targets in a fine-grained manner. The proposed technique introduces multiple expert layers as independent learners, along with a routing layer that determines the assignment of experts according to the characteristics of the inputs. This design allows the model to use independent parameter spaces, facilitating the learning of complex underwater signals with high intra-class diversity. Furthermore, this work optimizes the CMoE structure with balancing regularization and an optional residual module. To validate the efficacy of the proposed techniques, we conducted detailed experiments and visualization analyses on three underwater acoustic databases across several acoustic features. The experimental results demonstrate that CMoE consistently achieves significant performance improvements, delivering higher recognition accuracy than existing advanced methods.
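The routing idea described in the abstract can be illustrated with a small NumPy sketch. It is a simplified stand-in under stated assumptions, not the authors' implementation: inputs are 1-D feature vectors, each expert is a single 1-D convolution kernel, the router is one linear layer with top-1 selection, and the balancing term is the load-balancing loss commonly used in Switch-Transformer-style routing, which may differ from the regularizer used in this paper. All function and parameter names are hypothetical.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def cmoe_forward(x, router_w, expert_kernels, residual_kernel=None):
    """Route each input to one convolutional expert (top-1 routing).

    x: (batch, time); router_w: (time, n_experts);
    expert_kernels: (n_experts, k); residual_kernel: (k,) or None.
    """
    # Routing layer: per-sample probabilities over experts.
    probs = softmax(x @ router_w)
    choice = probs.argmax(axis=1)
    # Expert layer: only the selected expert's kernel is applied,
    # scaled by its routing probability.
    out = np.stack([
        probs[i, k] * np.correlate(x[i], expert_kernels[k], mode="valid")
        for i, k in enumerate(choice)
    ])
    # Optional residual module: a shared convolution applied to every input.
    if residual_kernel is not None:
        out = out + np.stack(
            [np.correlate(row, residual_kernel, mode="valid") for row in x]
        )
    return out, probs, choice

def balancing_loss(probs, choice):
    """Assumed load-balancing term: n * sum_i(fraction_i * mean_prob_i)."""
    n_experts = probs.shape[1]
    fraction = np.bincount(choice, minlength=n_experts) / len(choice)
    mean_prob = probs.mean(axis=0)
    return n_experts * float(np.sum(fraction * mean_prob))

# Usage with random data: 8 inputs of length 32, 4 experts, kernel size 5.
rng = np.random.default_rng(0)
x = rng.normal(size=(8, 32))
router_w = rng.normal(size=(32, 4))
expert_kernels = rng.normal(size=(4, 5))
out, probs, choice = cmoe_forward(x, router_w, expert_kernels,
                                  residual_kernel=rng.normal(size=5))
loss = balancing_loss(probs, choice)
```

Under this assumed loss, perfectly uniform routing gives a value of 1, while sending every input to a single expert gives a value equal to the number of experts, so minimizing it discourages expert collapse.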

Paper Structure (24 sections, 8 equations, 8 figures, 6 tables, 1 algorithm)

Introduction
Background
Motivation
Our Work
Related Works
Underwater Acoustic Target Recognition
Mixture of Experts
Methodology
Acoustic Feature Extraction
Front-end Backbone Network
Expert Layer, Routing Layer, and Residual Module
Balancing Regularization
Experiment Setup
Datasets
Effective Frequency Bands
...and 9 more sections

Figures (8)

Figure 1: Spectrograms of several samples in the Shipsear dataset. Motorboat_33 records the start and stop of the motorboat "Dud"; Motorboat_39 records the arrival of the motorboat "Dud"; Motorboat_79 records the passing of the motorboat "Zodiac"; Passenger ship_59 records "Marde Mouro" sailing towards the port at considerable speed; Sailboat_56 records a sailboat passing at very close range; Fishboat_76 records a fishing boat passing.
Figure 2: The model structure of the front-end backbone model - ResNet with attention pooling. "Conv" represents the convolutional layer, and "BN" represents the batch normalization layer.
Figure 3: The overall process of our proposed CMoE, including routing probability calculation, expert assignment, and optional residual module.
Figure 4: Preliminary experiments on the selection of effective frequency bands, frame lengths, and front-end backbone models. The frame shift is set to half the frame length by default.
Figure 5: The confusion matrix heat maps of the baseline model and CMoE on Shipsear. Both models take the STFT spectrogram as the input feature.
...and 3 more figures
