
Advancing Robust Underwater Acoustic Target Recognition through Multi-task Learning and Multi-Gate Mixture-of-Experts

Yuan Xie, Jiawei Ren, Junfeng Li, Ji Xu

TL;DR

M3, a multi-task, multi-gate, multi-expert recognition framework, outperforms the most advanced single-task recognition models on the ShipsEar dataset, achieving state-of-the-art performance.

Abstract

Underwater acoustic target recognition has emerged as a prominent research area within underwater acoustics. However, authentic underwater acoustic recordings remain scarce, which hinders data-driven recognition models from learning robust target patterns from a limited set of intricate signals and compromises their stability in practical applications. To overcome these limitations, this study proposes a recognition framework called M3 (Multi-task, Multi-gate, Multi-expert), which helps the model capture robust patterns by making it aware of the inherent properties of targets. In this framework, an auxiliary task that focuses on target properties, such as estimating target size, is designed. The auxiliary task shares parameters with the recognition task to realize multi-task learning. This paradigm lets the model concentrate on information shared across tasks and identify robust target patterns in a regularized manner, improving its generalization ability. Moreover, M3 incorporates multi-expert and multi-gate mechanisms that allocate distinct parameter spaces to different underwater signals, so the model can process intricate signal patterns in a fine-grained and differentiated manner. To evaluate M3, extensive experiments were conducted on the ShipsEar underwater ship-radiated noise dataset. The results show that M3 outperforms the most advanced single-task recognition models, achieving state-of-the-art performance.
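
The multi-task, multi-gate, multi-expert idea described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the class name `TinyMMoE`, the layer sizes, the single-layer `tanh` experts, the flattened main-feature vector, the treatment of size estimation as scalar regression, and the `aux_weight` value are all assumptions made for illustration. The sketch only shows the general pattern: experts shared by both tasks, one softmax gate per task driven by its own gating feature, and a joint loss in which the auxiliary task regularizes the shared parameters.

```python
import numpy as np

TASKS = ("recognition", "size")  # main task and auxiliary size-estimation task


def softmax(z):
    """Numerically stable softmax over the last axis."""
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


class TinyMMoE:
    """Hypothetical two-task multi-gate mixture-of-experts layer."""

    def __init__(self, d_main, d_gate, d_hidden, n_experts, n_classes, seed=0):
        rng = np.random.default_rng(seed)
        # Experts are shared by both tasks: (n_experts, d_main, d_hidden).
        self.w_expert = rng.normal(0.0, 0.1, (n_experts, d_main, d_hidden))
        # One gate per task, each fed by its own gating feature.
        self.w_gate = {t: rng.normal(0.0, 0.1, (d_gate[t], n_experts)) for t in TASKS}
        # Task-specific output heads (size is a scalar here: an assumption).
        self.w_head = {
            "recognition": rng.normal(0.0, 0.1, (d_hidden, n_classes)),
            "size": rng.normal(0.0, 0.1, (d_hidden, 1)),
        }

    def forward(self, x_main, x_gate):
        # Every expert processes the main feature: (batch, n_experts, d_hidden).
        expert_out = np.tanh(np.einsum("bd,edh->beh", x_main, self.w_expert))
        outputs, gates = {}, {}
        for t in TASKS:
            # Per-task mixing weights over experts: (batch, n_experts).
            gates[t] = softmax(x_gate[t] @ self.w_gate[t])
            # Gate-weighted sum of expert outputs: (batch, d_hidden).
            mixed = np.einsum("be,beh->bh", gates[t], expert_out)
            outputs[t] = mixed @ self.w_head[t]
        return outputs, gates


def joint_loss(outputs, labels, sizes, aux_weight=0.5):
    """Cross-entropy for recognition plus weighted MSE for the auxiliary task."""
    probs = softmax(outputs["recognition"])
    ce = -np.log(probs[np.arange(len(labels)), labels] + 1e-12).mean()
    mse = ((outputs["size"][:, 0] - sizes) ** 2).mean()
    return ce + aux_weight * mse


# Usage with random stand-in data (all dimensions are arbitrary).
rng = np.random.default_rng(1)
x_main = rng.normal(size=(4, 32))
x_gate = {"recognition": rng.normal(size=(4, 16)), "size": rng.normal(size=(4, 8))}
model = TinyMMoE(32, {"recognition": 16, "size": 8}, d_hidden=24, n_experts=3, n_classes=5)
outputs, gates = model.forward(x_main, x_gate)
loss = joint_loss(outputs, np.array([0, 1, 2, 3]), np.array([0.1, 0.5, 0.9, 0.3]))
```

Because each task has its own gate, the two tasks can weight the same experts differently for the same input, while gradients from both losses still flow into the shared expert parameters.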

Paper Structure

This paper contains 19 sections, 3 equations, 5 figures, 6 tables.

Figures (5)

  • Figure 1: The overall process of data collection and feature extraction. The extracted acoustic features consist of a 2-D main feature (the log power spectrogram) and three candidate 1-D gating features. The dimensions of the extracted features are also presented in the figure.
  • Figure 2: The distribution of three candidate gating features: (a) Welch spectrum, (b) average amplitude spectrum, (c) spectral centroid, on the ShipsEar dataset. The visualization results were derived using t-SNE and k-means clustering. This figure also provides the parameter setups for t-SNE and k-means algorithms, along with the associated values of clustering inertia and Silhouette score.
  • Figure 3: An overview of the structure of (a) fundamental MTL model, (b) M3, (c) M3-TSE, (d) detailed architecture components. In this figure, yellow represents the relevant components of the recognition task, red represents the relevant components of the target size estimation task, and blue represents the shared parts between the two tasks.
  • Figure 4: The results of the preliminary experiments conducted to select the optimal main feature and model backbone. Error bars show the unbiased standard deviation across two runs conducted with different random seeds. "spec" is the abbreviation for "spectrogram".
  • Figure 5: The results of the preliminary experiments conducted to select the optimal gating features for both tasks. Error bars show the unbiased standard deviation across two runs conducted with different random seeds. Each label on the horizontal axis gives the gating feature for the main task & auxiliary task. For example, "Avg-amp spec & Welch spec" indicates that the average amplitude spectrum serves as the gating feature for the main task, while the Welch spectrum serves as the gating feature for the auxiliary task.
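
The features named in the captions for Figures 1 and 2 (log power spectrogram, Welch spectrum, average amplitude spectrum, spectral centroid) can be approximated with a short NumPy sketch. The frame length, hop size, Hann window, magnitude-weighted centroid, and the function names `frame_signal` and `extract_features` are assumptions chosen for illustration; the paper's actual settings and feature dimensions are given in its Figure 1 and are not reproduced here.

```python
import numpy as np


def frame_signal(x, n_fft, hop):
    """Split a 1-D signal into overlapping Hann-windowed frames."""
    n_frames = 1 + (len(x) - n_fft) // hop
    idx = np.arange(n_fft)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx] * np.hanning(n_fft)


def extract_features(x, sr, n_fft=1024, hop=512, eps=1e-10):
    """Return a 2-D main feature and three 1-D candidate gating features."""
    frames = frame_signal(x, n_fft, hop)
    spec = np.fft.rfft(frames, axis=1)          # (n_frames, n_fft // 2 + 1)
    mag = np.abs(spec)
    power = mag ** 2
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / sr)  # bin centre frequencies in Hz

    return {
        # 2-D main feature: power spectrogram in dB.
        "log_power_spec": 10.0 * np.log10(power + eps),
        # Welch-style spectrum: periodograms averaged over frames (unnormalized).
        "welch_spec": power.mean(axis=0),
        # Average amplitude spectrum: magnitudes averaged over frames.
        "avg_amp_spec": mag.mean(axis=0),
        # Spectral centroid per frame: magnitude-weighted mean frequency.
        "spectral_centroid": (mag * freqs).sum(axis=1) / (mag.sum(axis=1) + eps),
        "freqs": freqs,
    }


# Usage on a synthetic 1 kHz tone standing in for a ship-radiated noise clip.
sr = 16000
t = np.arange(sr) / sr
features = extract_features(np.sin(2 * np.pi * 1000.0 * t), sr)
```

In this sketch the Welch and average amplitude spectra are indexed by frequency, while the spectral centroid is indexed by frame, so the three gating candidates summarize the signal along different axes.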