DEMONet: Underwater Acoustic Target Recognition based on Multi-Expert Network and Cross-Temporal Variational Autoencoder

Yuan Xie, Xiaowei Zhang, Jiawei Ren, Ji Xu

TL;DR

DEMONet is a multi-expert network that routes each underwater signal to its best-matched expert layer, based on the signal's DEMON spectrum, for fine-grained signal processing. It also introduces a cross-temporal alignment strategy and a variational autoencoder (VAE) that reconstruct noise-resistant DEMON spectra to replace the raw DEMON features.

Abstract

Building a robust underwater acoustic recognition system for real-world scenarios is challenging because of the complex underwater environment and the dynamic motion states of targets. A promising approach is to leverage the intrinsic physical characteristics of targets, which remain invariant regardless of environmental conditions, to provide robust insights. However, our study reveals that while physical characteristics are robust, they may lack class-specific discriminative patterns. Consequently, directly incorporating physical characteristics into model training can introduce unintended inductive biases and degrade performance. To exploit the benefits of physical characteristics while mitigating these detrimental effects, we propose DEMONet, which uses the detection of envelope modulation on noise (DEMON) to provide robust insights into the shaft frequency or blade count of targets. DEMONet is a multi-expert network that routes each underwater signal to its best-matched expert layer, based on its DEMON spectrum, for fine-grained signal processing. In this design, DEMON spectra serve only to provide implicit physical characteristics and are never mapped directly to the target category. Furthermore, to mitigate noise and spurious modulation spectra in DEMON features, we introduce a cross-temporal alignment strategy and employ a variational autoencoder (VAE) to reconstruct noise-resistant DEMON spectra that replace the raw DEMON features. The effectiveness of the proposed DEMONet with cross-temporal VAE was primarily evaluated on the DeepShip dataset and our proprietary datasets. Experimental results demonstrate that our approach achieves state-of-the-art performance on both datasets.
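
To make the DEMON idea concrete, the sketch below simulates broadband noise whose amplitude is modulated at a shaft rate and recovers that rate from the envelope spectrum. It is an illustration under stated assumptions, not the authors' feature pipeline. It uses a square-law envelope detector and a direct DFT, omits the band-pass stage that a practical DEMON front end would normally include, and every name in it (demon_spectrum, simulate_propeller_noise, the 7 Hz shaft rate, the 1 kHz sample rate) is hypothetical.

```python
import cmath
import math
import random


def demon_spectrum(signal, sample_rate, max_freq_hz):
    """Square-law DEMON sketch: envelope power, then low-frequency DFT magnitudes."""
    # Squaring the signal shifts the amplitude modulation down to baseband.
    envelope = [x * x for x in signal]
    mean = sum(envelope) / len(envelope)
    envelope = [e - mean for e in envelope]  # remove the DC component

    n = len(envelope)
    spectrum = {}
    for freq in range(1, max_freq_hz + 1):
        # Direct DFT at integer frequencies in Hz (adequate for a short demo signal).
        step = -2j * math.pi * freq / sample_rate
        acc = sum(e * cmath.exp(step * k) for k, e in enumerate(envelope))
        spectrum[freq] = abs(acc) / n
    return spectrum


def simulate_propeller_noise(duration_s, sample_rate, shaft_hz, depth=0.8, seed=0):
    """Gaussian noise amplitude-modulated at shaft_hz (assumed toy signal model)."""
    rng = random.Random(seed)
    n = int(duration_s * sample_rate)
    return [
        (1.0 + depth * math.cos(2 * math.pi * shaft_hz * k / sample_rate))
        * rng.gauss(0.0, 1.0)
        for k in range(n)
    ]


# Hypothetical settings chosen only for the demonstration.
signal = simulate_propeller_noise(duration_s=2.0, sample_rate=1000, shaft_hz=7)
spectrum = demon_spectrum(signal, sample_rate=1000, max_freq_hz=40)
peak_hz = max(spectrum, key=spectrum.get)
print(peak_hz)
```

In this toy setting the strongest line of the envelope spectrum falls at the simulated shaft rate, with a weaker harmonic at twice that rate. Lines like these carry information about shaft frequency, which is why the abstract describes DEMON spectra as physically informative while cautioning that they need not separate target classes.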

Paper Structure

This paper contains 21 sections, 7 equations, 5 figures, 11 tables.

Figures (5)

  • Figure 1: The general process of the data acquisition, preprocessing, and feature extraction.
  • Figure 2: The framework and training process of DEMONet with cross-temporal VAE, along with the detailed architectures of the model's individual modules. In panel (a), an orange box marks an active block, while a gray box marks a block that is either not activated or only performs a forward pass without gradient calculation or parameter updates.
  • Figure 3: Preliminary experiments on selecting backbone model architectures and input features.
  • Figure 4: A comparison between the raw 1-D DEMON spectra (left) and the 1-D DEMON spectra reconstructed by the cross-temporal VAE (right).
  • Figure 5: The routing assignment for DEMONet on the DeepShip training set. The horizontal axis scale represents the ID of the expert layer, while the vertical axis scale represents the target type. Each grid cell displays the proportion of the number of targets assigned to a certain expert layer to the total number of targets of that specific category.
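
The quantity plotted in Figure 5 can be sketched as a row-normalized count table: for each target class, the fraction of its samples routed to each expert layer. The snippet below is a minimal illustration of that calculation. The function name routing_proportions, the class labels, and the expert assignments are all hypothetical and are not taken from the paper's results.

```python
from collections import Counter, defaultdict


def routing_proportions(labels, expert_ids, num_experts):
    """For each class, return the fraction of its samples sent to each expert."""
    counts = defaultdict(Counter)
    for label, expert in zip(labels, expert_ids):
        counts[label][expert] += 1

    table = {}
    for label, per_expert in counts.items():
        total = sum(per_expert.values())
        # One row per class; each row sums to 1 across experts.
        table[label] = [per_expert[e] / total for e in range(num_experts)]
    return table


# Hypothetical assignments, not data from the paper.
labels = ["cargo", "cargo", "cargo", "cargo", "tug", "tug"]
expert_ids = [0, 0, 1, 2, 1, 1]
table = routing_proportions(labels, expert_ids, num_experts=3)
for label, row in table.items():
    print(label, row)
```

Because each row is normalized by its own class total, the rows are comparable even when the classes have very different numbers of samples.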