A Multimodal Data Fusion Generative Adversarial Network for Real Time Underwater Sound Speed Field Construction

Wei Huang, Yuqiang Huang, Yanan Wu, Tianhe Xu, Tingting Lyu, Hao Zhang

Abstract

Sound speed profiles (SSPs) are essential underwater parameters that affect the propagation mode of underwater signals and have a critical impact on the energy efficiency of underwater acoustic communication and the accuracy of underwater acoustic positioning. Traditionally, SSPs are obtained by matched field processing (MFP), compressive sensing (CS), and deep learning (DL) methods. However, existing methods mainly rely on on-site underwater sonar observation data, which places strict requirements on the deployment of sonar observation systems. To achieve high-precision estimation of the sound velocity distribution in a given sea area without on-site underwater data measurement, we propose a multi-modal data-fusion generative adversarial network model with residual attention block (MDF-RAGAN) for SSP construction. To improve the model's ability to capture global spatial feature correlations, we embed attention mechanisms, and we use residual modules to capture small disturbances in the deep-ocean sound velocity distribution caused by changes in sea surface temperature (SST). Experimental results on a real open dataset show that the proposed model outperforms other state-of-the-art methods, achieving an error of less than 0.3 m/s. Specifically, MDF-RAGAN not only outperforms convolutional neural network (CNN) and spatial interpolation (SITP) methods by nearly a factor of two, but also achieves about a 65.8% reduction in root mean square error (RMSE) compared to the mean profile, which reflects the improvement in overall profile matching brought by multi-source fusion and cross-modal attention.

Paper Structure

This paper contains 26 sections, 28 equations, 6 figures, 7 tables.

Figures (6)

  • Figure 1: Framework of SSP estimation based on the MDF-RAGAN.
  • Figure 2: The proposed MDF-RAGAN model for SSP estimation.
  • Figure 3: Comparison of sound speed profile predictions at different locations and depths using models trained with different historical data lengths. Each subfigure compares the predictions of MDF-RAGAN, CNN, SITP, and MEAN using training data of 5, 10, and 17 years.
  • Figure 4: Comparison of sound speed profiles at different locations and depths. Each row represents a specific depth (200 m, 400 m, 1000 m, and 1975 m), while each column represents a specific location (Location 1: ($49.5^{\circ}$S, $13.5^{\circ}$E); Location 2: ($49.5^{\circ}$S, $28.5^{\circ}$E); Location 3: ($52.5^{\circ}$S, $25.5^{\circ}$E)). The figures illustrate the comparison of sound speed profile predictions using four different methods: MDF-RAGAN, CNN, SITP, and MEAN at each location and depth.
  • Figure 5: t-SNE visualization of intermediate features in MDF-RAGAN. (a)-(c) show the intermediate features from the discriminator: (a) neighbor label features, (b) target SST-label features, and (c) target LOC-label features.
  • ...and 1 more figures