An Attention-Assisted Multi-Modal Data Fusion Model for Real-Time Estimation of Underwater Sound Velocity
Pengfei Wu, Wei Huang, Yujie Shi, Hao Zhang
TL;DR
This work tackles real-time estimation of underwater sound velocity by introducing SA-MDF-CNN, which fuses remote sensing sea surface temperature (SST) data with historical sound speed profile (SSP) patterns, represented by empirical orthogonal function (EOF) components, and spatial coordinates. By combining a CNN-based local feature extractor with a multi-head self-attention mechanism, the model captures both local and global dependencies across the multimodal inputs to predict the SSP across depth. Empirical results show that SA-MDF-CNN achieves lower RMSE than the CNN, SITP, and mean-value baselines, with interpretable attention weights that emphasize shallow-water variability; ocean experiments in the South China Sea corroborate its robustness on non-uniform grids. The method offers a practical, real-time alternative to on-site SSP measurements, with potential to improve underwater PNTC and communication systems; the authors acknowledge limitations regarding internal-wave effects and outline future work to address them.
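To make the idea of representing historical SSPs by EOF components concrete, the following is a minimal sketch, not the authors' implementation. It assumes that the EOF modes are obtained from a singular value decomposition of mean-removed profiles, which is a common way to compute them. The function names, the number of modes, and the synthetic profiles are all hypothetical and chosen only for illustration.

```python
import numpy as np


def eof_decompose(ssps, n_modes):
    """Split profiles of shape (n_profiles, n_depths) into a mean profile,
    the leading EOF modes, and per-profile projection coefficients."""
    mean_profile = ssps.mean(axis=0)
    anomalies = ssps - mean_profile
    # Rows of vt are orthonormal depth-wise modes, ordered by explained variance.
    _, _, vt = np.linalg.svd(anomalies, full_matrices=False)
    modes = vt[:n_modes]              # (n_modes, n_depths)
    coeffs = anomalies @ modes.T      # (n_profiles, n_modes)
    return mean_profile, modes, coeffs


def eof_reconstruct(mean_profile, modes, coeffs):
    """Rebuild profiles from the mean profile and EOF coefficients."""
    return mean_profile + coeffs @ modes


# Synthetic stand-in for historical SSPs (assumed values, not real data):
# a linear background plus two depth-decaying perturbation shapes.
rng = np.random.default_rng(0)
depths = np.linspace(0.0, 1000.0, 50)
background = 1500.0 + 0.016 * depths
shapes = np.stack([np.exp(-depths / 100.0), np.exp(-depths / 400.0)])
amplitudes = rng.normal(scale=5.0, size=(20, 2))
ssps = background + amplitudes @ shapes          # (20, 50)

mean_profile, modes, coeffs = eof_decompose(ssps, n_modes=2)
reconstructed = eof_reconstruct(mean_profile, modes, coeffs)
```

Because the synthetic anomalies are built from only two shapes, two modes reconstruct them up to floating-point error. Real profiles would generally need more modes, and the truncation level is a modelling choice.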
Abstract
The estimation of underwater sound velocity distribution serves as a critical basis for effective underwater communication and precise positioning, given that variations in sound velocity influence the path of signal transmission. Conventional techniques for the direct measurement of sound velocity, as well as methods that invert sound velocity from acoustic field data, require on-site data collection. This requirement not only places high demands on device deployment, but also makes real-time estimation of the sound velocity distribution difficult. To construct a real-time sound velocity field and eliminate the need for underwater on-site measurement operations, we propose a self-attention embedded multimodal data fusion convolutional neural network (SA-MDF-CNN) for real-time underwater sound speed profile (SSP) estimation. The proposed model seeks to elucidate the inherent relationship between remote sensing sea surface temperature (SST) data, the primary component characteristics of historical SSPs, and their spatial coordinates. This is achieved by employing CNNs and attention mechanisms to extract local and global correlations from the input data, respectively. The ultimate objective is to facilitate rapid and precise estimation of the sound velocity distribution within a specified task area. Experimental results show that the proposed method has lower root mean square error (RMSE) and stronger robustness than other state-of-the-art methods.
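As a rough illustration of the two ingredients named above, the sketch below shows scaled dot-product self-attention over a set of feature tokens, together with the RMSE metric used for evaluation. It is a single-head simplification for brevity, not the SA-MDF-CNN architecture. The token count, feature sizes, random projection matrices, and function names are assumptions made for this example.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    shifted = x - x.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def self_attention(tokens, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention.

    tokens: (n_tokens, d_model); w_q, w_k, w_v: (d_model, d_head).
    Returns the attended features and the attention weight matrix.
    """
    q, k, v = tokens @ w_q, tokens @ w_k, tokens @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])
    weights = softmax(scores, axis=-1)   # each row sums to 1
    return weights @ v, weights


def rmse(pred, truth):
    """Root mean square error between predicted and reference values."""
    diff = np.asarray(pred, dtype=float) - np.asarray(truth, dtype=float)
    return float(np.sqrt(np.mean(diff ** 2)))


# Hypothetical input: 6 feature tokens of width 8 (e.g. outputs of a local
# feature extractor), projected to a head of width 4 with random weights.
rng = np.random.default_rng(0)
tokens = rng.normal(size=(6, 8))
w_q, w_k, w_v = (rng.normal(size=(8, 4)) for _ in range(3))
attended, weights = self_attention(tokens, w_q, w_k, w_v)
```

Each row of `weights` describes how strongly one token attends to every other token, which is the kind of quantity that can be inspected for interpretability. A multi-head variant would run several such heads in parallel and concatenate their outputs.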
