Graph Learning-Driven Multi-Vessel Association: Fusing Multimodal Data for Maritime Intelligence

Yuxu Lu, Kaisen Yang, Dong Yang, Haifeng Ding, Jinxian Weng, Ryan Wen Liu

TL;DR

GMvA tackles the challenging problem of associating vessel observations across multimodal streams (AIS and CCTV) in crowded waterways. It introduces a dynamic-graph framework with a Temporal Graph Attention layer and a Spatial-Temporal Attention block to learn robust spatiotemporal representations, followed by an MLP-based uncertainty fusion and Hungarian-based matching to enforce global consistency. The method achieves superior accuracy and robustness across varying vessel densities and data missingness, demonstrating the ability to learn optimal features without hand-crafted cues. The work advances maritime surveillance by enabling more reliable real-time tracking and post-hoc evidence gathering, with potential extensions to include radar data for further resilience in occluded or challenging conditions.

Abstract

Ensuring maritime safety and optimizing traffic management in increasingly crowded and complex waterways require effective waterway monitoring. However, current methods struggle with challenges arising from multimodal data, such as dimensional disparities, mismatched target counts, vessel scale variations, occlusions, and asynchronous data streams from systems like the automatic identification system (AIS) and closed-circuit television (CCTV). Traditional multi-target association methods often struggle with these complexities, particularly in densely trafficked waterways. To overcome these issues, we propose a graph learning-driven multi-vessel association (GMvA) method tailored for maritime multimodal data fusion. By integrating AIS and CCTV data, GMvA leverages time series learning and graph neural networks to capture the spatiotemporal features of vessel trajectories effectively. To enhance feature representation, the proposed method incorporates temporal graph attention and spatiotemporal attention, effectively capturing both local and global vessel interactions. Furthermore, a multi-layer perceptron-based uncertainty fusion module computes robust similarity scores, and the Hungarian algorithm is adopted to ensure globally consistent and accurate target matching. Extensive experiments on real-world maritime datasets confirm that GMvA delivers superior accuracy and robustness in multi-target association, outperforming existing methods even in challenging scenarios with high vessel density and incomplete or unevenly distributed AIS and CCTV data.
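The final matching step described above can be illustrated with a small sketch. It assumes a similarity matrix is already available (here hand-written, not produced by GMvA) and uses SciPy's `linear_sum_assignment` as the Hungarian-style solver. The function name `associate`, the `min_score` threshold, and the example scores are hypothetical and not taken from the paper.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def associate(similarity, min_score=0.5):
    """Return (ais_idx, video_idx) pairs that maximize total similarity.

    `min_score` is an illustrative cutoff (an assumption, not from the paper)
    that discards assigned pairs whose similarity is too low.
    """
    # Solve the one-to-one assignment problem; rectangular matrices are
    # allowed, so mismatched target counts leave some rows unmatched.
    rows, cols = linear_sum_assignment(similarity, maximize=True)
    return [
        (int(r), int(c))
        for r, c in zip(rows, cols)
        if similarity[r, c] >= min_score
    ]


# Hypothetical scores: 4 AIS trajectories (rows) vs. 3 video trajectories (columns).
similarity = np.array([
    [0.90, 0.10, 0.20],
    [0.85, 0.80, 0.10],
    [0.10, 0.20, 0.05],
    [0.05, 0.15, 0.70],
])

matches = associate(similarity)
print(matches)
```

In this toy matrix, AIS tracks 0 and 1 both score highest against video track 0, so a per-row argmax would assign that video track twice. The global assignment resolves the conflict, and AIS track 2 stays unmatched because there are only three video tracks.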

Paper Structure

This paper contains 36 sections, 22 equations, 10 figures, 5 tables, 1 algorithm.

Figures (10)

  • Figure 1: Illustration of maritime intelligent transportation systems (MITS) for complex waterways. The integration of AIS data and video enables reliable multi-target association, ensuring accurate vessel tracking, identity verification, and behavior monitoring. Our method maintains seamless surveillance continuity, even in the presence of system failures or data inconsistencies.
  • Figure 2: Overview of the proposed GMvA framework. At each timestamp, multimodal trajectories are structured into temporal graphs. High-dimensional node features are extracted via the TGA layer and STA block, with feature normalization applied to enhance representation. After the two data streams are processed independently, an MLP-UMF computes similarity scores between candidate pairs, generating a cross-class similarity matrix. The Hungarian algorithm is then used to derive optimal matches from the matrix.
  • Figure 3: The pipeline of the proposed TGA layer and STA block. The TGA layer processes dynamic temporal graphs using a graph attention mechanism. The STA block performs three key operations: spatial feature extraction, temporal feature extraction, and feature fusion through a feed-forward network.
  • Figure 4: The pipeline of feature fusion and similarity computation. The features generated by the STA block are fused through a dedicated MLP-UMF, which computes similarity scores between potential matches by incorporating both appearance and geometric constraints.
  • Figure 5: To unify the coordinate systems of AIS and video trajectories, we use a linear regression model (LRM) for coordinate transformation. The unified coordinate systems (i.e., (b) and (d)) have similar trajectory spatial distributions. GMvA focuses more on the relative spatiotemporal features of multiple targets to reduce the accuracy loss caused by coordinate conversion errors.
  • ...and 5 more figures
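
The coordinate unification mentioned in the Figure 5 caption can be sketched as a least-squares linear fit. The caption only states that a linear regression model is used; the affine form with a bias term, the function names `fit_linear_map` and `apply_linear_map`, and the synthetic data below are all assumptions made for illustration. A real camera projection is generally not affine, which is consistent with the caption's remark about coordinate conversion errors.

```python
import numpy as np


def fit_linear_map(src, dst):
    """Least-squares fit of dst ≈ [src, 1] @ W (an assumed affine form)."""
    design = np.hstack([src, np.ones((len(src), 1))])  # append bias column
    weights, *_ = np.linalg.lstsq(design, dst, rcond=None)
    return weights  # shape (3, 2): two slope rows plus one bias row


def apply_linear_map(weights, src):
    """Map source coordinates into the target coordinate system."""
    design = np.hstack([src, np.ones((len(src), 1))])
    return design @ weights


# Synthetic, noise-free example: normalized AIS positions and the pixel
# coordinates they would have under a made-up linear map.
rng = np.random.default_rng(0)
true_weights = np.array([[1200.0, -30.0], [45.0, -900.0], [100.0, 500.0]])
ais_xy = rng.uniform(0.0, 1.0, size=(50, 2))
pixels = np.hstack([ais_xy, np.ones((50, 1))]) @ true_weights

weights = fit_linear_map(ais_xy, pixels)
projected = apply_linear_map(weights, ais_xy)
max_error = float(np.abs(projected - pixels).max())
```

With noise-free synthetic data the fit recovers the generating map almost exactly. On real AIS and video pairs, the residual would indicate how much conversion error the downstream association has to tolerate.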