DMSORT: An efficient parallel maritime multi-object tracking architecture for unmanned vessel platforms

Shengyu Tang, Zeyuan Lu, Jiazhi Dong, Changdong Yu, Xiaoyu Wang, Yaohui Lyu, Weihao Xia

TL;DR

DMSORT tackles the challenging problem of maritime multi-object tracking under strong ego-motion, occlusion, and scale variation by proposing a parallel, motion-aware framework. The approach fuses four innovations: a Dual-Branch Detection–Tracking Architecture (DDTA), a Reversible Columnar Detection Network (RCDN) for scalable, lossless feature refinement, a lightweight Transformer-based appearance encoder (Li-TAE), and a clustering-optimized feature fusion (COFF) for robust motion–appearance integration. Empirical results on the Singapore Maritime Dataset show state-of-the-art performance with $HOTA=76.69\%$, $IDF1=83.67\%$, and $MOTA=77.60\%$, while maintaining real-time inference; DMSORT also demonstrates strong cross-domain generalization to MOT17. The work presents a practical, resource-efficient solution for USV and maritime surveillance, with potential for multi-camera and radar–visual extensions to further enhance robustness in all-weather maritime environments.

Abstract

Accurate perception of the marine environment through robust multi-object tracking (MOT) is essential for ensuring safe vessel navigation and effective maritime surveillance. However, the complicated maritime environment often causes camera motion and subsequent visual degradation, posing significant challenges to MOT. To address this challenge, we propose an efficient Dual-branch Maritime SORT (DMSORT) method for maritime MOT. The core of the framework is a parallel tracker with affine compensation, which incorporates an object detection and re-identification (ReID) branch, along with a dedicated branch for dynamic camera motion estimation. Specifically, a Reversible Columnar Detection Network (RCDN) is integrated into the detection module to leverage multi-level visual features for robust object detection. Furthermore, a lightweight Transformer-based appearance extractor (Li-TAE) is designed to capture global contextual information and generate robust appearance features. Another branch decouples platform-induced and target-intrinsic motion by constructing a projective transformation, applying platform-motion compensation within the Kalman filter, and thereby stabilizing true object trajectories. Finally, a clustering-optimized feature fusion module effectively combines motion and appearance cues to ensure identity consistency under noise, occlusion, and drift. Extensive evaluations on the Singapore Maritime Dataset demonstrate that DMSORT achieves state-of-the-art performance. Notably, DMSORT attains the fastest runtime among existing ReID-based MOT frameworks while maintaining high identity consistency and robustness to jitter and occlusion. Code is available at: https://github.com/BiscuitsLzy/DMSORT-An-efficient-parallel-maritime-multi-object-tracking-architecture-.
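The platform-motion compensation step described in the abstract can be illustrated with a minimal sketch. It is not taken from the DMSORT code. It assumes an 8-dimensional Kalman state `[cx, cy, w, h, vx, vy, vw, vh]` and a 2x3 affine camera-motion estimate `[M | t]` between consecutive frames; the abstract mentions both affine and projective transformations, so the paper's exact formulation may differ. The function name `compensate_camera_motion` and its helpers are hypothetical. The idea is to warp each track's predicted mean and covariance by the estimated camera motion before association, so that only target-intrinsic motion remains for the filter to explain.

```python
def matmul(a, b):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    """Transpose a matrix given as a list of rows."""
    return [list(col) for col in zip(*m)]


def block_diag_warp(linear, pairs=4):
    """Build a block-diagonal matrix that applies the 2x2 `linear` part
    to every (x, y)-style pair in the state vector."""
    n = 2 * pairs
    warp = [[0.0] * n for _ in range(n)]
    for k in range(pairs):
        for i in range(2):
            for j in range(2):
                warp[2 * k + i][2 * k + j] = float(linear[i][j])
    return warp


def compensate_camera_motion(mean, cov, affine):
    """Warp a Kalman state by a 2x3 affine camera-motion estimate.

    Assumed state layout (not confirmed by the source):
    [cx, cy, w, h, vx, vy, vw, vh].
    """
    linear = [row[:2] for row in affine]   # 2x2 rotation/scale part M
    shift = [row[2] for row in affine]     # translation t
    warp = block_diag_warp(linear)

    # Mean: apply M to every pair, then translate the box centre only.
    new_mean = [sum(w * m for w, m in zip(row, mean)) for row in warp]
    new_mean[0] += shift[0]
    new_mean[1] += shift[1]

    # Covariance: P' = W P W^T (a translation does not change uncertainty).
    new_cov = matmul(matmul(warp, cov), transpose(warp))
    return new_mean, new_cov


# Usage: the camera shifts the image by (+5, -3) pixels between frames.
identity = [[1.0 if i == j else 0.0 for j in range(8)] for i in range(8)]
mean, cov = compensate_camera_motion(
    [10, 20, 4, 8, 1, 0, 0, 0], identity, [[1, 0, 5], [0, 1, -3]]
)
```

With a pure translation, only the box centre moves and the covariance is unchanged. A rotation or scale in `M` would also transform the size and velocity components and inflate or rotate the covariance accordingly.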

Paper Structure

This paper contains 20 sections, 16 equations, 14 figures, 4 tables, 1 algorithm.

Figures (14)

  • Figure 1: Schematic diagram of unmanned vessel platform structure.
  • Figure 2: Illustrative examples of typical maritime scenes. These images showcase diverse oceanic scenarios, including vessels of various sizes, a floating buoy, and dynamic boat movements with wakes. Such scenes reflect the complex environments in maritime surveillance, posing challenges for object detection and tracking.
  • Figure 3: Overall pipeline of the proposed DMSORT framework. The system includes online tracking, where DDTA combines detection, motion compensation, and appearance encoding for two-stage association and tracklet management, and an optional post-processing step, where tracklet interpolation refines fragmented trajectories into final tracks.
  • Figure 4: Architecture of the proposed RCDN. The bottom-left part illustrates the reversible column backbone, where each level exchanges features across scales for lossless extraction. The middle modules (SPPF, C3k2, C2PSA) perform multi-scale fusion and attention refinement. The detection head on the right outputs classification and regression results for the final object prediction.
  • Figure 5: Architecture of the proposed Li-TAE module. The figure illustrates the overall pipeline and detailed structure of the lightweight Transformer-based appearance encoder, including the use of self-attention with our designed positional embeddings and shrink attention.
  • ...and 9 more figures