DMSORT: An efficient parallel maritime multi-object tracking architecture for unmanned vessel platforms
Shengyu Tang, Zeyuan Lu, Jiazhi Dong, Changdong Yu, Xiaoyu Wang, Yaohui Lyu, Weihao Xia
TL;DR
DMSORT addresses maritime multi-object tracking under strong ego-motion, occlusion, and scale variation with a parallel, motion-aware framework. The approach combines four components: a Dual-Branch Detection–Tracking Architecture (DDTA), a Reversible Columnar Detection Network (RCDN) for scalable, lossless feature refinement, a lightweight Transformer-based appearance encoder (Li-TAE), and a clustering-optimized feature fusion (COFF) module for robust motion–appearance integration. On the Singapore Maritime Dataset, DMSORT reports state-of-the-art results of $HOTA=76.69\%$, $IDF1=83.67\%$, and $MOTA=77.60\%$ while maintaining real-time inference; it also shows strong cross-domain generalization to MOT17. The work offers a practical, resource-efficient solution for unmanned surface vehicles (USVs) and maritime surveillance, with potential multi-camera and radar–visual extensions to improve robustness in all-weather maritime environments.
Abstract
Accurate perception of the marine environment through robust multi-object tracking (MOT) is essential for safe vessel navigation and effective maritime surveillance. However, the complicated maritime environment often causes camera motion and subsequent visual degradation, which pose significant challenges to MOT. To address these challenges, we propose an efficient Dual-branch Maritime SORT (DMSORT) method for maritime MOT. The core of the framework is a parallel tracker with affine compensation, which incorporates an object detection and re-identification (ReID) branch alongside a dedicated branch for dynamic camera motion estimation. Specifically, a Reversible Columnar Detection Network (RCDN) is integrated into the detection module to leverage multi-level visual features for robust object detection. Furthermore, a lightweight Transformer-based appearance extractor (Li-TAE) is designed to capture global contextual information and generate robust appearance features. The camera-motion branch decouples platform-induced motion from target-intrinsic motion by constructing a projective transformation and applying platform-motion compensation within the Kalman filter, thereby stabilizing true object trajectories. Finally, a clustering-optimized feature fusion module combines motion and appearance cues to maintain identity consistency under noise, occlusion, and drift. Extensive evaluations on the Singapore Maritime Dataset demonstrate that DMSORT achieves state-of-the-art performance. Notably, DMSORT attains the fastest runtime among existing ReID-based MOT frameworks while maintaining high identity consistency and robustness to jitter and occlusion. Code is available at: https://github.com/BiscuitsLzy/DMSORT-An-efficient-parallel-maritime-multi-object-tracking-architecture-.
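To illustrate the platform-motion compensation step described in the abstract, the following is a minimal sketch rather than the released DMSORT implementation. It assumes a 2x3 affine warp between consecutive frames (estimated elsewhere, for example from background keypoints) and an 8-dimensional Kalman state laid out as [cx, cy, w, h, vx, vy, vw, vh]; the function name `compensate_camera_motion` and the state layout are illustrative assumptions following a convention common in camera-motion-compensated SORT-style trackers.

```python
import numpy as np


def compensate_camera_motion(mean, cov, warp):
    """Warp a Kalman state into the current frame's coordinates.

    mean : (8,) state, assumed layout [cx, cy, w, h, vx, vy, vw, vh]
    cov  : (8, 8) state covariance
    warp : (2, 3) affine matrix [M | t] mapping previous-frame pixels
           to current-frame pixels (assumed to be estimated upstream)
    """
    mean = np.asarray(mean, dtype=float)
    cov = np.asarray(cov, dtype=float)
    warp = np.asarray(warp, dtype=float)

    M, t = warp[:, :2], warp[:, 2]

    # Block-diagonal 8x8 matrix that applies M to each 2-vector in the state.
    R = np.kron(np.eye(4), M)

    new_mean = R @ mean
    new_mean[:2] += t          # only the box center receives the translation
    new_cov = R @ cov @ R.T    # propagate uncertainty through the linear map
    return new_mean, new_cov


if __name__ == "__main__":
    # Hypothetical track state and a pure camera translation of (+5, -3) pixels.
    state = np.array([10.0, 0.0, 4.0, 2.0, 1.0, 0.0, 0.0, 0.0])
    P = np.eye(8)
    shift = np.array([[1.0, 0.0, 5.0],
                      [0.0, 1.0, -3.0]])
    warped_state, warped_P = compensate_camera_motion(state, P, shift)
    print(warped_state[:2])
```

Applying the warp to the predicted state before data association means that the residual motion seen by the filter is, under these assumptions, mostly the target's own motion rather than the motion of the vessel carrying the camera.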
