Real-time Ship Recognition and Georeferencing for the Improvement of Maritime Situational Awareness
Borja Carrillo Perez
TL;DR
This work addresses real-time ship recognition and georeferencing to enhance maritime situational awareness. It introduces ShipSG, a real-world dataset with ship masks and geographic positions, and develops ScatYOLOv8+CBAM, a real-time segmentation architecture optimized for embedded hardware that adds the 2D scattering transform and attention mechanisms to YOLOv8. The approach achieves an mAP of $75.46\%$ at $25.3$ ms per frame on the NVIDIA Jetson AGX Xavier, and a slicing strategy for small and distant ships improves mAP by $8\%$ to $11\%$. A monocular georeferencing method based on image homographies yields positioning errors of approximately $18\,\mathrm{m}$ for ships up to $400\,\mathrm{m}$ away and $44\,\mathrm{m}$ for ships between $400\,\mathrm{m}$ and $1200\,\mathrm{m}$, enabling real-time visualization on maps and integration with other maritime data streams. Overall, the work demonstrates the viability of deep-learning-based ship recognition and georeferencing on embedded hardware, establishing ShipSG as a benchmark and offering a practical, scalable framework for maritime monitoring and decision support.
Abstract
In an era where maritime infrastructures are crucial, advanced situational awareness solutions are increasingly important. Optical camera systems make it possible to use maritime footage in real time. This thesis investigates how deep learning and computer vision can advance real-time ship recognition and georeferencing for the improvement of maritime situational awareness. A novel dataset, ShipSG, is introduced, containing 3,505 images and 11,625 ship masks with corresponding class and geographic position. After an exploration of the state of the art, a custom real-time segmentation architecture, ScatYOLOv8+CBAM, is designed for the NVIDIA Jetson AGX Xavier embedded system. This architecture adds the 2D scattering transform and attention mechanisms to YOLOv8, achieving an mAP of 75.46% at 25.3 ms per frame and outperforming state-of-the-art methods by over 5%. To improve small and distant ship recognition in high-resolution images on embedded systems, an enhanced slicing mechanism is introduced, improving mAP by 8% to 11%. Additionally, a georeferencing method is proposed, achieving positioning errors of 18 m for ships up to 400 m away and 44 m for ships between 400 m and 1200 m. The findings are also applied in real-world scenarios, such as the detection of abnormal ship behaviour, camera integrity assessment and 3D reconstruction. The approach of this thesis outperforms existing methods and provides a framework for integrating recognized and georeferenced ships into real-time systems, enhancing operational effectiveness and decision-making for maritime stakeholders. This thesis contributes to the maritime computer vision field by establishing a benchmark for ship segmentation and georeferencing research and by demonstrating the viability of deep-learning-based recognition and georeferencing methods for real-time maritime monitoring.
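
As an illustration of homography-based georeferencing in general, the following minimal Python sketch estimates a planar homography from control points and uses it to map a pixel to a position on the water surface. It is not the method of the thesis, whose details are not given here. It assumes the water surface can be approximated as a plane and uses a plain direct linear transform (DLT) without coordinate normalization. The function names, the control points, the local east/north frame in metres, and the ship pixel are all hypothetical.

```python
import numpy as np


def estimate_homography(src, dst):
    """Estimate H (3x3) with dst ~ H @ src in homogeneous coordinates.

    Plain DLT: needs at least four correspondences, no three collinear.
    """
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    rows = []
    for (x, y), (u, v) in zip(src, dst):
        # Each correspondence contributes two linear constraints on H.
        rows.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        rows.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    # The solution is the right singular vector of the smallest singular value.
    _, _, vt = np.linalg.svd(np.asarray(rows))
    h = vt[-1].reshape(3, 3)
    return h / h[2, 2]


def apply_homography(h, point):
    """Map one 2D point through H and dehomogenize."""
    x, y = point
    u, v, w = h @ np.array([x, y, 1.0])
    return u / w, v / w


# Hypothetical control points: image pixels and their surveyed positions
# in a local east/north frame (metres) on the water plane.
pixel_pts = [(200, 900), (1700, 880), (1300, 500), (500, 520)]
world_pts = [(-50, 100), (60, 110), (150, 600), (-120, 580)]

H = estimate_homography(pixel_pts, world_pts)

# Hypothetical ship reference pixel, e.g. a point on the mask's waterline.
ship_pixel = (960, 700)
east, north = apply_homography(H, ship_pixel)
print(f"Estimated ship position: east={east:.1f} m, north={north:.1f} m")
```

In such a setup the mapped point should lie where the ship touches the water, because only points on the reference plane are mapped correctly. Converting the local east/north result to latitude and longitude would be a separate step.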
