MTReD: 3D Reconstruction Dataset for Fly-over Videos of Maritime Domain
Rui Yi Yong, Samuel Picosson, Arnold Wiliem
TL;DR
MTReD presents a maritime aerial 3D reconstruction benchmark built from 19 fly-over videos of ships, islands, and coastlines. Evaluation is ground-truth-free, using reprojection error and DiFPS, a novel metric based on DINOv2 features that assesses scene completion. The benchmark covers two baselines, Colmap SfM and MASt3R, and examines pre-processing pipelines (Colmap-based filtering, contrast and background normalization) to improve geometric consistency and perceptual completeness. The key finding is that Colmap emphasizes geometric accuracy while MASt3R delivers denser, more perceptually complete reconstructions; DiFPS also correlates better with scene completion than LPIPS. MTReD thus provides a practical, open benchmark and a perceptual metric to drive progress in maritime 3D reconstruction. The dataset and code are to be released at https://github.com/RuiYiYong/MTReD, supporting broader adoption and future integration with NeRF/Gaussian Splatting pipelines.
Abstract
This work tackles 3D scene reconstruction from fly-over video in the maritime domain, with a specific emphasis on geometrically and visually sound reconstructions. Such reconstructions enable downstream tasks such as segmentation, navigation, and localization. To our knowledge, no dataset is available in this domain. We therefore propose a novel maritime 3D scene reconstruction benchmarking dataset named MTReD (Maritime Three-Dimensional Reconstruction Dataset). MTReD comprises 19 fly-over videos curated from the Internet, containing ships, islands, and coastlines. As the task targets geometric consistency and visual completeness, the dataset is evaluated with two kinds of metrics: (1) reprojection error; and (2) perception-based metrics. We find that existing perception-based metrics, such as Learned Perceptual Image Patch Similarity (LPIPS), do not appropriately measure the completeness of a reconstructed image. Thus, we propose a novel semantic similarity metric utilizing DINOv2 features, coined DiFPS (DinoV2 Features Perception Similarity). We perform an initial evaluation on two baselines: (1) Structure from Motion (SfM) through Colmap; and (2) the recent state-of-the-art MASt3R model. We find that the scenes reconstructed by MASt3R have higher reprojection errors but superior perception-based metric scores. To address this, we explore several pre-processing methods and find one that improves both the reprojection error and the perception-based score. We envisage that MTReD will stimulate further research in these directions. The dataset and all the code will be made available at https://github.com/RuiYiYong/MTReD.
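
For readers unfamiliar with the first metric, the sketch below illustrates the standard definition of reprojection error for a single pinhole camera: each reconstructed 3D point is projected into the image and compared with the 2D keypoint it was triangulated from. It is a minimal illustration only, not the MTReD evaluation code. The function name `reprojection_error`, the world-to-camera pose convention, and the toy intrinsics are all assumptions made for this example, and the exact aggregation used in the benchmark may differ.

```python
import numpy as np


def reprojection_error(points_3d, observations_2d, K, R, t):
    """Mean pixel distance between observed keypoints and projected 3D points.

    Hypothetical helper for illustration (not from the MTReD code base).
    points_3d:       (N, 3) points in the world frame
    observations_2d: (N, 2) observed pixel coordinates
    K: (3, 3) intrinsics; R: (3, 3) rotation; t: (3,) translation,
    assumed to map world coordinates into the camera frame.
    """
    cam = points_3d @ R.T + t              # world frame -> camera frame
    proj = cam @ K.T                       # apply the pinhole intrinsics
    pixels = proj[:, :2] / proj[:, 2:3]    # perspective divide
    return float(np.mean(np.linalg.norm(pixels - observations_2d, axis=1)))


# Toy camera at the origin looking down +z (assumed values).
K = np.array([[500.0, 0.0, 320.0],
              [0.0, 500.0, 240.0],
              [0.0, 0.0, 1.0]])
R = np.eye(3)
t = np.zeros(3)
points = np.array([[0.0, 0.0, 5.0],
                   [1.0, -1.0, 10.0]])

# Observations that match the projections exactly, then shifted by (3, 4) px.
exact = np.array([[320.0, 240.0],
                  [370.0, 190.0]])
shifted = exact + np.array([3.0, 4.0])

print(reprojection_error(points, exact, K, R, t))
print(reprojection_error(points, shifted, K, R, t))
```

A lower value means the estimated cameras and points agree more closely with the detected keypoints. The measure says nothing about how much of the scene was reconstructed, which is why the abstract pairs it with a perception-based metric.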
