MTReD: 3D Reconstruction Dataset for Fly-over Videos of Maritime Domain

Rui Yi Yong, Samuel Picosson, Arnold Wiliem

TL;DR

MTReD presents a maritime aerial 3D reconstruction benchmark built from 19 fly-over videos of ships, islands, and coastlines, with ground-truth-free evaluation via Reprojection Error and a novel DiFPS metric based on DINOv2 features to assess scene completion. It benchmarks two baselines, Colmap SfM and MASt3R, and examines pre-processing pipelines (Colmap-based filtering, contrast and background normalization) to improve geometric consistency and perceptual completeness. The key finding is that Colmap emphasizes geometric accuracy while MASt3R delivers denser, more perceptually complete reconstructions; DiFPS correlates better with scene completion than LPIPS. MTReD thus provides a practical, open benchmark and a perceptual metric to drive progress in maritime 3D reconstruction, with dataset and code available at the provided GitHub URL for broader adoption and future integration with NeRF/Gaussian Splatting pipelines.
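The summary above describes DiFPS only as a metric built on DINOv2 features; the exact formula is not given here. As a rough illustration of how a feature-based similarity score can be computed, the sketch below assumes cosine similarity between two embedding vectors. The function name `feature_similarity` is hypothetical, and the short lists are placeholders standing in for features that a real pipeline would extract with DINOv2.

```python
import math

def feature_similarity(feat_a, feat_b):
    """Cosine similarity between two feature vectors (assumed form, not the paper's stated formula)."""
    if len(feat_a) != len(feat_b):
        raise ValueError("feature vectors must have the same length")
    dot = sum(x * y for x, y in zip(feat_a, feat_b))
    norm_a = math.sqrt(sum(x * x for x in feat_a))
    norm_b = math.sqrt(sum(y * y for y in feat_b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("zero-norm feature vector")
    return dot / (norm_a * norm_b)

# Placeholder embeddings: in practice these would come from a DINOv2 forward pass
# on the original frame and on the reprojected image.
original_features = [1.0, 0.0, 2.0]
reprojected_features = [1.0, 0.0, 2.0]
score = feature_similarity(original_features, reprojected_features)
```

A score near 1 would indicate that the two images are close in feature space, and lower scores would indicate a larger semantic gap, for example from missing regions in the reprojection.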

Abstract

This work tackles 3D scene reconstruction from fly-over video in the maritime domain, with a specific emphasis on geometrically and visually sound reconstructions. Such reconstructions enable downstream tasks such as segmentation, navigation, and localization. To our knowledge, there is no dataset available in this domain. As such, we propose a novel maritime 3D scene reconstruction benchmarking dataset named MTReD (Maritime Three-Dimensional Reconstruction Dataset). MTReD comprises 19 fly-over videos curated from the Internet containing ships, islands, and coastlines. As the task is aimed towards geometrical consistency and visual completeness, the dataset uses two metrics: (1) reprojection error; and (2) perception-based metrics. We find that existing perception-based metrics, such as Learned Perceptual Image Patch Similarity (LPIPS), do not appropriately measure the completeness of a reconstructed image. Thus, we propose a novel semantic similarity metric utilizing DINOv2 features, coined DiFPS (DinoV2 Features Perception Similarity). We perform an initial evaluation on two baselines: (1) Structure from Motion (SfM) through Colmap; and (2) the recent state-of-the-art MASt3R model. We find that the scenes reconstructed by MASt3R have higher reprojection errors but superior perception-based metric scores. To address this trade-off, we explore several pre-processing methods and find one that improves both the reprojection error and the perception-based score. We envisage that MTReD will stimulate further research in these directions. The dataset and all the code will be made available at https://github.com/RuiYiYong/MTReD.
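To make the first metric concrete, the following is a minimal sketch of reprojection error under a simple pinhole camera model: each reconstructed 3D point is projected back into the image and compared with the 2D location where it was observed. The camera model, the function names, and the toy values are assumptions for illustration; the paper's own evaluation code may differ.

```python
import math

def project(point, R, t, fx, fy, cx, cy):
    """Project a 3D world point to pixel coordinates with a pinhole camera."""
    # Camera-frame coordinates: X_cam = R * X_world + t
    xc = [sum(R[i][j] * point[j] for j in range(3)) + t[i] for i in range(3)]
    return (fx * xc[0] / xc[2] + cx, fy * xc[1] / xc[2] + cy)

def mean_reprojection_error(points3d, observations, R, t, fx, fy, cx, cy):
    """Mean Euclidean pixel distance between observed and reprojected points."""
    errors = []
    for point, (u, v) in zip(points3d, observations):
        pu, pv = project(point, R, t, fx, fy, cx, cy)
        errors.append(math.hypot(pu - u, pv - v))
    return sum(errors) / len(errors)

# Toy example: identity pose, focal length 100 px, principal point (50, 50).
R = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
t = [0.0, 0.0, 0.0]
points = [(0.0, 0.0, 1.0), (1.0, 0.0, 2.0)]
observed = [(50.0, 50.0), (103.0, 54.0)]  # second observation is 5 px off
err = mean_reprojection_error(points, observed, R, t, 100.0, 100.0, 50.0, 50.0)
```

In this toy case the first point reprojects exactly and the second is 5 pixels away from its observation, so the mean error is 2.5 pixels. Because the measure needs only the reconstruction and the input images, it requires no ground-truth geometry.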

Paper Structure

This paper contains 11 sections, 3 equations, 3 figures, 2 tables.

Figures (3)

  • Figure 1: Some video frame examples from the proposed MTReD dataset. MTReD contains multiple maritime settings, such as coastal areas, ships, and islands. These settings pose unique challenges for the general 3D reconstruction problem.
  • Figure 2: A Grounding DINO to SAM pipeline used for background segmentation along with image examples. The collected information on mean and standard deviation for each color channel is also highlighted. This information is then used for image processing.
  • Figure 3: A comparison of reprojected images against original input images. The first table maps the layout of each reconstruction type in the subsequent images. With the exception of the Colmap reprojections, all other reprojections use a MASt3R reconstruction. Red rectangles are used to highlight major reconstruction artifacts in each reprojected image.
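The Figure 2 caption mentions collecting the mean and standard deviation of each color channel over the segmented background and using them for image processing. How those statistics are applied is not spelled out here, so the sketch below assumes one plausible use: shifting and scaling each channel so that its statistics match a target. The helper names `channel_stats` and `match_stats` and the target values are hypothetical, and pixels are plain RGB tuples rather than the output of a real Grounding DINO to SAM mask.

```python
from statistics import fmean, pstdev

def channel_stats(pixels):
    """Return [(mean, std), ...] per color channel for a list of (r, g, b) pixels."""
    return [(fmean(channel), pstdev(channel)) for channel in zip(*pixels)]

def match_stats(pixels, target):
    """Shift and scale each channel so its mean/std match the target statistics."""
    source = channel_stats(pixels)
    result = []
    for pixel in pixels:
        new_pixel = []
        for value, (mu_s, sd_s), (mu_t, sd_t) in zip(pixel, source, target):
            # Leave the spread unchanged for a constant channel to avoid dividing by zero.
            scale = sd_t / sd_s if sd_s > 0 else 1.0
            # Clamp to the valid 8-bit intensity range.
            new_pixel.append(min(255.0, max(0.0, (value - mu_s) * scale + mu_t)))
        result.append(tuple(new_pixel))
    return result

# Toy background pixels (assumed to come from a segmentation mask).
background = [(10, 20, 30), (30, 40, 50)]
stats = channel_stats(background)
target = [(100.0, 5.0)] * 3  # hypothetical target mean and std per channel
normalized = match_stats(background, target)
```

Clamping keeps values within the 8-bit range, which means the output statistics match the target exactly only when no pixel saturates.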