FIORD: A Fisheye Indoor-Outdoor Dataset with LIDAR Ground Truth for 3D Scene Reconstruction and Benchmarking

Ulas Gunes, Matias Turkulainen, Xuqian Ren, Arno Solin, Juho Kannala, Esa Rahtu

TL;DR

FIORD tackles the limitations of narrow FoV datasets by introducing a high-resolution 360° fisheye dataset captured with dual 200° lenses, paired with dense Faro Focus 3D LiDAR ground truth for accurate geometry benchmarking. The dataset comprises ten scenes (five indoor, five outdoor), each with sparse SfM point clouds from COLMAP and dense LiDAR scans, plus rectified images for compatibility with Gaussian Splatting and Nerfacto baselines. A careful calibration, data collection, and alignment pipeline (including CloudCompare-based registration and image rectification) enables robust evaluation of reconstruction and novel view synthesis under challenging conditions such as occlusions and reflections. Baseline experiments show that Gaussian Splatting often yields stronger quantitative metrics than Nerfacto on fisheye data, and that incorporating dense LiDAR data for initialization can improve rendering quality, highlighting FIORD’s potential as a versatile benchmark for future wide-FoV 3D reconstruction methods.

Abstract

The development of large-scale 3D scene reconstruction and novel view synthesis methods mostly relies on datasets comprising perspective images with narrow fields of view (FoV). While effective for small-scale scenes, these datasets require large image sets and extensive structure-from-motion (SfM) processing, limiting scalability. To address this, we introduce a fisheye image dataset tailored for scene reconstruction tasks. Using dual 200-degree fisheye lenses, our dataset provides full 360-degree coverage of 5 indoor and 5 outdoor scenes. Each scene has sparse SfM point clouds and precise LIDAR-derived dense point clouds that can be used as geometric ground truth, enabling robust benchmarking under challenging conditions such as occlusions and reflections. While the baseline experiments focus on vanilla Gaussian Splatting and the NeRF-based Nerfacto method, the dataset supports diverse approaches for scene reconstruction, novel view synthesis, and image-based rendering.
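To make the coverage claim concrete, the following minimal sketch checks which of two 200-degree lenses sees a given viewing direction. It assumes the lenses are mounted back to back with opposite optical axes (along +z and -z), which the abstract does not state explicitly; the function names `visible_lenses` and `overlap_band_deg` are hypothetical and not part of any FIORD tooling.

```python
import math


def visible_lenses(direction, fov_deg=200.0):
    """Return which of two back-to-back lenses see a 3D direction.

    Assumption: the "front" lens looks along +z and the "back" lens
    along -z. A lens sees a ray when the angle between the ray and
    its optical axis is at most half the lens field of view.
    """
    x, y, z = direction
    norm = math.sqrt(x * x + y * y + z * z)
    half_fov = math.radians(fov_deg / 2.0)
    # Angle to the +z axis, clamped against rounding error.
    angle_front = math.acos(max(-1.0, min(1.0, z / norm)))
    angle_back = math.pi - angle_front
    seen = []
    if angle_front <= half_fov:
        seen.append("front")
    if angle_back <= half_fov:
        seen.append("back")
    return seen


def overlap_band_deg(fov_deg=200.0):
    """Angular width of the band seen by both lenses (0 if FoV <= 180)."""
    return max(0.0, fov_deg - 180.0)
```

Under this assumed mounting, every direction is seen by at least one lens, and directions within a 20-degree band around the plane between the lenses are seen by both, which is the overlap that allows the two views to be related to each other.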

Paper Structure

This paper contains 15 sections, 4 figures, 4 tables.

Figures (4)

  • Figure 1: Example data capture setup. The top image shows an example camera placement in one of our scenes (depicted as its dense point cloud from the Faro scanner), while the bottom image shows the wide-angle 360° photo captured in a single shot of the camera.
  • Figure 2: Sparse and dense point cloud alignment. Fisheye images and LIDAR scans are used to generate and align sparse and dense point clouds. Camera poses can be registered to the aligned model at either real-world scale or the scale of the COLMAP coordinate system.
  • Figure 3: Compact comparison of Ground Truth vs. Gaussian Splatting renders for four representative scenes.
  • Figure 4: Comparison of an image rendering result for the Building_In scene. The ground truth image is compared against the rendering initialized with only COLMAP data versus the fused COLMAP+LIDAR point cloud.
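The caption of Figure 2 mentions registering camera poses at either real-world scale or COLMAP scale. As a rough illustration of what relates the two, the sketch below estimates the scale factor between matched points in COLMAP coordinates and in metric LiDAR coordinates. This is not the CloudCompare-based registration used for the dataset; it is a simplified stand-in that assumes noise-free, one-to-one point correspondences related by a similarity transform, and the names `centroid`, `rms_spread`, and `estimate_scale` are hypothetical.

```python
import math


def centroid(points):
    """Mean of a list of 3D points given as (x, y, z) tuples."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(3))


def rms_spread(points):
    """Root-mean-square distance of the points from their centroid."""
    c = centroid(points)
    sq = sum(sum((p[i] - c[i]) ** 2 for i in range(3)) for p in points)
    return math.sqrt(sq / len(points))


def estimate_scale(colmap_points, lidar_points):
    """Estimate s in lidar = s * R * colmap + t from matched points.

    Rotation R and translation t do not change the spread around the
    centroid, so the ratio of the two spreads gives the scale
    (exactly so only for noise-free correspondences).
    """
    if len(colmap_points) != len(lidar_points) or len(colmap_points) < 2:
        raise ValueError("need at least two matched point pairs")
    spread = rms_spread(colmap_points)
    if spread == 0.0:
        raise ValueError("COLMAP points must not all coincide")
    return rms_spread(lidar_points) / spread
```

Multiplying COLMAP camera positions by the estimated factor would express them in metric units, while dividing LiDAR coordinates by it would bring the dense cloud to COLMAP scale. A full registration additionally needs the rotation and translation, which this sketch deliberately leaves out.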