PIV3CAMS: a multi-camera dataset for multiple computer vision problems and its application to novel view-point synthesis

Sohyeong Kim, Martin Danelljan, Radu Timofte, Luc Van Gool, Jean-Philippe Thiran

TL;DR

The paper introduces PIV3CAMS, a multi-camera RGB-D dataset (8,385 image pairs and 82 video pairs) collected with three cameras across Zurich and Cheonan to support tasks such as image/video enhancement, view interpolation, and novel view synthesis. It provides a detailed setup for synchronized data capture, calibration, and data processing, and demonstrates depth-enabled novel view synthesis by adapting a state-of-the-art baseline and evaluating depth-augmented variants on synthetic shapes, KITTI real scenes, and PIV3CAMS itself. The results show that depth information improves novel view synthesis for small view changes and that ground-truth depth can upper-bound performance, while dense depth and reliable visibility masks are critical for quality in real-world data. The work positions PIV3CAMS as a valuable resource for depth-aware, multi-camera learning and suggests future directions in depth completion, object-level annotations, and broader scene diversity to advance cross-sensor vision applications.

Abstract

Modern approaches to computer vision tasks rely heavily on machine learning, which requires a large number of high-quality images. While there is a plethora of image datasets containing a single type of image, there is a lack of datasets collected from multiple cameras. In this thesis, we introduce Paired Image and Video data from three CAMeraS, namely PIV3CAMS, aimed at multiple computer vision tasks. The PIV3CAMS dataset consists of 8,385 pairs of images and 82 pairs of videos taken with three different cameras: a Canon 5D Mark IV, a Huawei P20, and a ZED stereo camera. The dataset includes a variety of indoor and outdoor scenes from different locations in Zurich (Switzerland) and Cheonan (South Korea). Computer vision applications that can benefit from the PIV3CAMS dataset include image/video enhancement, view interpolation, and image matching, among others. We provide a careful explanation of the data collection process and a detailed analysis of the data. The second part of this thesis studies the use of depth information in the view synthesis task. In addition to reproducing a current state-of-the-art algorithm, we investigate several proposed alternative models that integrate depth information geometrically. Through extensive experiments, we show that depth has a crucial effect for small view changes. Finally, we apply our model to the introduced PIV3CAMS dataset to synthesize novel target views as an example application of PIV3CAMS.
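To make "integrating depth information geometrically" concrete, the following is a minimal sketch of the standard pinhole reprojection that depth-based view synthesis commonly builds on. It is not the thesis's model: the function name `reproject_pixels`, the assumption that both views share the intrinsics `K`, and the convention that `R`, `t` map source-camera coordinates to target-camera coordinates are all illustrative choices made here.

```python
import numpy as np

def reproject_pixels(depth, K, R, t, tol=1e-6):
    """Map each source pixel to target-image coordinates using its depth.

    Hypothetical helper for illustration only.
    depth: (H, W) depth of the source view (0 marks missing depth)
    K:     (3, 3) pinhole intrinsics, assumed shared by both views
    R, t:  (3, 3) rotation and (3,) translation taking source-camera
           coordinates to target-camera coordinates
    Returns (H, W, 2) target pixel coordinates (u, v) and an (H, W)
    boolean mask of pixels that have depth and land inside the target frame.
    """
    H, W = depth.shape
    u, v = np.meshgrid(np.arange(W), np.arange(H))
    pix = np.stack([u, v, np.ones_like(u)], axis=-1).reshape(-1, 3).astype(np.float64)

    rays = pix @ np.linalg.inv(K).T           # back-project pixels to rays
    pts_src = rays * depth.reshape(-1, 1)     # 3D points in the source frame
    pts_tgt = pts_src @ R.T + t               # same points in the target frame
    proj = pts_tgt @ K.T                      # project into the target image

    z = proj[:, 2]
    valid = (depth.reshape(-1) > 0) & (z > 1e-8)
    uv = np.full((H * W, 2), -1.0)            # -1 marks pixels without a projection
    uv[valid] = proj[valid, :2] / z[valid, None]

    # Keep only projections that fall inside the target image bounds.
    inside = (
        valid
        & (uv[:, 0] >= -tol) & (uv[:, 0] <= W - 1 + tol)
        & (uv[:, 1] >= -tol) & (uv[:, 1] <= H - 1 + tol)
    )
    return uv.reshape(H, W, 2), inside.reshape(H, W)
```

The returned mask is a crude stand-in for a visibility mask: it flags missing depth and out-of-frame projections but does not handle occlusions, which a full pipeline would need to resolve (for example with a depth test when several source pixels land on the same target pixel).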

Paper Structure

This paper contains 57 sections, 14 equations, 29 figures, and 2 tables.

Figures (29)

  • Figure 1: Examples from the PIV3CAMS dataset. The upper row shows images from the different cameras at their relative resolutions (Canon: 6720x4480, P20: 5120x3480, and ZED: 2208x1242). The lower row shows the enlarged RGB-D image and a confidence map from the ZED stereo camera.
  • Figure 2: Novel view synthesis task. The goal of this task is to synthesize the target view from the source view.
  • Figure 3: Samples from the PIV3CAMS image dataset.
  • Figure 4: Frame samples from the PIV3CAMS video dataset.
  • Figure 5: Rig setup for cameras. (Left) Front view of the 3D-printed rig with cameras mounted on it. (Right) Side view of the rig with cameras mounted on it.
  • ...and 24 more figures