Generalizable Novel-View Synthesis using a Stereo Camera

Haechan Lee, Wonjoon Jin, Seung-Hwan Baek, Sunghyun Cho

TL;DR

This work tackles the challenge of generalizable novel-view synthesis by leveraging stereo-camera inputs. It presents StereoNeRF, a NeRF-based framework that integrates stereo matching through a stereo feature extractor, depth-guided plane sweeping, and a stereo depth loss to produce high-fidelity geometry and rendering without per-scene optimization. A new StereoNVS dataset containing real and synthetic stereo imagery supports training and evaluation, enabling robust comparisons against prior generalizable methods. Empirical results show StereoNeRF outperforms baselines in both image quality and depth accuracy, particularly in textureless regions and thin structures, demonstrating the practical impact of incorporating stereo geometry into generalizable view synthesis.

Abstract

In this paper, we propose the first generalizable view synthesis approach that specifically targets multi-view stereo-camera images. Since recent stereo matching has demonstrated accurate geometry prediction, we introduce stereo matching into novel-view synthesis for high-quality geometry reconstruction. To this end, this paper proposes a novel framework, dubbed StereoNeRF, which integrates stereo matching into a NeRF-based generalizable view synthesis approach. StereoNeRF is equipped with three key components to effectively exploit stereo matching in novel-view synthesis: a stereo feature extractor, a depth-guided plane-sweeping, and a stereo depth loss. Moreover, we propose the StereoNVS dataset, the first multi-view dataset of stereo-camera images, encompassing a wide variety of both real and synthetic scenes. Our experimental results demonstrate that StereoNeRF surpasses previous approaches in generalizable view synthesis.
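
To make the idea of guiding plane sweeping with stereo depth more concrete, here is a minimal sketch. It is not the authors' implementation. It only illustrates the standard rectified-stereo relation (depth = focal length × baseline / disparity) and one plausible way to place a few depth candidates around a stereo depth prior instead of sweeping the whole depth range. The function names, the `rel_range` parameter, and the camera values are hypothetical and chosen for illustration.

```python
def disparity_to_depth(disparity_px, focal_px, baseline_m):
    """Rectified-stereo relation: depth = focal * baseline / disparity."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive")
    return focal_px * baseline_m / disparity_px


def depth_hypotheses(prior_depth, num_planes=5, rel_range=0.1):
    """Evenly spaced depth candidates within +/- rel_range of a prior depth.

    Hypothetical helper: the sampling scheme is an assumption, not taken
    from the paper.
    """
    if num_planes < 2:
        return [prior_depth]
    near = prior_depth * (1.0 - rel_range)
    far = prior_depth * (1.0 + rel_range)
    step = (far - near) / (num_planes - 1)
    return [near + i * step for i in range(num_planes)]


# Illustrative values only: 640 px focal length, 10 cm baseline, 32 px disparity.
depth = disparity_to_depth(disparity_px=32.0, focal_px=640.0, baseline_m=0.1)
planes = depth_hypotheses(depth, num_planes=5, rel_range=0.1)
```

With these example values the prior depth is about 2 m, and the five candidate planes span roughly 1.8 m to 2.2 m. Concentrating hypotheses near a stereo estimate is one way a plane sweep could spend its depth samples where the surface is likely to be.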

Paper Structure

This paper contains 26 sections, 3 equations, 8 figures, and 5 tables.

Figures (8)

  • Figure 1: Novel-view synthesis results of a baseline method johari2022geonerf and ours. The baseline shows degraded performance even when trained on stereo-camera images (b). In contrast, by fully exploiting stereo-camera images, our method produces superior results (c).
  • Figure 2: Usefulness of exploiting binocular stereo. Comparison of depth estimation between a learning-based MVS method peng2022rethinking and a learning-based binocular stereo method xu2023unifying.
  • Figure 3: Overview of StereoNeRF. StereoNeRF consists of a shared feature extractor and a neural renderer. We design the feature extractor with a stereo estimation network, a stereo feature extractor, and an MVS network. StereoNeRF takes $N$ pairs of stereo images and a camera pose as inputs, and synthesizes a novel-view image at that camera pose.
  • Figure 4: Stereo feature extractor of StereoNeRF, which incorporates a pre-trained stereo estimation network.
  • Figure 5: Stereo attention module used in the stereo feature extractor. We exploit the rich features from the pre-trained stereo estimation module.
  • ...and 3 more figures