Generalizable Novel-View Synthesis using a Stereo Camera
Haechan Lee, Wonjoon Jin, Seung-Hwan Baek, Sunghyun Cho
TL;DR
This work tackles generalizable novel-view synthesis from stereo-camera inputs. It presents StereoNeRF, a NeRF-based framework that integrates stereo matching through a stereo feature extractor, depth-guided plane sweeping, and a stereo depth loss to produce high-fidelity geometry and rendering without per-scene optimization. A new StereoNVS dataset containing real and synthetic stereo imagery supports training and evaluation, including comparisons against prior generalizable methods. Empirical results show that StereoNeRF outperforms baselines in both image quality and depth accuracy, particularly in textureless regions and thin structures, indicating that incorporating stereo geometry benefits generalizable view synthesis.
Abstract
In this paper, we propose the first generalizable view synthesis approach that specifically targets multi-view stereo-camera images. Since recent stereo matching methods have demonstrated accurate geometry prediction, we introduce stereo matching into novel-view synthesis for high-quality geometry reconstruction. To this end, we propose a novel framework, dubbed StereoNeRF, which integrates stereo matching into a NeRF-based generalizable view synthesis approach. StereoNeRF is equipped with three key components to effectively exploit stereo matching in novel-view synthesis: a stereo feature extractor, depth-guided plane sweeping, and a stereo depth loss. Moreover, we propose the StereoNVS dataset, the first multi-view dataset of stereo-camera images, encompassing a wide variety of both real and synthetic scenes. Our experimental results demonstrate that StereoNeRF surpasses previous approaches in generalizable view synthesis.
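To give a rough intuition for how stereo geometry can guide plane sweeping and supervise depth, the following Python sketch works on a single pixel. It is only an illustration under stated assumptions, not the formulation used in StereoNeRF, which the abstract does not specify. The function names (`disparity_to_depth`, `depth_guided_hypotheses`, `stereo_depth_l1`), the relative search band `rel_range`, and the choice of an L1 penalty are all hypothetical. The only standard fact used is the rectified-stereo relation depth = focal length × baseline / disparity.

```python
def disparity_to_depth(disparity_px, focal_px, baseline_m, eps=1e-6):
    """Rectified-stereo relation: depth = focal * baseline / disparity."""
    # Clamp the disparity to avoid division by zero for very distant points.
    return focal_px * baseline_m / max(disparity_px, eps)


def depth_guided_hypotheses(center_depth, num_planes=8, rel_range=0.1):
    """Depth candidates in a narrow band around a stereo depth estimate.

    Assumption for illustration: the band spans +/- rel_range (relative)
    around center_depth and is sampled uniformly.
    """
    if num_planes < 2:
        return [center_depth]
    step = 2.0 * rel_range / (num_planes - 1)
    return [center_depth * (1.0 - rel_range + i * step) for i in range(num_planes)]


def stereo_depth_l1(rendered_depths, stereo_depths):
    """Mean absolute difference between rendered and stereo-estimated depths.

    Assumption for illustration: a plain L1 penalty over all given pixels.
    """
    pairs = list(zip(rendered_depths, stereo_depths))
    return sum(abs(r - s) for r, s in pairs) / len(pairs)


if __name__ == "__main__":
    # Hypothetical camera: 500 px focal length, 0.1 m baseline, 10 px disparity.
    depth = disparity_to_depth(10.0, focal_px=500.0, baseline_m=0.1)
    print(depth)
    print(depth_guided_hypotheses(depth, num_planes=3, rel_range=0.1))
    print(stereo_depth_l1([5.5, 4.0], [5.0, 4.0]))
```

The sketch shows why a stereo depth prior is useful: concentrating depth candidates near the stereo estimate avoids spending samples across the full depth range, and the same estimate can serve as a supervision signal for the rendered depth.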
