Explicit Correspondence Matching for Generalizable Neural Radiance Fields
Yuedong Chen, Haofei Xu, Qianyi Wu, Chuanxia Zheng, Tat-Jen Cham, Jianfei Cai
TL;DR
This paper tackles generalizable neural radiance fields: rendering novel views from very few input images without per-scene optimization. It introduces MatchNeRF, which explicitly models cross-view feature correspondence as a geometry prior. A Transformer-based encoder aligns multi-view features, and a group-wise cosine similarity computed on the projected 2D features guides the NeRF decoder. The method achieves state-of-the-art results on DTU, Real Forward-Facing, Blender, and Tanks & Temples in 2- and 3-view setups, is robust to the choice of reference view, and improves depth reconstruction. Its view-agnostic design could generalize to other 3D representations, offering a practical feed-forward alternative to expensive cost-volume-based methods and a foundation for future work on occlusion handling and optimization-free 3D reconstruction.
Abstract
We present a new generalizable NeRF method that generalizes directly to unseen scenarios and performs novel view synthesis with as few as two source views. The key to our approach is explicitly modeled correspondence matching information, which provides a geometry prior for predicting the NeRF color and density used in volume rendering. The explicit correspondence matching is quantified by the cosine similarity between image features sampled at the 2D projections of a 3D point on different views, which provides reliable cues about the surface geometry. Unlike previous methods, which extract image features independently for each view, we model cross-view interactions via Transformer cross-attention, which greatly improves the feature matching quality. Our method achieves state-of-the-art results under different evaluation settings, and the experiments show a strong correlation between our learned cosine feature similarity and volume density, demonstrating the effectiveness of the proposed method. The code and model are available on our project page: https://donydchen.github.io/matchnerf
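The correspondence cue described above (project a 3D point into each view, sample the image features there, and compare them with cosine similarity) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names `project`, `sample_bilinear`, and `groupwise_cosine`, the pinhole convention `x_cam = R @ X + t`, the `(C, H, W)` feature layout, and the `num_groups` parameter are all assumptions made for illustration. The random feature maps stand in for the output of a learned encoder.

```python
import numpy as np

def project(point_w, K, R, t):
    """Project a 3D world point to pixel coordinates (u, v).

    Assumes a pinhole camera with x_cam = R @ X + t and intrinsics K.
    """
    p_cam = R @ point_w + t
    uv = K @ p_cam
    return uv[:2] / uv[2]

def sample_bilinear(feat, uv):
    """Bilinearly sample a (C, H, W) feature map at a continuous pixel (u, v).

    Coordinates outside the image are clamped to the border.
    """
    _, H, W = feat.shape
    u = float(np.clip(uv[0], 0, W - 1))
    v = float(np.clip(uv[1], 0, H - 1))
    u0, v0 = int(np.floor(u)), int(np.floor(v))
    u1, v1 = min(u0 + 1, W - 1), min(v0 + 1, H - 1)
    du, dv = u - u0, v - v0
    top = (1 - du) * feat[:, v0, u0] + du * feat[:, v0, u1]
    bot = (1 - du) * feat[:, v1, u0] + du * feat[:, v1, u1]
    return (1 - dv) * top + dv * bot

def groupwise_cosine(f_a, f_b, num_groups, eps=1e-8):
    """Split two C-dim features into groups; return one cosine similarity per group."""
    a = f_a.reshape(num_groups, -1)
    b = f_b.reshape(num_groups, -1)
    num = (a * b).sum(axis=1)
    den = np.linalg.norm(a, axis=1) * np.linalg.norm(b, axis=1) + eps
    return num / den

# Toy usage: two views, one 3D sample point, 8-channel features in 2 groups.
rng = np.random.default_rng(0)
K = np.array([[100.0, 0.0, 32.0], [0.0, 100.0, 32.0], [0.0, 0.0, 1.0]])
R_a, t_a = np.eye(3), np.zeros(3)                  # view A at the origin
R_b, t_b = np.eye(3), np.array([-0.1, 0.0, 0.0])   # view B shifted along x
feat_a = rng.standard_normal((8, 64, 64))          # placeholder encoder features
feat_b = rng.standard_normal((8, 64, 64))

point = np.array([0.05, 0.0, 2.0])
f_a = sample_bilinear(feat_a, project(point, K, R_a, t_a))
f_b = sample_bilinear(feat_b, project(point, K, R_b, t_b))
similarity = groupwise_cosine(f_a, f_b, num_groups=2)  # shape (2,), values in [-1, 1]
```

Splitting the channels into groups keeps several similarity values per point rather than a single scalar, so the decoder receives a richer matching signal. With more than two views, one plausible choice is to compute the similarity for every view pair and aggregate the results, though the aggregation rule here would be an assumption rather than something the abstract specifies.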
