
Entangled View-Epipolar Information Aggregation for Generalizable Neural Radiance Fields

Zhiyuan Min, Yawei Luo, Wei Yang, Yuesong Wang, Yi Yang

TL;DR

This paper proposes an Entangled View-Epipolar Information Aggregation method dubbed EVE-NeRF, which effectively mitigates the potential lack of inherent geometric and appearance constraints resulting from one-dimensional interactions, thus further boosting the 3D representation generalizability.

Abstract

Generalizable NeRF can directly synthesize novel views across new scenes, eliminating the need for the scene-specific retraining required by vanilla NeRF. A critical enabling factor in these approaches is the extraction of a generalizable 3D representation by aggregating source-view features. In this paper, we propose an Entangled View-Epipolar Information Aggregation method dubbed EVE-NeRF. Different from existing methods that consider cross-view and along-epipolar information independently, EVE-NeRF conducts the view-epipolar feature aggregation in an entangled manner by injecting scene-invariant appearance continuity and geometry consistency priors into the aggregation process. Our approach effectively mitigates the potential lack of inherent geometric and appearance constraints resulting from one-dimensional interactions, thus further boosting the generalizability of the 3D representation. EVE-NeRF attains state-of-the-art performance across various evaluation scenarios. Extensive experiments demonstrate that, compared with the prevailing single-dimensional aggregation, the entangled network achieves higher accuracy in reconstructing 3D scene geometry and appearance. Our code is publicly available at https://github.com/tatakai1/EVENeRF.
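The abstract contrasts aggregation along the view dimension with aggregation along the epipolar dimension. The toy sketch below illustrates what each one-dimensional interaction can and cannot see, assuming the per-ray source-view features are stacked into an array of shape (P sampling points, V source views, C channels). The attention is a single-head, projection-free simplification, and the function names and sizes are hypothetical; this is not EVE-NeRF's entangled aggregation module, only the two single-axis baselines it is contrasted with.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along `axis`.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(x):
    # Single-head, projection-free scaled dot-product self-attention (simplification).
    # x: (..., N, C); each of the N tokens attends only to the N tokens in its own group.
    c = x.shape[-1]
    scores = x @ np.swapaxes(x, -1, -2) / np.sqrt(c)   # (..., N, N)
    return softmax(scores, axis=-1) @ x                # (..., N, C)

def cross_view_attention(f):
    # f: (P, V, C). Groups are sampling points; tokens are the V source views.
    return self_attention(f)

def along_epipolar_attention(f):
    # Groups are source views; tokens are the P sampling points on each epipolar line.
    return np.swapaxes(self_attention(np.swapaxes(f, 0, 1)), 0, 1)

rng = np.random.default_rng(0)
P, V, C = 8, 3, 16                 # hypothetical sizes: points per ray, source views, channels
feats = rng.standard_normal((P, V, C))

perturbed = feats.copy()
perturbed[5] += 1.0                # change only the features of sampling point 5

# Cross-view attention at point 0 never looks at point 5 (no adjacent-depth information)...
cv_changed = not np.allclose(cross_view_attention(feats)[0], cross_view_attention(perturbed)[0])
# ...whereas along-epipolar attention at point 0 mixes in the other depths of each view.
ep_changed = not np.allclose(along_epipolar_attention(feats)[0], along_epipolar_attention(perturbed)[0])
print(cv_changed, ep_changed)      # False True
```

Because the cross-view pass batches over sampling points, perturbing point 5 leaves point 0's output unchanged; this mirrors the missing appearance-continuity cue the paper attributes to view-only aggregation. The epipolar pass has the mirror-image blind spot: it never mixes information across source views.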

Paper Structure (27 sections, 13 equations, 13 figures, 6 tables, 3 algorithms)

Figures (13)

  • Figure 1: Given the sampling points along a target ray, re-projected onto the epipolar lines in each source view, existing approaches (suhail2022generalizable, varma2022attention) employ an attention mechanism to aggregate the cross-view features of each sampling point and to aggregate the sampling points along the epipolar line within each view, either sequentially or circularly. However, our investigation reveals limitations in these strategies: aggregating only cross-view information results in rendering artifacts, because it lacks the appearance continuity between adjacent depths that epipolar cues provide. Conversely, relying solely on epipolar information leads to discontinuities in the depth map, because it lacks geometry consistency across multiple views. Our proposed EVE-NeRF harnesses both cross-view and along-epipolar information in an entangled manner and effectively addresses these issues.
  • Figure 2: Pipeline of EVE-NeRF. 1) We first employ a lightweight CNN to extract features of the epipolar sampling points from the source views. 2) Through the Entangled View-Epipolar Information Aggregation, we enable complementary information interaction in both the view and epipolar dimensions to produce generalizable multi-view epipolar features. 3) We use the NeRF decoder to obtain the color and density of each sampling point and predict the target color via volume rendering.
  • Figure 3: The along-epipolar perception provides appearance continuity prior through adjacent-depth attention along the ray, while the multi-view calibration offers geometry consistency prior via cross-view attention. Our proposed method significantly reduces artifacts in rendering new views compared to single-dimension transformers.
  • Figure 4: Qualitative comparison of EVE-NeRF with IBRNet (wang2021ibrnet) and GNT (varma2022attention) in setting 1. The first, second, and third rows correspond to the Fern scene from LLFF, the Mic scene from Blender, and the Crest scene from Shiny, respectively. EVE-NeRF reconstructs geometry, appearance, and complex texture regions more accurately than the baselines. In particular, our method successfully reconstructs the leaves and the surrounding area in the Fern scene.
  • Figure 5: Results for setting 2. Our method (EVE-NeRF) is trained on DTU and the Google Scanned Objects dataset with 3 reference views. It outperforms other few-shot generalizable neural rendering methods on multiple metrics.
  • ...and 8 more figures
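Step 3 of the Figure 2 pipeline predicts the target color by volume rendering. As an illustration, the sketch below implements the standard NeRF quadrature for a single ray, assuming the decoder outputs one density and one RGB color per sampling point. The function name `volume_render` and the convention of a very large final interval are our assumptions, not details taken from this paper.

```python
import numpy as np

def volume_render(sigmas, colors, t_vals):
    """Standard NeRF quadrature along one ray (illustrative sketch).

    sigmas: (N,) densities, colors: (N, 3) RGB, t_vals: (N,) increasing sample depths.
    Returns the composited RGB color and the per-sample weights.
    """
    # Interval to the next sample; the last interval is made effectively infinite.
    deltas = np.diff(t_vals, append=1e10)
    # Opacity of each segment: alpha_i = 1 - exp(-sigma_i * delta_i).
    alphas = 1.0 - np.exp(-sigmas * deltas)
    # Transmittance T_i = prod_{j<i} (1 - alpha_j): probability the ray reaches sample i.
    trans = np.cumprod(np.concatenate([[1.0], 1.0 - alphas[:-1]]))
    weights = trans * alphas
    rgb = (weights[:, None] * colors).sum(axis=0)
    return rgb, weights

# Example: a ray whose third sample is effectively opaque and blue.
sigmas = np.array([0.0, 0.0, 1e3, 0.0])
colors = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [1.0, 1.0, 1.0]])
t_vals = np.array([1.0, 2.0, 3.0, 4.0])
rgb, weights = volume_render(sigmas, colors, t_vals)
print(rgb)  # [0. 0. 1.]
```

The opaque sample absorbs all remaining transmittance, so samples behind it receive zero weight and the rendered color equals that sample's color.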