Synthesizing Consistent Novel Views via 3D Epipolar Attention without Re-Training

Botao Ye; Sifei Liu; Xueting Li; Marc Pollefeys; Ming-Hsuan Yang

TL;DR

The paper tackles the challenge of maintaining 3D-consistent novel-view synthesis from a single image by introducing a training-free 3D epipolar attention mechanism. By locating and importing overlapping information from a reference view along epipolar lines and extending this to multi-view contexts, the approach enhances consistency without retraining a diffusion backbone. Key contributions include a parameter-duplicated epipolar attention block, DDIM-inversion–driven paired features, and a multi-view extension that aggregates information from multiple context views. Experiments on GSO and Objaverse show improved multi-view consistency and downstream 3D reconstruction quality, with a favorable trade-off between performance and memory compared to training-based baselines.
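The core geometric step is finding where a reference-view pixel can reappear in a target view, and that reduces to computing its epipolar line. Below is a minimal NumPy sketch that makes this step concrete. It assumes pinhole cameras with intrinsics `K_ref` and `K_tgt`, and a relative pose under which a 3D point maps as `X_tgt = R @ X_ref + t`. The function names and the pixel-distance `threshold` are hypothetical choices for illustration. They are not taken from the paper's code.

```python
import numpy as np


def skew(v):
    """Return the 3x3 cross-product matrix [v]_x, so skew(v) @ u == np.cross(v, u)."""
    return np.array([[0.0, -v[2], v[1]],
                     [v[2], 0.0, -v[0]],
                     [-v[1], v[0], 0.0]])


def fundamental_matrix(K_ref, K_tgt, R, t):
    """F such that x_tgt^T F x_ref = 0 for corresponding homogeneous pixels.

    Assumption (hypothetical convention): X_tgt = R @ X_ref + t.
    """
    E = skew(t) @ R  # essential matrix [t]_x R in normalized camera coordinates
    return np.linalg.inv(K_tgt).T @ E @ np.linalg.inv(K_ref)


def epipolar_mask(F, height, width, ref_pixel, threshold=1.0):
    """Boolean (height, width) mask of target pixels within `threshold` pixels
    of the epipolar line induced by `ref_pixel` = (u, v) in the reference view."""
    x_ref = np.array([ref_pixel[0], ref_pixel[1], 1.0])
    a, b, c = F @ x_ref  # target-image line: a*u + b*v + c = 0
    vs, us = np.mgrid[0:height, 0:width]  # row (v) and column (u) pixel grids
    dist = np.abs(a * us + b * vs + c) / np.hypot(a, b)  # point-to-line distance
    return dist <= threshold
```

For a pure horizontal translation (`R` equal to the identity, `t` along x), every epipolar line is an image row. The mask for a reference pixel in row v is then a thin band around row v in the target view. The sketch does not handle degenerate cases such as `t = 0` or `ref_pixel` lying at the reference epipole, where both line coefficients `a` and `b` vanish.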

Abstract

Large diffusion models demonstrate remarkable zero-shot capabilities in novel view synthesis from a single image. However, these models often struggle to maintain consistency between the novel and reference views. A crucial factor behind this issue is the limited use of contextual information from the reference views. Specifically, when the viewing frusta of two views overlap, the corresponding regions must remain consistent in both geometry and appearance. This observation motivates a simple yet effective approach: we use epipolar geometry to locate and retrieve overlapping information from the input view. This information is then incorporated into the generation of target views without any training or fine-tuning, since the process requires no learnable parameters. Furthermore, to enhance the overall consistency of the generated views, we extend epipolar attention to a multi-view setting, allowing retrieval of overlapping information from both the input view and other target views. Qualitative and quantitative experimental results demonstrate that our method significantly improves the consistency of synthesized views without any fine-tuning. Moreover, this enhancement also boosts the performance of downstream applications such as 3D reconstruction. The code is available at https://github.com/botaoye/ConsisSyn.
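The abstract notes that the retrieval step needs no learnable parameters. The minimal NumPy sketch below shows one way this can work, under our own assumptions. It reuses query, key, and value features that a frozen diffusion backbone already produces. It then runs plain scaled dot-product attention restricted to epipolar-masked positions. For the multi-view extension, keys and values from the input view and from the other target views are concatenated, and each view has its own mask. The function name, the tensor shapes, and the zero fallback for queries with no valid key are hypothetical. They do not describe the paper's actual implementation.

```python
import numpy as np


def epipolar_attention(q, keys, values, masks):
    """Training-free masked attention over several context views (illustrative sketch).

    q:      (N, d) query features of the target view (N tokens).
    keys:   list of (M_i, d) key features, one array per context view.
    values: list of (M_i, d_v) value features, one array per context view.
    masks:  list of (N, M_i) boolean arrays; True where the key lies on or
            near the epipolar line of the query pixel.
    Returns (N, d_v) aggregated features. No weights are learned here.
    """
    k = np.concatenate(keys, axis=0)        # (M, d) keys from all context views
    v = np.concatenate(values, axis=0)      # (M, d_v) values from all context views
    mask = np.concatenate(masks, axis=1)    # (N, M) combined epipolar mask
    scores = q @ k.T / np.sqrt(q.shape[1])  # scaled dot-product logits
    scores = np.where(mask, scores, -np.inf)  # exclude off-epipolar positions
    has_key = mask.any(axis=1, keepdims=True)
    scores = np.where(has_key, scores, 0.0)   # avoid all -inf rows (NaN softmax)
    scores = scores - scores.max(axis=1, keepdims=True)  # numerically stable softmax
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=1, keepdims=True)
    out = weights @ v
    return np.where(has_key, out, 0.0)      # hypothetical fallback: zeros if no overlap
```

Masked positions receive zero weight, so each target token blends features only from locations that are geometrically consistent with it. Adding another context view only adds rows to the key/value stacks and columns to the mask. No new weights are introduced.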

