ViewFusion: Towards Multi-View Consistency via Interpolated Denoising
Xianghui Yang, Yan Zuo, Sameera Ramasinghe, Loris Bazzani, Gil Avraham, Anton van den Hengel
TL;DR
ViewFusion addresses multi-view inconsistency in diffusion-based novel-view synthesis by introducing a training-free auto-regressive framework that leverages previously generated views via Interpolated Denoising. It extends pre-trained single-view diffusion models to multi-view conditioning through a Noise Interpolation Module and a principled weighting scheme that emphasizes near views while maintaining information from earlier conditions. Empirical results on ABO and GSO show improved multi-view consistency and 3D reconstruction quality with competitive image metrics, without finetuning or architectural changes to the base diffusion models. This approach enables scalable, plug-in enhancement of existing diffusion pipelines for robust multi-view synthesis and downstream 3D tasks.
Abstract
Novel-view synthesis through diffusion models has demonstrated remarkable potential for generating diverse and high-quality images. Yet because prevailing methods generate each image independently, they struggle to maintain multi-view consistency. To address this, we introduce ViewFusion, a novel, training-free algorithm that can be seamlessly integrated into existing pre-trained diffusion models. Our approach adopts an auto-regressive method that implicitly leverages previously generated views as context for generating the next view, ensuring robust multi-view consistency during the novel-view generation process. Through a diffusion process that fuses known-view information via interpolated denoising, our framework successfully extends single-view conditioned models to multi-view conditional settings without any additional fine-tuning. Extensive experimental results demonstrate the effectiveness of ViewFusion in generating consistent and detailed novel views.
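To make the idea of interpolated denoising concrete, the following is a minimal sketch rather than the authors' implementation. It assumes that, at each denoising step, a single-view conditioned model yields one noise prediction per available conditioning view, and that these predictions are blended with weights that decay with angular distance to the target view. The function names (`view_weights`, `interpolate_noise`), the exponential-decay form, and the `temperature` parameter are all hypothetical choices made for illustration; the summary above states only that the weighting emphasizes near views while retaining information from earlier conditions.

```python
import numpy as np


def view_weights(angular_dists, temperature=0.5):
    """Normalized weights that favour conditioning views close to the target.

    ASSUMPTION: exponential decay in angular distance (a softmax over
    negative distances). The source does not specify the exact form.
    """
    d = np.asarray(angular_dists, dtype=np.float64)
    logits = -d / temperature
    logits -= logits.max()  # subtract the max for numerical stability
    w = np.exp(logits)
    return w / w.sum()


def interpolate_noise(noise_preds, weights):
    """Blend per-condition noise predictions into a single prediction.

    noise_preds: array of shape (n_views, ...), one prediction per
                 conditioning view, all for the same noisy target latent.
    weights:     array of shape (n_views,) that sums to 1.
    """
    noise_preds = np.asarray(noise_preds, dtype=np.float64)
    # Contract the view axis: sum_i weights[i] * noise_preds[i]
    return np.tensordot(weights, noise_preds, axes=1)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Stand-ins for the outputs of a pre-trained single-view model,
    # one per previously generated (or input) view. Hypothetical data.
    preds = rng.standard_normal((3, 4, 8, 8))
    dists = [0.2, 0.9, 1.6]  # radians between each condition and the target
    w = view_weights(dists)
    fused = interpolate_noise(preds, w)
    print("weights:", w)
    print("fused shape:", fused.shape)
```

In an auto-regressive setting, each newly generated view would be appended to the set of conditions before the next target view is denoised, so the number of blended predictions grows as generation proceeds. Because the blending happens only on the noise predictions, the base model's weights and architecture stay untouched, which is consistent with the training-free claim above.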
