3DEnhancer: Consistent Multi-View Diffusion for 3D Enhancement

Yihang Luo; Shangchen Zhou; Yushi Lan; Xingang Pan; Chen Change Loy

TL;DR

3DEnhancer addresses low-resolution, view-inconsistent 3D content with a multi-view latent diffusion framework that combines a pose-aware encoder, view-consistent DiT blocks, and epipolar-guided cross-view mechanisms. The approach uses a 2D diffusion prior to refine coarse multi-view renders. It enforces cross-view coherence through multi-view row attention and near-view epipolar aggregation, and it is trained with an extensive multi-view data augmentation pipeline. It can enhance the outputs of existing multi-view diffusion models and can directly refine coarse 3D representations, such as 3D Gaussian splatting or other reconstructions. Experiments on synthetic Objaverse data and on in-the-wild objects show substantial qualitative and quantitative gains in texture detail and cross-view consistency. Ablation and user studies confirm the effectiveness of the cross-view modules and the augmentations, pointing to the method's potential for robust 3D texture refinement, editing, and reconstruction in practical pipelines.
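The multi-view row attention mentioned above can be illustrated with a minimal NumPy sketch. It assumes all views share one camera elevation, so corresponding points land on the same image row. Under that assumption, attention can be restricted to the tokens of a single row across every view. This is an illustrative assumption, and the paper's actual block may differ (for example in its number of heads, its learned projections, and where it sits inside the DiT). The function name `multiview_row_attention` and the random projection matrices `w_q`, `w_k`, `w_v` are hypothetical stand-ins, not learned weights.

```python
import numpy as np


def softmax(x, axis=-1):
    # Subtract the max for numerical stability before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def multiview_row_attention(feats, w_q, w_k, w_v):
    """Single-head attention restricted to the same image row across all views.

    feats: (V, H, W, C) latent tokens for V views.
    w_q, w_k, w_v: (C, D) projection matrices (hypothetical placeholders).
    Returns: (V, H, W, D) updated tokens.
    """
    V, H, W, C = feats.shape
    # Put rows first and flatten (view, column) into one token axis, so the
    # V*W tokens that share row h attend only to one another.
    rows = feats.transpose(1, 0, 2, 3).reshape(H, V * W, C)
    q, k, v = rows @ w_q, rows @ w_k, rows @ w_v          # each (H, V*W, D)
    d = q.shape[-1]
    attn = softmax(q @ k.transpose(0, 2, 1) / np.sqrt(d))  # (H, V*W, V*W)
    out = attn @ v                                         # (H, V*W, D)
    # Undo the flattening to recover (V, H, W, D).
    return out.reshape(H, V, W, d).transpose(1, 0, 2, 3)


# Usage: 4 views, an 8x8 latent grid, 16 channels.
rng = np.random.default_rng(0)
feats = rng.standard_normal((4, 8, 8, 16))
w_q, w_k, w_v = (0.1 * rng.standard_normal((16, 16)) for _ in range(3))
out = multiview_row_attention(feats, w_q, w_k, w_v)
print(out.shape)  # (4, 8, 8, 16)
```

Restricting attention to rows yields H attention maps of size (V·W)×(V·W), compared with one map of size (V·H·W)×(V·H·W) for full cross-view attention. That is a factor of H fewer attention scores, at the cost of relying on the shared-elevation assumption.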

Abstract

Despite advances in neural rendering, the scarcity of high-quality 3D datasets and the inherent limitations of multi-view diffusion models restrict view synthesis and 3D model generation to low resolutions with suboptimal multi-view consistency. In this study, we present a novel 3D enhancement pipeline, dubbed 3DEnhancer, which employs a multi-view latent diffusion model to enhance coarse 3D inputs while preserving multi-view consistency. Our method includes a pose-aware encoder and a diffusion-based denoiser to refine low-quality multi-view images, along with data augmentation and a multi-view attention module with epipolar aggregation to maintain consistent, high-quality 3D outputs across views. Unlike existing video-based approaches, our model supports seamless multi-view enhancement with improved coherence across diverse viewing angles. Extensive evaluations show that 3DEnhancer significantly outperforms existing methods, improving results on both multi-view enhancement and per-instance 3D optimization tasks.
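The epipolar aggregation named in the abstract relies on standard two-view geometry. For a pixel p_i in view i, its match in a neighbouring view j must lie on the epipolar line l = F p_i, where F = K_j^{-T} [t]_× R K_i^{-1}. Here K_i and K_j are the intrinsics, and (R, t) maps view-i camera coordinates to view-j camera coordinates. The NumPy sketch below first computes F. It then averages the view-j features that lie within a pixel threshold of that line, weighting them by a softmax over dot-product similarity to the query feature. The thresholded softmax weighting, the helper names, and the parameters `max_dist` and `tau` are assumptions for illustration. They are not the paper's exact near-view epipolar aggregation module.

```python
import numpy as np


def skew(t):
    """Cross-product matrix [t]_x, so that skew(t) @ v == np.cross(t, v)."""
    return np.array([[0.0, -t[2], t[1]],
                     [t[2], 0.0, -t[0]],
                     [-t[1], t[0], 0.0]])


def fundamental_matrix(K_i, K_j, R, t):
    """F such that p_j^T F p_i = 0 for matching homogeneous pixels.

    Assumes x_j_cam = R @ x_i_cam + t (pose of view j relative to view i).
    """
    E = skew(t) @ R  # essential matrix
    return np.linalg.inv(K_j).T @ E @ np.linalg.inv(K_i)


def epipolar_aggregate(q_feat, pix_i, feats_j, pix_j, F, max_dist=1.0, tau=1.0):
    """Similarity-weighted average of view-j features near the epipolar line.

    q_feat: (C,) query feature at pixel pix_i (2,) in view i.
    feats_j: (N, C) candidate features in view j at pixel coords pix_j (N, 2).
    Returns q_feat unchanged if no candidate lies within max_dist pixels.
    """
    line = F @ np.array([pix_i[0], pix_i[1], 1.0])        # l = (a, b, c)
    homog = np.hstack([pix_j, np.ones((len(pix_j), 1))])
    dist = np.abs(homog @ line) / np.hypot(line[0], line[1])  # point-line distance
    mask = dist <= max_dist
    if not mask.any():
        return q_feat
    cand = feats_j[mask]
    logits = cand @ q_feat / tau
    w = np.exp(logits - logits.max())                      # stable softmax
    w /= w.sum()
    return w @ cand


# Usage: identical intrinsics, pure horizontal baseline (rectified pair).
K = np.array([[100.0, 0.0, 32.0], [0.0, 100.0, 32.0], [0.0, 0.0, 1.0]])
F = fundamental_matrix(K, K, np.eye(3), np.array([1.0, 0.0, 0.0]))
pix_j = np.array([[5.0, 20.0], [15.0, 20.0], [5.0, 25.0]])
feats_j = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
out = epipolar_aggregate(np.array([0.5, 0.5]), np.array([10.0, 20.0]), feats_j, pix_j, F)
print(out)  # only the two row-20 candidates contribute
```

With identical intrinsics and a purely horizontal baseline, the epipolar line of a pixel is simply its own image row. This is the geometric intuition behind restricting cross-view attention to rows. For general poses, the line is oblique, and the distance test above selects the candidates along it.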
