MVBoost: Boost 3D Reconstruction with Multi-View Refinement

Xiangyu Liu; Xiaomei Zhang; Zhiyuan Ma; Xiangyu Zhu; Zhen Lei

TL;DR

MVBoost addresses the scarcity of high-quality 3D data for single-view reconstruction. It fuses a high-accuracy multi-view diffusion model with a consistent 3D reconstruction model to generate pseudo-ground-truth data, which is then used to train a fast, feed-forward reconstructor. It introduces a multi-view refinement strategy that produces refined pseudo-views, a LoRA-enhanced boosting reconstruction model trained on this data, and an input-view optimization step that aligns the final asset with the user's input view. The approach yields state-of-the-art results on Google Scanned Objects, with strong improvements in both 2D view quality and 3D geometry, and demonstrates generalization through open-world baselines and an OpenLRM boost. The framework enables scalable, diverse 3D asset generation from a single image without requiring large pre-existing 3D datasets, offering practical benefits for game content, AR/VR, and animation pipelines.

Abstract

Recent advances in 3D object reconstruction have been remarkable, yet most current 3D models rely heavily on existing 3D datasets. The scarcity of diverse 3D datasets limits the generalization of 3D reconstruction models. In this paper, we propose MVBoost, a novel framework that boosts 3D reconstruction with multi-view refinement by generating pseudo-GT data. The key idea of MVBoost is to combine the high accuracy of the multi-view generation model with the consistency of the 3D reconstruction model to create a reliable data source. Specifically, given a single-view input image, we employ a multi-view diffusion model to generate multiple views, followed by a large 3D reconstruction model to produce consistent 3D data. MVBoost then adaptively refines the multi-view images rendered from the consistent 3D data to build a large-scale multi-view dataset for training a feed-forward 3D reconstruction model. Additionally, input view optimization adjusts the corresponding viewpoint based on the user's input image, ensuring that the most important viewpoint is accurately tailored to the user's needs. Extensive evaluations demonstrate that our method achieves superior reconstruction results and robust generalization compared to prior works.
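
The data-generation loop described in the abstract can be sketched roughly as follows. This is a minimal illustration of the order of the stages only, not the authors' implementation. Every name here (`build_pseudo_gt`, `PseudoGTSample`, and the four callables) is hypothetical, the image type is a placeholder, and the assumption that refinement takes both the generated and the rendered views is ours rather than something the abstract states.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Sequence

# Placeholder image type (assumption): a 2D grid of intensities.
Image = List[List[float]]


@dataclass
class PseudoGTSample:
    """One training sample: the input view plus its refined pseudo-GT views."""
    input_view: Image
    refined_views: List[Image]


def build_pseudo_gt(
    input_views: Sequence[Image],
    generate_views: Callable[[Image], List[Image]],  # stands in for the multi-view diffusion model
    reconstruct: Callable[[List[Image]], Any],       # stands in for the large 3D reconstruction model
    render: Callable[[Any], List[Image]],            # renders views from the reconstructed 3D data
    refine: Callable[[List[Image], List[Image]], List[Image]],  # hypothetical refinement interface
) -> List[PseudoGTSample]:
    """Run the four stages per input image and collect pseudo-GT samples."""
    dataset: List[PseudoGTSample] = []
    for image in input_views:
        generated = generate_views(image)      # accurate, but possibly inconsistent across views
        asset = reconstruct(generated)         # consistent 3D representation
        rendered = render(asset)               # consistent multi-view renderings
        refined = refine(generated, rendered)  # assumed to combine both sources
        dataset.append(PseudoGTSample(input_view=image, refined_views=refined))
    return dataset
```

The resulting list would then serve as training data for the feed-forward reconstruction model. Passing the models in as callables keeps the sketch independent of any particular diffusion or reconstruction library.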
