Multi-View Large Reconstruction Model via Geometry-Aware Positional Encoding and Attention
Mengfei Li, Xiaoxiao Long, Yixun Liang, Weiyu Li, Yuan Liu, Peng Li, Wenhan Luo, Wenping Wang, Yike Guo
TL;DR
This work tackles the inefficiencies of extending Large Reconstruction Models to multi-view inputs by introducing M-LRM, a 3D-aware transformer framework. It integrates geometry-aware positional embeddings (GaPE) and geometry-aware cross attention (GCA) to inject explicit 3D priors and enforce cross-view coherence, initializing triplane tokens with geometry-informed priors. The approach yields higher-fidelity 3D geometries, faster convergence, and improved robustness over prior methods such as Instant3D and LGM, demonstrated on multi-view and single-view generation tasks. This advancement enables more reliable 3D reconstruction from sparse view sets and enhances practical applicability in 3D content generation and related domains.
Abstract
Although the Large Reconstruction Model (LRM) has demonstrated impressive results, extending its input from a single image to multiple images leads to inefficiencies, subpar geometric and texture quality, and slower-than-expected convergence. We attribute this to the fact that LRM formulates 3D reconstruction as a naive images-to-3D translation problem, ignoring the strong 3D coherence among the input images. In this paper, we propose a Multi-view Large Reconstruction Model (M-LRM) designed to reconstruct high-quality 3D shapes from multi-view images in a 3D-aware manner. Specifically, we introduce a multi-view consistent cross-attention scheme that enables M-LRM to accurately query information from the input images. Moreover, we use the 3D priors of the input multi-view images to initialize the triplane tokens. The proposed M-LRM generates 3D shapes of higher fidelity than previous methods. Experimental studies demonstrate that our model achieves a significant performance gain and faster training convergence. Project page: https://murphylmf.github.io/M-LRM/
