GeoFusionLRM: Geometry-Aware Self-Correction for Consistent 3D Reconstruction
Ahmet Burak Yildirim, Tuna Saygin, Duygu Ceylan, Aysegul Dundar
TL;DR
GeoFusionLRM addresses geometric inconsistencies in single-image 3D reconstruction with a geometry-aware self-correction framework. It adds a GeoFormer encoder that processes depth and normal maps from intermediate reconstructions, and a GeoFuser that fuses the resulting geometry-aware tokens with semantic image features to refine subsequent reconstruction passes. The method unrolls three refinement steps during training and reuses existing losses, matching or exceeding the state of the art in normal-map fidelity on the OmniObject3D and Google Scanned Objects datasets. The approach improves the alignment between the reconstructed mesh and the conditioning input views, yielding sharper geometry and better detail preservation without external supervision, at the cost of increased inference time due to the additional refinement passes.
Abstract
Single-image 3D reconstruction with large reconstruction models (LRMs) has advanced rapidly, yet reconstructions often exhibit geometric inconsistencies and misaligned details that limit fidelity. We introduce GeoFusionLRM, a geometry-aware self-correction framework that leverages the model's own normal and depth predictions to refine structural accuracy. Unlike prior approaches that rely solely on features extracted from the input image, GeoFusionLRM feeds back geometric cues through a dedicated transformer and fusion module, enabling the model to correct errors and enforce consistency with the conditioning image. This design improves the alignment between the reconstructed mesh and the input views without additional supervision or external signals. Extensive experiments demonstrate that GeoFusionLRM achieves sharper geometry, more consistent normals, and higher fidelity than state-of-the-art LRM baselines.
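The feedback loop described above can be sketched as plain control flow. This is a minimal illustration and not the authors' implementation: the function `unrolled_refinement`, its parameter names, and the four callables it takes (`reconstruct`, `render_geometry`, `geo_encoder`, `geo_fuser`) are hypothetical stand-ins for the LRM backbone, a depth/normal renderer, the GeoFormer, and the GeoFuser. Whether the initial pass counts toward the three unrolled steps is also an assumption here.

```python
from typing import Any, Callable, List, Tuple


def unrolled_refinement(
    image_tokens: Any,
    reconstruct: Callable[[Any], Any],
    render_geometry: Callable[[Any], Tuple[Any, Any]],
    geo_encoder: Callable[[Any, Any], Any],
    geo_fuser: Callable[[Any, Any], Any],
    num_refinements: int = 3,
) -> List[Any]:
    """Run one initial pass plus `num_refinements` self-correction passes.

    All callables are hypothetical placeholders:
      reconstruct      conditioning tokens -> 3D reconstruction
      render_geometry  reconstruction -> (depth map, normal map)
      geo_encoder      (depth, normals) -> geometry-aware tokens
      geo_fuser        (image tokens, geometry tokens) -> fused tokens
    Returns the reconstruction from every pass, so that a training loop
    could apply its existing losses to each one.
    """
    # Initial pass: conditioned on image features only.
    outputs = [reconstruct(image_tokens)]

    for _ in range(num_refinements):
        # Feed the model's own geometry predictions back in.
        depth, normals = render_geometry(outputs[-1])
        geo_tokens = geo_encoder(depth, normals)

        # Always fuse with the original image tokens, so the conditioning
        # image stays the reference that each pass is corrected against.
        fused = geo_fuser(image_tokens, geo_tokens)
        outputs.append(reconstruct(fused))

    return outputs
```

At inference time only the last element of the returned list would be used. Each refinement adds a render, an encode, a fuse, and a reconstruction pass, which is where the extra inference time comes from.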
