AmodalGen3D: Generative Amodal 3D Object Reconstruction from Sparse Unposed Views
Junwei Zhou, Yu-Wing Tai
TL;DR
AmodalGen3D addresses amodal 3D reconstruction from sparse, unposed views by integrating 2D amodal priors with partial 3D geometry through a dual-attention framework. View-Wise Cross Attention aggregates multi-view completions, while Stereo-Conditioned Cross Attention uses partial multi-view stereo (MVS) geometry, modulated by a geometry-guided gating mechanism, to infer unseen structure. The approach is trained with a synthetic object-centric data engine using conditional flow matching and validated across synthetic and real datasets, showing improved fidelity, completeness, and cross-view consistency over baselines. This work enables robust object-level 3D reconstruction under occlusion-heavy and sparse-view conditions, with broad implications for robotics, AR/VR, and embodied AI. Overall, AmodalGen3D demonstrates that combining strong 2D priors with geometry-aware 3D generation yields coherent, occlusion-free 3D objects even when large regions remain unobserved.
Abstract
Reconstructing 3D objects from a few unposed and partially occluded views is a common yet challenging problem in real-world scenarios, where many object surfaces are never directly observed. Traditional multi-view or inpainting-based approaches struggle under such conditions, often yielding incomplete or geometrically inconsistent reconstructions. We introduce AmodalGen3D, a generative framework for amodal 3D object reconstruction that infers complete, occlusion-free geometry and appearance from arbitrary sparse inputs. The model integrates 2D amodal completion priors with multi-view stereo geometry conditioning, supported by a View-Wise Cross Attention mechanism for sparse-view feature fusion and a Stereo-Conditioned Cross Attention module for unobserved structure inference. By jointly modeling visible and hidden regions, AmodalGen3D faithfully reconstructs 3D objects that are consistent with sparse-view constraints while plausibly hallucinating unseen parts. Experiments on both synthetic and real-world datasets demonstrate that AmodalGen3D achieves superior fidelity and completeness under occlusion-heavy sparse-view settings, addressing a pressing need for object-level 3D scene reconstruction in robotics, AR/VR, and embodied AI applications.
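The TL;DR states that the model is trained with conditional flow matching but gives no formulation. As a rough illustration only, the sketch below shows one common form of that objective: a linear interpolation path between a noise sample and a data sample, with the network regressing the constant velocity along that path. The linear path, the function names `cfm_training_pair` and `cfm_loss`, the `predict_velocity` callable, and the flat-list representation of a latent are all assumptions made for this example; none of them is taken from AmodalGen3D itself.

```python
import random

def cfm_training_pair(x1, t, rng):
    """Build one training pair for a data sample x1 at time t in [0, 1].

    Assumes the linear path x_t = (1 - t) * x0 + t * x1, where x0 is
    Gaussian noise. This path is an assumption of the sketch.
    """
    x0 = [rng.gauss(0.0, 1.0) for _ in x1]              # noise endpoint
    x_t = [(1.0 - t) * a + t * b for a, b in zip(x0, x1)]
    target_v = [b - a for a, b in zip(x0, x1)]          # d x_t / d t
    return x0, x_t, target_v

def cfm_loss(predict_velocity, x1, cond, rng):
    """Mean squared error between predicted and target velocity.

    predict_velocity(x_t, t, cond) is a hypothetical stand-in for a
    conditional network; cond stands in for whatever conditioning
    (e.g. image or geometry features) the model receives.
    """
    t = rng.random()                                    # uniform in [0, 1)
    _, x_t, target_v = cfm_training_pair(x1, t, rng)
    pred = predict_velocity(x_t, t, cond)
    return sum((p - v) ** 2 for p, v in zip(pred, target_v)) / len(x1)
```

In this toy setup, a predictor that is handed the data sample as its condition can recover the exact velocity as `(x1 - x_t) / (1 - t)`, so its loss is zero up to floating-point error. A real model sees only indirect conditioning and must learn the velocity field from many such pairs.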
