Omni-Scene: Omni-Gaussian Representation for Ego-Centric Sparse-View Scene Reconstruction
Dongxu Wei, Zhiqi Li, Peidong Liu
TL;DR
Omni-Scene introduces the Omni-Gaussian representation, which unifies pixel-based and volume-based Gaussians for ego-centric sparse-view reconstruction. A Volume Builder (Triplane Transformer + Volume Decoder) and a Pixel Decorator (Multi-View U-Net + Pixel Decoder) produce complementary Gaussian fields. Projection-Based Feature Fusion and Depth-Guided Training Decomposition tie the two branches together, and their outputs form the full set of Omni-Gaussians used for novel-view rendering. The method shows substantial gains over pixelSplat and MVSplat on ego-centric reconstruction and competitive performance on RealEstate10K. Ablations support the value of cross-representation collaboration, depth initialization, and efficient 3D feature encoding. The approach enables fast, high-fidelity 3D scene reconstruction from single-frame surround views and, when integrated with diffusion-based 2D models, supports multi-modal 3D scene generation, with applications in autonomous driving and 3D content creation.
Abstract
Prior works employing pixel-based Gaussian representation have demonstrated efficacy in feed-forward sparse-view reconstruction. However, this representation requires cross-view overlap for accurate depth estimation, and is challenged by object occlusions and frustum truncations. As a result, these methods require scene-centric data acquisition to maintain cross-view overlap and complete scene visibility to circumvent occlusions and truncations, which limits their applicability to scene-centric reconstruction. In contrast, in autonomous driving scenarios, a more practical paradigm is ego-centric reconstruction, which is characterized by minimal cross-view overlap and frequent occlusions and truncations. The limitations of pixel-based representation thus hinder the utility of prior works in this task. In light of this, this paper conducts an in-depth analysis of different representations, and introduces Omni-Gaussian representation with a tailored network design that combines their strengths and mitigates their drawbacks. Experiments show that our method significantly surpasses state-of-the-art methods, pixelSplat and MVSplat, in ego-centric reconstruction, and achieves comparable performance to prior works in scene-centric reconstruction.
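The frustum-truncation limitation described above can be illustrated with a small geometric sketch. It is not the paper's network or its fusion module. It only assumes a standard pinhole camera, a constant predicted depth, and a coarse voxel grid, and every name in it (`Pinhole`, `unproject`, `in_frustum`, `voxel_centers`) is hypothetical. Pixel-based Gaussian centers are obtained by lifting pixels along their rays, so they can only land inside the camera frustum. Volume-based centers sit on a 3D grid and can also cover regions that no pixel reaches.

```python
from dataclasses import dataclass


@dataclass
class Pinhole:
    """Hypothetical pinhole intrinsics (pixels); not taken from the paper."""
    fx: float
    fy: float
    cx: float
    cy: float
    width: int
    height: int


def unproject(cam, u, v, depth):
    """Lift pixel (u, v) with a predicted depth to a 3D point in the camera frame."""
    x = (u - cam.cx) * depth / cam.fx
    y = (v - cam.cy) * depth / cam.fy
    return (x, y, depth)


def project(cam, point):
    """Project a camera-frame point to pixel coordinates; None if behind the camera."""
    x, y, z = point
    if z <= 0:
        return None
    return (cam.fx * x / z + cam.cx, cam.fy * y / z + cam.cy)


def in_frustum(cam, point):
    """True if the point projects inside the image bounds."""
    uv = project(cam, point)
    if uv is None:
        return False
    u, v = uv
    return 0 <= u < cam.width and 0 <= v < cam.height


def voxel_centers(origin, voxel_size, dims):
    """Centers of a regular voxel grid whose minimum corner is `origin`."""
    ox, oy, oz = origin
    nx, ny, nz = dims
    return [
        (ox + (i + 0.5) * voxel_size,
         oy + (j + 0.5) * voxel_size,
         oz + (k + 0.5) * voxel_size)
        for i in range(nx) for j in range(ny) for k in range(nz)
    ]


# Assumed toy camera: 100x100 image, focal length 100 px.
cam = Pinhole(fx=100.0, fy=100.0, cx=50.0, cy=50.0, width=100, height=100)

# Pixel-based centers: one per sampled pixel, at an assumed constant depth of 5.
pixel_centers = [unproject(cam, u, v, 5.0)
                 for u in range(0, 100, 25) for v in range(0, 100, 25)]

# Volume-based centers: a grid that extends beyond the camera frustum.
volume_centers = voxel_centers(origin=(-10.0, -10.0, 0.0),
                               voxel_size=5.0, dims=(4, 4, 2))

# Voxels that no pixel ray can reach from this single view.
outside = [p for p in volume_centers if not in_frustum(cam, p)]

print(len(pixel_centers), "pixel-based centers, all inside the frustum")
print(len(outside), "of", len(volume_centers), "voxel centers lie outside it")
```

In this toy setup, most of the grid falls outside the single view, which is the kind of region a purely pixel-based representation leaves empty and a volume-based one can still populate.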
