Single-View 3D Reconstruction via SO(2)-Equivariant Gaussian Sculpting Networks
Ruihan Xu, Anthony Opipari, Joshua Mah, Stanley Lewis, Haoran Zhang, Hanzhe Guo, Odest Chadwicke Jenkins
TL;DR
The paper addresses fast, reliable single-view 3D reconstruction with explicit geometry suitable for robotics. It introduces SO(2)-Equivariant Gaussian Sculpting Networks (GSNs), which deform a canonical Gaussian cube into the target object by predicting per-Gaussian deviations in position, scale, rotation, color, and opacity. All deviations are decoded from the features of a ResNet-based encoder, and the network is trained with a multi-view rendering loss plus an Extended Chamfer Distance term that enforces equivariance. Key contributions include real-time performance (>150 FPS), reconstruction quality competitive with diffusion-based baselines, and a demonstration of object-centric grasp planning in a robotic pipeline. The work highlights the practical potential of explicit Gaussian splatting with equivariant priors for fast perception in robotic applications with clutter. It also notes limitations, such as generalization beyond ShapeNet, and points to future work on scene-level reconstruction and richer datasets.
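The "sculpting" step described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the filled grid layout, the additive parameterization, the log-scale and logit conventions, and the names `canonical_gaussian_cube` and `sculpt` are all assumptions made for illustration, and the `deltas` dictionary stands in for whatever the decoder heads would predict.

```python
import numpy as np

def canonical_gaussian_cube(n=8):
    """Build an n x n x n grid of Gaussians in [-0.5, 0.5]^3 (assumed layout)."""
    axis = np.linspace(-0.5, 0.5, n)
    grid = np.meshgrid(axis, axis, axis, indexing="ij")
    xyz = np.stack(grid, axis=-1).reshape(-1, 3)
    num = xyz.shape[0]
    return {
        "position": xyz,                                      # (num, 3) centers
        "log_scale": np.full((num, 3), np.log(1.0 / n)),      # isotropic start
        "rotation": np.tile([1.0, 0.0, 0.0, 0.0], (num, 1)),  # identity quaternion (w, x, y, z)
        "color": np.full((num, 3), 0.5),                      # mid-gray RGB
        "opacity_logit": np.zeros((num, 1)),                  # sigmoid(0) = 0.5
    }

def sculpt(canonical, deltas):
    """Add per-Gaussian deviations to the canonical attributes (assumed additive)."""
    out = {name: canonical[name] + deltas[name] for name in canonical}
    # Keep the result valid: unit quaternions and colors inside [0, 1].
    quat = out["rotation"]
    out["rotation"] = quat / np.linalg.norm(quat, axis=1, keepdims=True)
    out["color"] = np.clip(out["color"], 0.0, 1.0)
    return out

# Random small deviations stand in for decoder outputs.
rng = np.random.default_rng(0)
cube = canonical_gaussian_cube(n=4)
deltas = {name: 0.01 * rng.normal(size=value.shape) for name, value in cube.items()}
sculpted = sculpt(cube, deltas)
```

In this sketch every attribute has its own entry in `deltas`, which mirrors the idea of separate decoder outputs for position, scale, rotation, color, and opacity driven by one shared image feature.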
Abstract
This paper introduces SO(2)-Equivariant Gaussian Sculpting Networks (GSNs) as an approach for SO(2)-equivariant 3D object reconstruction from single-view image observations. GSNs take a single observation as input and generate a Gaussian splat representation describing the observed object's geometry and texture. By using a shared feature extractor before decoding Gaussian colors, covariances, positions, and opacities, GSNs achieve extremely high throughput (>150 FPS). Experiments demonstrate that GSNs can be trained efficiently using a multi-view rendering loss and are competitive in quality with expensive diffusion-based reconstruction algorithms. The GSN model is validated on multiple benchmark experiments. Moreover, we demonstrate the potential for GSNs to be used within a robotic manipulation pipeline for object-centric grasping.
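To make the SO(2)-equivariance property concrete, the following sketch measures how far a predictor is from being equivariant. It rests on several assumptions not stated in the text above: the rotation is taken about the z-axis, the predictor is a hypothetical function `predict_centers(theta)` that returns Gaussian centers for a view at angle `theta`, and the metric is the standard symmetric Chamfer distance rather than the paper's Extended Chamfer Distance, whose definition is not given here.

```python
import numpy as np

def rotation_z(theta):
    """3x3 rotation matrix about the z-axis (assumed SO(2) axis)."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

def chamfer_distance(a, b):
    """Standard symmetric Chamfer distance between point sets a (N, 3) and b (M, 3)."""
    sq_dist = np.sum((a[:, None, :] - b[None, :, :]) ** 2, axis=-1)  # (N, M)
    return sq_dist.min(axis=1).mean() + sq_dist.min(axis=0).mean()

def equivariance_gap(predict_centers, theta):
    """Compare the prediction at angle theta with the rotated prediction at angle 0."""
    reference = predict_centers(0.0)
    rotated_reference = reference @ rotation_z(theta).T
    return chamfer_distance(predict_centers(theta), rotated_reference)

# Toy predictors: one rotates with the view, one ignores the view angle.
rng = np.random.default_rng(0)
base = rng.normal(size=(50, 3))
equivariant = lambda theta: base @ rotation_z(theta).T
view_blind = lambda theta: base

gap_equivariant = equivariance_gap(equivariant, 0.7)
gap_view_blind = equivariance_gap(view_blind, 0.7)
```

A predictor whose output rotates together with the input view yields a gap of zero, while one that ignores the view angle yields a positive gap. A loss term of this kind, added to a rendering loss, is one plausible way to encourage equivariance during training.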
