
SPAGS: Sparse-View Articulated Object Reconstruction from Single State via Planar Gaussian Splatting

Di Wu, Liu Liu, Xueyu Yuan, Qiaojun Yu, Wenxiao Chen, Ruilong Yan, Yiming Tang, Liangtu Song

TL;DR

The paper tackles high-fidelity articulated object reconstruction from sparse-view RGB inputs, motivated by the cost and impracticality of capturing multi-view data. It introduces SPAGS, a framework that combines a Gaussian Information Field for selecting optimal sparse viewpoints, planar Gaussian Splatting with coarse-to-fine optimization, few-shot diffusion refinement, and articulation modeling with part-aware Gaussian primitives to recover accurate part-level surfaces. Experiments on synthetic and real-world data show that SPAGS outperforms state-of-the-art sparse-view methods and two-state articulated-object methods in surface quality, novel-view synthesis, and joint estimation, while keeping training times reasonable. By reducing data requirements and enabling autonomous view selection and robust part-level representations, the work advances practical 3D reconstruction for manipulation and robotics. The authors note limitations on transparent and very small objects and point to physically-based rendering and super-resolution as directions for future work.

Abstract

Articulated objects are ubiquitous in daily environments, and their 3D reconstruction holds great significance across various fields. However, existing articulated object reconstruction methods typically require costly inputs such as multi-stage and multi-view observations. To address these limitations, we propose a category-agnostic articulated object reconstruction framework via planar Gaussian Splatting, which uses only sparse-view RGB images from a single state. Specifically, we first introduce a Gaussian information field to perceive the optimal sparse viewpoints from candidate camera poses. Then we compress 3D Gaussians into planar Gaussians to facilitate accurate estimation of normals and depth. The planar Gaussians are optimized in a coarse-to-fine manner through depth smoothness regularization and few-shot diffusion. Moreover, we introduce a part segmentation probability for each Gaussian primitive and update these probabilities by back-projecting part segmentation masks of the renderings. Extensive experimental results demonstrate that our method achieves higher-fidelity part-level surface reconstruction on both synthetic and real-world data than existing methods. Code will be made publicly available.
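
The abstract states that 3D Gaussians are compressed into planar Gaussians to make normal and depth estimation more accurate, but does not give the formulas. The sketch below assumes the construction used in earlier planar Gaussian Splatting work (in the style of PGSR), which may differ from SPAGS in detail: an L1 penalty drives each Gaussian's smallest scale toward zero, the normal is the rotation axis paired with that smallest scale (flipped to face the camera), and the depth at a pixel is found by intersecting the camera ray with the Gaussian's plane. All function names are hypothetical, and everything is expressed in camera coordinates with the camera at the origin.

```python
import math
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def quat_to_rotmat(q: Sequence[float]) -> List[List[float]]:
    """Rotation matrix from a quaternion (w, x, y, z); normalised first."""
    n = math.sqrt(sum(c * c for c in q))
    w, x, y, z = (c / n for c in q)
    return [
        [1 - 2 * (y * y + z * z), 2 * (x * y - w * z), 2 * (x * z + w * y)],
        [2 * (x * y + w * z), 1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
        [2 * (x * z - w * y), 2 * (y * z + w * x), 1 - 2 * (x * x + y * y)],
    ]


def _dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def planar_normal(q: Sequence[float], scales: Sequence[float],
                  center: Sequence[float]) -> Vec3:
    """Camera-space normal of a flattened Gaussian (hypothetical helper).

    The normal is the rotation column paired with the smallest scale,
    flipped so that it points back toward the camera at the origin.
    """
    R = quat_to_rotmat(q)
    k = min(range(3), key=lambda i: scales[i])
    n = (R[0][k], R[1][k], R[2][k])
    if _dot(n, center) > 0:  # facing away from the camera
        n = (-n[0], -n[1], -n[2])
    return n


def plane_depth(center: Sequence[float], normal: Sequence[float],
                ray: Sequence[float]) -> float:
    """z-depth where the ray K^-1 [u, v, 1] meets the Gaussian's plane.

    The plane passes through `center` with normal `normal`; because the
    ray has unit z-component, the ray parameter equals the z-depth.
    """
    denom = _dot(normal, ray)
    if abs(denom) < 1e-8:
        raise ValueError("ray is parallel to the plane")
    return _dot(normal, center) / denom


def flatten_loss(all_scales: Sequence[Sequence[float]]) -> float:
    """Mean L1 penalty on each Gaussian's smallest scale (pushes it to 0)."""
    return sum(abs(min(s)) for s in all_scales) / len(all_scales)
```

For example, an unrotated Gaussian centred at z = 2 with scales (0.5, 0.5, 0.01) gets the normal (0, 0, -1), and any ray through its plane, such as K^-1 [u, v, 1] = (0.5, 0, 1), returns a depth of 2.0. Because the depth comes from the plane rather than from alpha-blended centers, it stays consistent with the normal, which is the usual motivation for flattening the Gaussians.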

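
The abstract says each Gaussian carries a part segmentation probability that is updated by back-projecting part masks of rendered views, without specifying the update rule. A minimal sketch, assuming a simple voting scheme: each Gaussian's center is projected into every rendered view with shared pinhole intrinsics (fx, fy, cx, cy), the part label under it counts as one vote, and the votes are normalised with Laplace smoothing `alpha`. This ignores occlusion and per-pixel blending weights, which a real implementation would likely account for; the function names, the voting rule, and the smoothing are all assumptions rather than the paper's method.

```python
import math
from typing import List, Optional, Sequence, Tuple


def project(p: Sequence[float], fx: float, fy: float,
            cx: float, cy: float) -> Optional[Tuple[int, int]]:
    """Pinhole projection of a camera-space point to an integer (col, row)."""
    x, y, z = p
    if z <= 0:
        return None  # behind the camera
    return math.floor(fx * x / z + cx), math.floor(fy * y / z + cy)


def update_part_probs(cam_centers, masks, K, num_parts, alpha=1.0):
    """Per-Gaussian part probabilities from rendered part masks (hypothetical).

    cam_centers[v][g]: camera-space center of Gaussian g in view v.
    masks[v][row][col]: part label of the rendered pixel (-1 = background).
    K: (fx, fy, cx, cy), shared by all views.
    """
    fx, fy, cx, cy = K
    votes = [[0.0] * num_parts for _ in cam_centers[0]]
    for centers, mask in zip(cam_centers, masks):
        h, w = len(mask), len(mask[0])
        for g, p in enumerate(centers):
            uv = project(p, fx, fy, cx, cy)
            if uv is None:
                continue
            u, v = uv
            if 0 <= u < w and 0 <= v < h and mask[v][u] >= 0:
                votes[g][mask[v][u]] += 1.0
    # Laplace smoothing keeps Gaussians that were never observed uniform.
    return [[(c + alpha) / (sum(row) + alpha * num_parts) for c in row]
            for row in votes]


def hard_labels(probs: List[List[float]]) -> List[int]:
    """Argmax part assignment for each Gaussian (ties go to the lower index)."""
    return [max(range(len(p)), key=p.__getitem__) for p in probs]
```

A Gaussian that receives no votes, for instance one behind every camera, keeps a uniform distribution; taking the argmax of each vector then gives a hard part assignment that can be used to group Gaussians into rigid parts.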

Paper Structure

This paper contains 16 sections, 14 equations, 7 figures, 7 tables, and 1 algorithm.

Figures (7)

  • Figure 1: Given an arbitrary articulated object, our method enables autonomous optimal sparse viewpoints perception and produces: (1) surface mesh; (2) textured mesh; (3) novel view synthesis; (4) articulated modeling; (5) unseen state generation.
  • Figure 2: The Framework of SPAGS. We use the snowflake symbol to denote frozen network weights and the flame symbol to indicate trainable weights. "Reg." and "Regist." denote regularization and registration, respectively. We highlight our main contributions in green. Our method SPAGS can autonomously perceive the optimal sparse viewpoints and achieve high-fidelity reconstruction results for arbitrary articulated objects.
  • Figure 3: Illustration of joint estimation. Note that high-resolution renderings are used to query GPT-4o during actual inference.
  • Figure 4: The qualitative results of whole mesh reconstruction on PartNet-Mobility dataset.
  • Figure 5: The qualitative results of novel view synthesis on PartNet-Mobility dataset.
  • ...and 2 more figures
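
Figure 1 lists articulated modeling and unseen state generation among the outputs, and Figure 3 illustrates joint estimation. Once a joint type, axis, and pivot are available, a new articulation state can be produced by rigidly moving the Gaussians assigned to the moving part. The sketch below assumes a single revolute or prismatic joint and moves only the Gaussian centers, using Rodrigues' rotation formula for the revolute case; a full implementation would also rotate each Gaussian's orientation. The function names and argument layout are hypothetical and not taken from the paper.

```python
import math
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def _normalize(v: Sequence[float]) -> Vec3:
    n = math.sqrt(sum(x * x for x in v))
    if n == 0:
        raise ValueError("joint axis must be non-zero")
    return (v[0] / n, v[1] / n, v[2] / n)


def _cross(a: Sequence[float], b: Sequence[float]) -> Vec3:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def rotate_about_axis(p, axis, pivot, angle) -> Vec3:
    """Rodrigues' rotation of point p by `angle` radians about the line
    through `pivot` with direction `axis`."""
    k = _normalize(axis)
    v = tuple(p[i] - pivot[i] for i in range(3))
    c, s = math.cos(angle), math.sin(angle)
    kxv = _cross(k, v)
    kdv = sum(k[i] * v[i] for i in range(3))
    return tuple(pivot[i] + v[i] * c + kxv[i] * s + k[i] * kdv * (1 - c)
                 for i in range(3))


def articulate(centers, labels, moving_part, joint_type, axis, pivot, amount):
    """Return Gaussian centers with the moving part set to a new state.

    amount: angle in radians (revolute) or displacement along the
    axis (prismatic). Centers with other labels are left unchanged.
    """
    out: List[Vec3] = []
    for p, label in zip(centers, labels):
        if label != moving_part:
            out.append(tuple(p))  # static parts stay where they are
        elif joint_type == "revolute":
            out.append(rotate_about_axis(p, axis, pivot, amount))
        elif joint_type == "prismatic":
            k = _normalize(axis)
            out.append(tuple(p[i] + amount * k[i] for i in range(3)))
        else:
            raise ValueError(f"unknown joint type: {joint_type}")
    return out
```

For a revolute joint about the z-axis through (1, 0, 0), rotating a moving-part center at (2, 0, 0) by π/2 places it at (1, 1, 0) up to floating-point rounding, while centers labelled as other parts stay fixed. Sweeping `amount` over a range of values yields a sequence of unseen states from a single reconstructed state.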