GaussianArt: Unified Modeling of Geometry and Motion for Articulated Objects

Licheng Shen, Saining Zhang, Honghan Li, Peilin Yang, Zihao Huang, Zongzheng Zhang, Hao Zhao

TL;DR

GaussianArt addresses the challenge of reconstructing articulated objects by unifying geometry and motion within articulated 3D Gaussian primitives. It jointly optimizes shape and articulation, enabling consistent cross-state reasoning and scalability to objects with up to 20 parts. The method introduces a soft-to-hard training paradigm, a rigorous initialization via Art-SAM, and the MPArt-90 benchmark to evaluate scalability and generalization; it achieves state-of-the-art accuracy on geometry and part-motion estimation. The results demonstrate strong applicability to downstream tasks such as robotic simulation and human-scene interaction modeling, highlighting its potential for scalable digital twins.

Abstract

Reconstructing articulated objects is essential for building digital twins of interactive environments. However, prior methods typically decouple geometry and motion by first reconstructing object shape in distinct states and then estimating articulation through post-hoc alignment. This separation complicates the reconstruction pipeline and restricts scalability, especially for objects with complex, multi-part articulation. We introduce a unified representation that jointly models geometry and motion using articulated 3D Gaussians. This formulation improves robustness in motion decomposition and supports articulated objects with up to 20 parts, significantly outperforming prior approaches that often struggle beyond 2–3 parts due to brittle initialization. To systematically assess scalability and generalization, we propose MPArt-90, a new benchmark consisting of 90 articulated objects across 20 categories, each with diverse part counts and motion configurations. Extensive experiments show that our method consistently achieves superior accuracy in part-level geometry reconstruction and motion estimation across a broad range of object types. We further demonstrate applicability to downstream tasks such as robotic simulation and human-scene interaction modeling, highlighting the potential of unified articulated representations in scalable physical modeling.
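
To make the idea of a unified representation more concrete, here is a minimal sketch. It is not the authors' implementation. It assumes each Gaussian center carries a soft part-assignment vector and is moved by a weighted blend of per-part rigid transforms; hardening the assignment to one-hot then yields strictly rigid per-part motion. The function names (`rotation_z`, `move_centers`, `harden`) and the blending rule are hypothetical choices made for illustration only.

```python
import numpy as np

def rotation_z(angle):
    """Rotation matrix about the z-axis (stand-in for a revolute joint)."""
    c, s = np.cos(angle), np.sin(angle)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

def move_centers(centers, weights, rotations, translations):
    """Blend per-part rigid motions of Gaussian centers (assumed scheme).

    centers:      (N, 3) canonical-state centers
    weights:      (N, K) part-assignment weights, rows sum to 1
    rotations:    (K, 3, 3) one rotation per part
    translations: (K, 3) one translation per part
    """
    # per_part[n, k] = R_k @ centers[n] + t_k
    per_part = np.einsum("kij,nj->nki", rotations, centers) + translations[None]
    # Weighted sum over parts for every Gaussian.
    return np.einsum("nk,nki->ni", weights, per_part)

def harden(weights):
    """Turn soft assignments into one-hot assignments via argmax."""
    hard = np.zeros_like(weights)
    hard[np.arange(len(weights)), weights.argmax(axis=1)] = 1.0
    return hard

centers = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
weights = np.array([[0.9, 0.1], [0.2, 0.8]])           # soft assignments
rotations = np.stack([np.eye(3), rotation_z(np.pi / 2)])  # static base, rotating part
translations = np.zeros((2, 3))

soft_moved = move_centers(centers, weights, rotations, translations)
hard_moved = move_centers(centers, harden(weights), rotations, translations)
```

With soft weights, each center is pulled partway toward every part's motion, which keeps the assignment differentiable; with hard weights, each center follows exactly one part.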

Paper Structure

This paper contains 35 sections, 22 equations, 15 figures, 7 tables.

Figures (15)

  • Figure 1: ArtGS adopts a separate pipeline, while GaussianArt introduces a unified representation that jointly models geometry and motion. Compared to ArtGS, which struggles with wrong part assignments and axis errors beyond 2–3 parts, our method robustly handles complex objects with many parts. In terms of quality, ArtGS produces wrong segmentation and high errors (CD = 1.98 / 120.15 for static/dynamic), while GaussianArt achieves correct segmentation and much lower errors (CD = 0.94 / 0.16). These results highlight the scalability and accuracy benefits of our unified design.
  • Figure 2: The overview of GaussianArt. We first design a pipeline to generate multi-view-consistent part segmentation masks, which are used to initialize Gaussians in the canonical state. During training, we introduce a unified framework that jointly learns part segmentation and motion using Gaussians. This process employs a soft-to-hard motion optimization strategy, supervised by RGB-D data and part segmentation masks, along with additional refinement techniques (see Section 4.3). Finally, the mesh and motion parameters produced by GaussianArt can be effectively applied to robotic simulation.
  • Figure 3: Regularization during training. (a) $L_0$ regularization: erroneously assigned Gaussians are progressively corrected. (b) Trajectory regularization: the point obtained by transforming $\mathbf{p}$ with the estimated flow is constrained to approximate its matched point, which facilitates efficient optimization of the motion parameters.
  • Figure 4: MPArt-90 benchmark. Unlike prior datasets that contain fewer than 20 objects and quickly saturate, MPArt-90 scales articulated object reconstruction to 90 objects across 20 diverse categories. Each object provides multi-view RGB-D observations together with ground-truth motion parameters, covering configurations with up to 20 parts. This scale reveals failure cases of prior methods, which often collapse beyond 2–3 parts due to brittle initialization. By offering a large and physically grounded benchmark, MPArt-90 enables systematic evaluation of scalability and generalization in articulated modeling.
  • Figure 5: Qualitative results on multi-part objects of MPArt-90.
  • ...and 10 more figures
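
The CD values quoted in the Figure 1 caption presumably denote Chamfer distance between reconstructed and ground-truth geometry. As a rough sketch of how such a number can be computed, the snippet below implements one common symmetric variant on point sets. The exact convention used in the paper (squared versus unsquared distances, sum versus mean, scaling, number of sampled points) is not stated in this excerpt, so the choices here are assumptions, and `chamfer_distance` is an illustrative name.

```python
import numpy as np

def chamfer_distance(a, b):
    """Symmetric Chamfer distance between point sets a (N, 3) and b (M, 3).

    Assumed variant: mean unsquared nearest-neighbor distance, summed
    over both directions. Brute force, so only suitable for small sets.
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    # Pairwise Euclidean distances, shape (N, M).
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)
    # a -> b term plus b -> a term.
    return d.min(axis=1).mean() + d.min(axis=0).mean()

pred = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
gt = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
cd_same = chamfer_distance(pred, gt)  # identical sets give zero
```

For meshes, points would first be sampled from both surfaces; for large point sets a k-d tree lookup would replace the dense distance matrix.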