GAOT: Generating Articulated Objects Through Text-Guided Diffusion Models

Hao Sun, Lei Fan, Donglin Di, Shaohui Liu

TL;DR

GAOT tackles the gap in text-conditioned articulated object generation by introducing a three-phase pipeline that maps text prompts to point clouds, refines them with hypergraph learning to yield part-based graphs, and uses diffusion to synthesize joints for a complete articulated object. The method leverages a fine-tuned Point-E for initial geometry, a Hypergraph Neural Network to capture complex part connectivity, and a DDPM-style diffusion process to produce robust joint structures, ultimately rendering fully articulated 3D objects. Evaluations on the PartNet-Mobility dataset show GAOT outperforms prior approaches in key metrics and produces more complete and realistic geometries and joint motions, with ablations confirming the value of hypergraph refinement and the designed loss terms. Overall, GAOT demonstrates a scalable, text-driven pathway to generate controllable articulated objects, enabling applications in robotics, digital twins, and virtual environments, and points to future improvements in detail and diversity.

Abstract

Articulated object generation has advanced steadily, yet existing models often cannot be conditioned on text prompts. To bridge the gap between textual descriptions and 3D articulated object representations, we propose GAOT, a three-phase framework that generates articulated objects from text prompts using diffusion models and hypergraph learning. First, we fine-tune a point cloud generation model to produce a coarse representation of objects from text prompts. Second, given the inherent connection between articulated objects and graph structures, we design a hypergraph-based learning method to refine these coarse representations, representing object parts as graph vertices. Finally, a diffusion model generates the joints of the articulated objects, represented as graph edges, conditioned on the object parts. Extensive qualitative and quantitative experiments on the PartNet-Mobility dataset demonstrate the effectiveness of our approach, which outperforms previous methods.
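
The graph view in the abstract (parts as vertices, joints as edges) can be illustrated with a small data structure. This is a minimal sketch and not the authors' implementation: the class names, the bounding-box fields, the joint attributes (`axis`, `origin`), and the `build_cabinet` example are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass, field
from enum import Enum


class JointType(Enum):
    # Hypothetical joint vocabulary; the paper's own encoding may differ.
    FIXED = "fixed"
    REVOLUTE = "revolute"    # rotation about an axis
    PRISMATIC = "prismatic"  # translation along an axis
    SCREW = "screw"          # rotation and translation about one axis


@dataclass
class Part:
    """A graph vertex: one rigid part, summarized here by a bounding box."""
    name: str
    bbox_center: tuple
    bbox_size: tuple


@dataclass
class Joint:
    """A graph edge: connects a parent part to a child part."""
    parent: int
    child: int
    kind: JointType
    axis: tuple = (0.0, 0.0, 1.0)
    origin: tuple = (0.0, 0.0, 0.0)


@dataclass
class ArticulatedGraph:
    parts: list = field(default_factory=list)
    joints: list = field(default_factory=list)

    def add_part(self, part):
        """Append a part and return its vertex index."""
        self.parts.append(part)
        return len(self.parts) - 1

    def add_joint(self, joint):
        """Append a joint after checking that both endpoints exist."""
        n = len(self.parts)
        if not (0 <= joint.parent < n and 0 <= joint.child < n):
            raise IndexError("joint refers to a part that does not exist")
        if joint.parent == joint.child:
            raise ValueError("a joint must connect two different parts")
        self.joints.append(joint)

    def adjacency(self):
        """Return a symmetric 0/1 matrix marking which parts share a joint."""
        n = len(self.parts)
        adj = [[0] * n for _ in range(n)]
        for j in self.joints:
            adj[j.parent][j.child] = 1
            adj[j.child][j.parent] = 1
        return adj


def build_cabinet():
    """Toy example: a cabinet body with a hinged door and a sliding drawer."""
    g = ArticulatedGraph()
    body = g.add_part(Part("body", (0.0, 0.0, 0.0), (1.0, 0.5, 1.0)))
    door = g.add_part(Part("door", (0.0, 0.26, 0.25), (1.0, 0.02, 0.5)))
    drawer = g.add_part(Part("drawer", (0.0, 0.0, -0.25), (0.9, 0.45, 0.4)))
    g.add_joint(Joint(body, door, JointType.REVOLUTE,
                      axis=(0.0, 0.0, 1.0), origin=(0.5, 0.25, 0.0)))
    g.add_joint(Joint(body, drawer, JointType.PRISMATIC, axis=(0.0, 1.0, 0.0)))
    return g
```

Calling `build_cabinet().adjacency()` gives a 3×3 matrix in which the body is linked to both the door and the drawer, while the door and drawer are not linked to each other. A matrix of this kind is one plausible form for the edge representation that a diffusion model could generate.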

Paper Structure

This paper contains 13 sections, 7 equations, 5 figures, 4 tables.

Figures (5)

  • Figure 1: An articulated object consists of several parts and joints. The joints connect different parts and allow them to move. As the joints in most real-world articulated objects are screws with at most one prismatic translation and one revolute rotation, they can be represented by a graph. The lower part of the figure is the overall flow of our proposed approach.
  • Figure 2: Our framework. First, we generate the point cloud model via a fine-tuned Point-E model. The point cloud is refined by a hypergraph constructed from the dataset and then converted to a vertex matrix with an MLP. Second, the models are represented as a graph, in which the parts of objects become graph vertices and the joints become graph edges. The graph vertices from the previous stage and random noise edges in matrix form are diffused to obtain complete edges. Finally, the graph is rendered into articulated objects.
  • Figure 3: We benchmark our approach against previous methods using the same text prompt. Qualitative comparison shows that our approach can generate objects with complete geometry and plausible joint movement. Objects generated using our method are more refined in the details.
  • Figure 4: The articulated object generation process. First, point clouds with colored structural information are generated. Then, through hypergraph part extraction, we obtain parts represented as graph vertices. Using a diffusion model, we generate joints and render the articulated objects.
  • Figure 5: Several failure cases in terms of both generation quality and controllability.
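
The Figure 2 caption mentions edges in matrix form that start as random noise and are diffused into complete edges. As a hedged illustration of the noising side of a DDPM-style process, the sketch below applies the standard closed-form forward step, x_t = sqrt(ᾱ_t) · x_0 + sqrt(1 − ᾱ_t) · ε, to a small edge matrix. The linear beta schedule, the 1000-step horizon, and the 0/1 edge encoding are assumptions for illustration; the source does not specify the schedule or encoding that GAOT uses.

```python
import math
import random


def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Linearly spaced noise variances (assumed defaults, not from the paper)."""
    if num_steps == 1:
        return [beta_start]
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def cumulative_alphas(betas):
    """Running product of (1 - beta_t), usually written alpha-bar_t."""
    out, prod = [], 1.0
    for b in betas:
        prod *= 1.0 - b
        out.append(prod)
    return out


def q_sample(edges, t, alpha_bars, rng):
    """Noise a clean edge matrix to step t in one shot.

    Returns the noisy matrix and the Gaussian noise that was added; the
    noise is what a denoising network is typically trained to predict.
    """
    signal = math.sqrt(alpha_bars[t])
    sigma = math.sqrt(1.0 - alpha_bars[t])
    noisy, noise = [], []
    for row in edges:
        eps_row = [rng.gauss(0.0, 1.0) for _ in row]
        noise.append(eps_row)
        noisy.append([signal * x + sigma * e for x, e in zip(row, eps_row)])
    return noisy, noise


# Toy 3-part object: part 0 is joined to parts 1 and 2 (hypothetical encoding).
clean_edges = [
    [0.0, 1.0, 1.0],
    [1.0, 0.0, 0.0],
    [1.0, 0.0, 0.0],
]
alpha_bars = cumulative_alphas(linear_beta_schedule(1000))
rng = random.Random(0)
noisy_edges, added_noise = q_sample(clean_edges, 999, alpha_bars, rng)
```

At the final step almost none of the original signal remains, so `noisy_edges` is close to pure Gaussian noise. Generation runs this in reverse: a learned denoiser, conditioned on the part vertices, would start from such noise and step back toward a clean edge matrix.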