Text-to-3D Gaussian Splatting with Physics-Grounded Motion Generation

Wenqing Wang, Yun Fu

TL;DR

This paper advances text-to-3D generation by integrating LLM-driven prompt refinement, diffusion-prior guided Gaussian Splatting, and a continuum mechanics–based deformation map to produce high-quality 3D objects with physics-grounded motion. By representing objects with time-aware 3D Gaussians and using both 3D shape and 2D appearance guidance, the method mitigates geometry artifacts and improves visual fidelity. Motion realism comes from a deformation framework based on the Material Point Method (MPM), which enforces mass and momentum conservation while deforming the Gaussian kernels. The authors report better qualitative and quantitative results than the baselines they compare against, and point to physics-aware 3D content creation for VR, gaming, and film production as target applications.

Abstract

Text-to-3D generation is a valuable technology in virtual reality and digital content creation. While recent works have pushed the boundaries of text-to-3D generation, producing high-fidelity 3D objects from inefficient prompts and accurately simulating their physics-grounded motion remain unsolved challenges. To address these challenges, we present a framework that uses Large Language Model (LLM)-refined prompts and diffusion prior-guided Gaussian Splatting (GS) to generate 3D models with accurate appearance and geometric structure. We also incorporate a continuum mechanics-based deformation map and color regularization to synthesize vivid physics-grounded motion for the generated 3D Gaussians, adhering to the conservation of mass and momentum. By integrating text-to-3D generation with physics-grounded motion synthesis, our framework renders photo-realistic 3D objects that exhibit physics-aware motion, accurately reflecting how the objects behave under various forces and constraints across different materials. Extensive experiments demonstrate that our approach achieves high-quality 3D generation with realistic physics-grounded motion.
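
To make the idea of a deformation map acting on Gaussian kernels concrete, here is a minimal sketch in plain Python. It is not taken from the paper. It assumes the first-order treatment common in physics-integrated Gaussian Splatting: each kernel is moved by a local affine map x = F X + t, its covariance becomes F Σ Fᵀ, and mass conservation gives the deformed density as ρ = ρ₀ / det(F). The function names (`deform_gaussian`, `matmul`, `det3`) and the numeric values are hypothetical and chosen only for illustration.

```python
# Illustrative sketch only: a first-order deformation of one 3D Gaussian kernel.
# Assumption (not confirmed by this summary): covariance transforms as F @ cov @ F^T.

def matmul(a, b):
    """Multiply two matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(a):
    """Transpose a matrix stored as nested lists."""
    return [list(row) for row in zip(*a)]

def det3(m):
    """Determinant of a 3x3 matrix (cofactor expansion along the first row)."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

def deform_gaussian(mean, cov, F, translation, rest_density):
    """Apply the local affine map x = F X + t to one Gaussian kernel.

    Returns the deformed mean, the deformed covariance F cov F^T, and the
    deformed density rest_density / det(F), which follows from conservation
    of mass for a material element whose volume scales by det(F).
    """
    new_mean = [sum(F[i][k] * mean[k] for k in range(3)) + translation[i]
                for i in range(3)]
    new_cov = matmul(matmul(F, cov), transpose(F))
    J = det3(F)
    if J <= 0:
        raise ValueError("deformation gradient must have positive determinant")
    return new_mean, new_cov, rest_density / J

# Hypothetical example: stretch a unit-covariance kernel by 2x along the x axis.
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
stretch_x = [[2.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
mean_t, cov_t, density_t = deform_gaussian(
    mean=[1.0, 0.0, 0.0], cov=identity, F=stretch_x,
    translation=[0.0, 0.0, 0.0], rest_density=1000.0)
```

In this example the kernel elongates along x while its density halves, so the mass it carries is unchanged. A full MPM solver would additionally evolve F and the velocities on a background grid to satisfy momentum conservation, which this sketch does not attempt.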

Paper Structure

This paper contains 28 sections, 20 equations, 14 figures, 5 tables.

Figures (14)

  • Figure 1: Our framework is a text-to-3D physics-grounded motion-rendering pipeline with high-quality visual appearances and realistic motion.
  • Figure 2: Pipeline overview. Our framework first leverages an LLM to refine the text prompt. Next, it employs a 3D geometry diffusion prior and a 2D image diffusion prior for guiding the 3D GS process, producing the high-quality 3D object. Finally, a deformation map based on continuum mechanics is applied to synthesize the physics-grounded motion of the 3D object.
  • Figure 3: LLM-prompt refinement of a vague text prompt.
  • Figure 4: LLM-prompt refinement of a complex text prompt.
  • Figure 5: Our results and DreamPhysics results. We present our text-to-3D physics-grounded motion results and the results generated by DreamPhysics using the 3D models provided by our framework.
  • ...and 9 more figures