Phys4DGen: Physics-Compliant 4D Generation with Multi-Material Composition Perception

Jiajing Lin, Zhenzhong Wang, Dejun Xu, Shu Jiang, YunPeng Gong, Min Jiang

TL;DR

Phys4DGen tackles the lack of physical realism and multi-material handling in 4D generation by integrating a multi-material perception pipeline with physics-based simulation. It combines 3D Gaussians generation, 3D Material Grouping, Internal Physical Structure Discovery, and MLLMs-guided material identification (via GPT-4o and CLIP fusion) to create a material-continuum representation for MPM-based dynamics. The approach yields physically plausible, high-fidelity 4D content from a single image or a 3D input and outperforms state-of-the-art methods in spatiotemporal consistency and realism, while enabling fine-grained material control. The framework makes physics-aware 4D content generation practical for users without physics expertise, with potential applications in animation, gaming, and AR/VR, and opens avenues for extending to multi-object scenes.

Abstract

4D content generation aims to create dynamically evolving 3D content that responds to specific input objects such as images or 3D representations. Current approaches typically incorporate physical priors to animate 3D representations, but these methods suffer from significant limitations: they not only require users lacking physics expertise to manually specify material properties but also struggle to effectively handle the generation of multi-material composite objects. To address these challenges, we propose Phys4DGen, a novel 4D generation framework that integrates multi-material composition perception with physical simulation. The framework achieves automated, physically plausible 4D generation through three innovative modules: first, the 3D Material Grouping module partitions heterogeneous material regions on 3D representations' surfaces via semantic segmentation; second, the Internal Physical Structure Discovery module constructs the mechanical structure of object interiors; finally, we distill physical prior knowledge from multimodal large language models to enable rapid and automatic material properties identification for both objects' surfaces and interiors. Experiments on both synthetic and real-world datasets demonstrate that Phys4DGen can generate high-fidelity 4D content with physical realism in open-world scenarios, significantly outperforming state-of-the-art methods.

Paper Structure

This paper contains 39 sections, 12 equations, 14 figures, 4 tables.

Figures (14)

  • Figure 1: The red arrows indicate the direction of external forces. We use the space-time slice (right column), where the vertical axis represents time and the horizontal axis shows a spatial slice of the object (marked by red lines), to reveal motion intensity and frequency. As shown at the top, diffusion models embed unrealistic motion priors that may mislead the estimation process; for example, Physics3D consistently overestimates the softness of the ficus, deviating from physical plausibility. Additionally, the accuracy of such approaches depends heavily on the initial material properties (e.g., an initial Young's modulus of $10^4$ vs. $10^6$). In contrast, our method achieves more accurate material property estimation within 14.88 seconds, enabling reliable simulation.
  • Figure 2: Framework of Phys4DGen. (a) 3D Gaussians Generation: Given an input image, a static 3D Gaussian representation is generated under the guidance of the diffusion model. (b) Material Grouping and Internal Discovery: 3D Material Grouping is applied to partition the 3D Gaussians into distinct material groups. Concurrently, Internal Physical Structure Discovery is used to fill internal particles and determine their corresponding material groups. (c) MLLMs-Guided Material Identification: Surface and internal material properties are visually inferred by MLLMs. These inferred results are then integrated into the 3D representation $\mathbb{P}$ through the CLIP Fusion module, forming a material continuum representation $\tilde{\mathbb{P}}$. (d) 4D Dynamics Generation: Given external forces, MPM simulation is performed to animate the material continuum, thereby generating 4D content.
  • Figure 3: Visual results of Phys4DGen. Phys4DGen is capable of perceiving the multi-material composition of 3D objects and generating physically realistic 4D content under given external forces (red arrows).
  • Figure 4: Qualitative comparison in image-to-4D generation. To compare the spatiotemporal consistency, the rendering view changes with each time step. The red box highlights regions exhibiting physically implausible behavior for further observation. The dynamics generated by our method are more consistent with physical laws compared to the baseline method.
  • Figure 5: Qualitative comparison for 3D-to-4D generation. We compare our results with real videos and baselines using space-time slices, which reveal the motion's intensity and frequency. Our results more closely match the ground truth.
  • ...and 9 more figures
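
The space-time slices mentioned in the captions of Figures 1 and 5 can be illustrated with a short sketch. This is not the authors' code: the function name `space_time_slice`, the array layout `(T, H, W)`, and the toy oscillating-dot video are all assumptions made for illustration. The idea is simply to take the same pixel row from every frame and stack those rows so that time runs along the vertical axis.

```python
import numpy as np

def space_time_slice(frames: np.ndarray, row: int) -> np.ndarray:
    """Stack one pixel row from every frame (hypothetical helper).

    frames: video of shape (T, H, W) or (T, H, W, C).
    Returns an array of shape (T, W) or (T, W, C); axis 0 is time.
    """
    if frames.ndim not in (3, 4):
        raise ValueError("expected frames of shape (T, H, W) or (T, H, W, C)")
    return frames[:, row]

# Toy video (assumed data): one bright pixel oscillating horizontally.
T, H, W = 60, 32, 64
frames = np.zeros((T, H, W), dtype=np.float32)
for t in range(T):
    # Amplitude 20 px, period 30 frames; stays inside the image width.
    x = int(round(W / 2 + 20 * np.sin(2 * np.pi * t / 30)))
    frames[t, H // 2, x] = 1.0

st = space_time_slice(frames, H // 2)
print(st.shape)  # (60, 64)
```

In the resulting image, the width of the traced curve reflects motion intensity and the number of oscillations along the time axis reflects frequency, which is how the captions use these slices to compare methods.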