PhyCAGE: Physically Plausible Compositional 3D Asset Generation from a Single Image

Han Yan, Mingrui Zhang, Yang Li, Chao Ma, Pan Ji

TL;DR

PhyCAGE is the first approach for physically plausible compositional 3D asset generation from a single image. It introduces Physical Simulation-Enhanced Score Distillation Sampling (PSE-SDS), a technique that further optimizes the positions of the Gaussians so that the generated objects are physically compatible.

Abstract

We present PhyCAGE, the first approach for physically plausible compositional 3D asset generation from a single image. Given an input image, we first generate consistent multi-view images for the components of the asset. These images are then fitted with 3D Gaussian Splatting representations. To ensure that the Gaussians representing different objects are physically compatible with each other, we introduce a Physical Simulation-Enhanced Score Distillation Sampling (PSE-SDS) technique that further optimizes the positions of the Gaussians. This is achieved by using the gradient of the SDS loss as the initial velocity of the physical simulation, which allows the simulator to act as a physics-guided optimizer that progressively corrects the Gaussians' positions toward a physically compatible state. Experimental results demonstrate that the proposed method can generate physically plausible compositional 3D assets from a single image.
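For reference, the standard SDS gradient (in the form popularized by DreamFusion) and one possible reading of the velocity initialization are written out below. The step scale $\eta$, the negative sign (which points the initial velocity along the descent direction), the weight $\lambda$, and the split of $\mathcal{L}$ into an SDS term and an image term $\mathcal{L}_{\mathrm{img}}$ are assumptions for illustration; the abstract only states that the gradient of the SDS loss serves as the initial velocity of the simulation.

$$\nabla_{\theta}\mathcal{L}_{\mathrm{SDS}} = \mathbb{E}_{t,\epsilon}\left[ w(t)\,\big(\hat{\epsilon}_{\phi}(\mathbf{x}_t; y, t) - \epsilon\big)\,\frac{\partial \mathbf{x}}{\partial \theta} \right], \qquad \mathbf{x} = g(\theta)$$

$$\mathbf{v}_{0} = -\eta\,\nabla_{\theta_{\mu}^{k}}\mathcal{L}, \qquad \mathcal{L} = \mathcal{L}_{\mathrm{SDS}} + \lambda\,\mathcal{L}_{\mathrm{img}}$$

Here $\mathbf{x} = g(\theta)$ is an image rendered from the Gaussians with parameters $\theta$, $\mathbf{x}_t$ is its noised version at timestep $t$, $\hat{\epsilon}_{\phi}$ is the diffusion model's noise prediction under conditioning $y$, $w(t)$ is a timestep weight, and $\theta_{\mu}^{k}$ denotes the Gaussian positions in the notation of Figure 4. Under this reading, the simulator advances the positions starting from $\mathbf{v}_0$ instead of applying the raw gradient step directly.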

Paper Structure

This paper contains 25 sections, 15 equations, 9 figures, 2 tables, 1 algorithm.

Figures (9)

  • Figure 1: PhyCAGE can generate compositional 3D assets with interacting objects in a physically plausible manner. The generated 3D Gaussian Splatting assets show better visual quality and physical plausibility under Material Point Method (MPM) simulation.
  • Figure 2: Overview of PhyCAGE. Given an input image, we first generate consistent multi-view images for the components of the asset (see Sec. \ref{sec:stage1}). Then, we fit the multi-view images with 3D Gaussian Splatting representations (see Sec. \ref{sec:stage2}). Finally, we introduce a Physical Simulation-Enhanced SDS to further optimize the positions of the Gaussians (see Sec. \ref{sec:ps_sds}).
  • Figure 3: Object inpainting with image diffusion models.
  • Figure 4: Overview of our PSE-SDS. During backpropagation, the gradients from the SDS and image losses are split into two streams. Specifically, $\nabla_{\theta_{\mu}^{k}}\mathcal{L}$ is used as the initial velocity of the physical simulation that updates the Gaussian positions $\mu$.
  • Figure 5: Qualitative comparison with previous work. The green box shows the decomposed objects, while the blue box highlights physical relationships, such as whether components interpenetrate (red circles). Since we use a 3DGS representation, we convert Gaussian centers to point clouds for geometry visualization.
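The two-stream update described in the Figure 4 caption can be sketched as a toy example. This is not PhyCAGE's implementation, and every detail beyond the caption is an assumption: the MPM simulator is replaced by a hypothetical rigid-body integrator that moves the whole object under gravity and resolves contact with a support plane z = 0 by projection; the position gradient is averaged over points only because the toy object is rigid; and the sign and scale `eta` of the initial velocity, the learning rate, and the gradients are all made up. The helper names `simulate_rigid_on_plane` and `pse_sds_step` are hypothetical.

```python
import numpy as np

# Toy assumption: gravity along -z and a support plane at z = 0.
GRAVITY = np.array([0.0, 0.0, -9.8])


def simulate_rigid_on_plane(points, v0, steps=200, dt=0.01, damping=0.9):
    """Hypothetical stand-in for the MPM simulator: moves a rigid point set
    under gravity and resolves contact with the plane z = 0 by projection."""
    x = points.astype(float).copy()
    v = np.asarray(v0, dtype=float).copy()
    for _ in range(steps):
        v = damping * (v + GRAVITY * dt)  # apply gravity, then damp the velocity
        x = x + v * dt                    # every point shares v, so the shape is preserved
        depth = -x[:, 2].min()            # how far the lowest point sits below the plane
        if depth > 0.0:
            x[:, 2] += depth              # project the object back onto the plane
            v[2] = 0.0                    # inelastic contact: drop the normal velocity
    return x


def pse_sds_step(params, grads, lr=0.01, eta=1.0):
    """Hypothetical two-stream update in the spirit of Figure 4."""
    updated = {}
    # Stream 1: the position gradient becomes the simulation's initial velocity.
    # Averaging over points is an assumption made because the toy object is rigid.
    v0 = -eta * grads["mu"].mean(axis=0)
    updated["mu"] = simulate_rigid_on_plane(params["mu"], v0)
    # Stream 2: every other Gaussian attribute takes a plain gradient-descent step.
    for name, value in params.items():
        if name != "mu":
            updated[name] = value - lr * grads[name]
    return updated


# Usage with made-up data: an object whose centers start partly below the plane.
rng = np.random.default_rng(0)
params = {
    "mu": rng.normal(scale=0.1, size=(100, 3)) + np.array([0.0, 0.0, -0.05]),
    "color": np.full((100, 3), 0.5),
}
grads = {
    "mu": np.tile([0.0, 0.0, 1.0], (100, 1)),  # its negative points down, into the plane
    "color": np.full((100, 3), 0.1),
}
out = pse_sds_step(params, grads)
print("lowest z after simulation:", out["mu"][:, 2].min())
```

In this example the made-up position gradient pushes the object further into the plane, yet the toy simulator returns it to resting contact (lowest point at z = 0) while keeping its shape. This mirrors, in a very simplified form, how a simulator can correct penetration that a raw gradient step would worsen. Non-position attributes such as color bypass the simulator and take an ordinary gradient step.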