PUGS: Zero-shot Physical Understanding with Gaussian Splatting

Yinghao Shuai, Ran Yu, Yuantao Chen, Zijian Jiang, Xiaowei Song, Nan Wang, Jv Zheng, Jianzhu Ma, Meng Yang, Zhicheng Wang, Wenbo Ding, Hao Zhao

TL;DR

PUGS tackles the challenge of inferring physical properties such as mass and hardness from RGB images by coupling 3D Gaussian Splatting with geometry-aware regularization and region-aware features, followed by zero-shot property reasoning via Vision-Language Models and Gaussian volume integration. The method delivers state-of-the-art ABO-500 mass predictions and improved material segmentation, and demonstrates practical gains in grasping tasks by producing more accurate mechanical properties (e.g., Young's modulus) than prior NeRF-based approaches. Key innovations include the geometry-aware regularization loss (GARL), region-aware feature propagation, and CLIP-guided, region-consistent property transfer to Gaussian primitives. The work advances robotic perception by enabling efficient, zero-shot physical understanding directly from images, with potential applicability to scene-level reasoning in the future.

Abstract

Current robotic systems can understand the categories and poses of objects well, but understanding physical properties such as mass, friction, and hardness in the wild remains challenging. We propose a new method that reconstructs 3D objects using the Gaussian splatting representation and predicts various physical properties in a zero-shot manner. We propose two techniques for the reconstruction phase: a geometry-aware regularization loss function to improve shape quality and a region-aware feature contrastive loss function to promote region affinity. Two further techniques are designed for inference: a feature-based property propagation module and a volume integration module tailored to the Gaussian representation. We name our framework zero-shot Physical Understanding with Gaussian Splatting, or PUGS. PUGS achieves new state-of-the-art results on the standard ABO-500 mass prediction benchmark. We provide extensive quantitative ablations and qualitative visualizations to demonstrate the mechanism of our designs. We show that the proposed methodology can help address challenging real-world grasping tasks. Our code, data, and models are available at https://github.com/EverNorif/PUGS
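
The abstract mentions a volume integration module that turns per-primitive predictions into an object-level property such as mass. The details of that module are not given here, so the following is only a minimal sketch of the general idea, assuming each Gaussian primitive already carries a density estimate and an estimate of the volume it occupies. The function name `aggregate_mass` and both of its inputs are hypothetical and are not taken from the PUGS code.

```python
import numpy as np


def aggregate_mass(densities_kg_m3, volumes_m3):
    """Sum density * volume over primitives (hypothetical helper, not the PUGS API).

    densities_kg_m3: per-primitive density estimates in kg/m^3.
    volumes_m3: per-primitive occupied-volume estimates in m^3 (assumed given).
    """
    densities = np.asarray(densities_kg_m3, dtype=float)
    volumes = np.asarray(volumes_m3, dtype=float)
    if densities.shape != volumes.shape:
        raise ValueError("densities and volumes must have the same shape")
    # Discrete approximation of the integral of density over the object volume.
    return float(np.sum(densities * volumes))


# Two primitives: 1000 kg/m^3 over 0.001 m^3 and 500 kg/m^3 over 0.002 m^3.
total_mass_kg = aggregate_mass([1000.0, 500.0], [0.001, 0.002])
```

How the occupied volume of each Gaussian is obtained is exactly what a representation-specific integration scheme has to solve; this sketch takes it as an input rather than guessing at it.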

Paper Structure

This paper contains 15 sections, 12 equations, 5 figures, 4 tables.

Figures (5)

  • Figure 1: Comparison between NeRF2Physics [zhai2024physical] and our proposed PUGS. The input is a set of RGB images, and the output is a reconstructed target with physical property understanding. PUGS switches from NeRF to Gaussian Splatting and correctly predicts both the material (cotton) and the Young's modulus, while NeRF2Physics fails. With the correct Young's modulus, the robotic gripper can adjust its opening size and successfully grasp the object.
  • Figure 2: Overview of PUGS. We take multi-view images of the object as input, reconstruct a Gaussian representation with regional features through Shape Aware 3DGS Reconstruction, predict candidate properties in a zero-shot manner through VLM Based Physical Property Prediction, and finally obtain the material segmentation and property prediction results through Feature Based Property Propagation. With the proposed Gaussian Volume Integration, we can calculate object-level properties such as mass.
  • Figure 3: Comparison of reconstruction results before and after applying the geometry-aware regularization loss (GARL). Result (a), without GARL, exhibits some floaters and blurred areas; result (b), with GARL, has more accurate geometry.
  • Figure 4: Explanation of the different modules in PUGS. During reconstruction, we compute the (a) geometry-aware regularization loss using normals obtained through two different methods. With the (b) region-aware feature contrastive loss, we pull together the features of Gaussians belonging to the same mask, while pushing apart the features of Gaussians belonging to different masks. During (c) feature-based property propagation, we use the similarity of region-aware features to propagate physical properties.
  • Figure 5: Region-aware feature visualization and material segmentation results for some objects. The visualization results in (b) demonstrate that our region-aware features can effectively identify different regions of the object. (c) and (d) show the material prediction results of PUGS and NeRF2Physics, respectively. The results from NeRF2Physics are more fragmented, whereas PUGS achieves more coherent and accurate material segmentation.
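
The Figure 4 caption describes two feature-level mechanisms: a contrastive loss that pulls together features from the same mask and pushes apart features from different masks, and a propagation step that transfers properties by feature similarity. The sketch below illustrates both ideas in NumPy under stated assumptions. The cosine similarity, the hinge `margin`, the nearest-anchor assignment rule, and all function names are illustrative choices made here, not the formulation used in the paper.

```python
import numpy as np


def _normalize(x):
    """L2-normalize rows so that dot products become cosine similarities."""
    x = np.asarray(x, dtype=float)
    norms = np.linalg.norm(x, axis=-1, keepdims=True)
    return x / np.clip(norms, 1e-12, None)


def region_contrastive_loss(features, mask_ids, margin=0.5):
    """Pull same-mask features together and push different-mask features apart.

    Assumed form: mean (1 - sim) over same-mask pairs plus a hinge on
    similarities above `margin` for different-mask pairs.
    """
    f = _normalize(features)
    ids = np.asarray(mask_ids)
    sim = f @ f.T
    same = ids[:, None] == ids[None, :]
    off_diagonal = ~np.eye(len(ids), dtype=bool)
    positives = same & off_diagonal   # same mask, excluding self-pairs
    negatives = ~same                 # different masks
    pull = (1.0 - sim[positives]).mean() if positives.any() else 0.0
    push = np.maximum(sim[negatives] - margin, 0.0).mean() if negatives.any() else 0.0
    return float(pull + push)


def propagate_property(features, anchor_features, anchor_values):
    """Give each feature the property value of its most similar anchor."""
    sim = _normalize(features) @ _normalize(anchor_features).T
    nearest = sim.argmax(axis=1)
    return np.asarray(anchor_values)[nearest]


# Four toy per-Gaussian features forming two visible clusters.
feats = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
loss_consistent = region_contrastive_loss(feats, [0, 0, 1, 1])
loss_scrambled = region_contrastive_loss(feats, [0, 1, 0, 1])

# Two anchors with hypothetical property values attached to them.
anchors = np.array([[1.0, 0.0], [0.0, 1.0]])
values = propagate_property(feats, anchors, [1.5, 7.8])
```

In this toy setup the loss is lower when the mask labels agree with the feature clusters than when they are scrambled, which is the behavior a region-affinity objective is meant to reward. A nearest-anchor rule is the simplest possible propagation; a similarity-weighted average over several anchors would be an equally plausible variant.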