
Physics-Aware Human-Object Rendering from Sparse Views via 3D Gaussian Splatting

Weiquan Wang, Jun Xiao, Yueting Zhuang, Long Chen

TL;DR

The paper tackles the challenge of rendering human-object interactions (HOIs) from sparse views. It introduces HOGS, which combines 3D Gaussian Splatting with a physics-aware optimization framework to enforce physically plausible interactions. Key contributions include deformation-based joint human-object modeling, composed Gaussian rendering with adaptive refinement, sparse-view pose refinement, and a contact-prediction-guided physics loss (attraction and repulsion) augmented by a precomputed object SDF. Experiments on the HOI dataset HODome and the MANUS-Grasps extension demonstrate state-of-the-art rendering quality and real-time performance, highlighting the method's applicability to articulated hand-object interactions.
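The attraction/repulsion idea behind the physics loss can be illustrated with a small sketch. It is not the authors' implementation: it substitutes an analytic sphere SDF for the precomputed object SDF, treats human Gaussian centers as plain 3D points, and assumes a boolean per-point contact mask from the contact predictor. The function names and the weights `w_att` and `w_rep` are hypothetical.

```python
import math
from typing import Callable, Sequence, Tuple

Point = Tuple[float, float, float]
SDF = Callable[[Point], float]


def sphere_sdf(center: Point, radius: float) -> SDF:
    """Analytic stand-in (assumption) for a precomputed object SDF; negative inside."""
    def sdf(p: Point) -> float:
        return math.dist(p, center) - radius
    return sdf


def physics_loss(
    human_pts: Sequence[Point],
    contact_mask: Sequence[bool],
    sdf: SDF,
    w_att: float = 1.0,  # hypothetical weight
    w_rep: float = 1.0,  # hypothetical weight
) -> Tuple[float, float, float]:
    """Return (total, attraction, repulsion) for one frame."""
    # Repulsion: mean penetration depth of all human points that lie inside the object (SDF < 0).
    rep = sum(max(0.0, -sdf(p)) for p in human_pts) / len(human_pts)
    # Attraction: mean distance of predicted contact points to the object surface (SDF = 0).
    contact = [p for p, c in zip(human_pts, contact_mask) if c]
    att = sum(abs(sdf(p)) for p in contact) / len(contact) if contact else 0.0
    return w_att * att + w_rep * rep, att, rep


if __name__ == "__main__":
    sdf = sphere_sdf((0.0, 0.0, 0.0), 1.0)
    pts = [(0.0, 0.0, 0.5), (0.0, 0.0, 1.2), (3.0, 0.0, 0.0)]
    mask = [False, True, False]  # only the second point is predicted to be in contact
    total, att, rep = physics_loss(pts, mask, sdf)
    print(f"total={total:.4f} attraction={att:.4f} repulsion={rep:.4f}")
```

The split reflects the two failure modes a contact-guided loss targets: repulsion applies to every human point (no interpenetration anywhere), while attraction applies only where contact is predicted, so non-contact regions are not pulled onto the object.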

Abstract

Rendering realistic human-object interactions (HOIs) from sparse-view inputs is challenging due to occlusions and incomplete observations, yet crucial for various real-world applications. Existing methods struggle with either low rendering quality (e.g., poor visual fidelity or physically implausible HOIs) or high computational cost. To address these limitations, we propose HOGS (Human-Object Rendering via 3D Gaussian Splatting), a novel framework for efficient and physically plausible HOI rendering from sparse views. Specifically, HOGS combines 3D Gaussian Splatting with a physics-aware optimization process. It incorporates a Human Pose Refinement module for accurate pose estimation and a Sparse-View Human-Object Contact Prediction module for efficient contact region identification. This combination enables coherent joint rendering of human and object Gaussians while enforcing physically plausible interactions. Extensive experiments on the HODome dataset demonstrate that HOGS achieves superior rendering quality, efficiency, and physical plausibility compared to existing methods. We further show its extensibility to hand-object grasp rendering tasks, demonstrating its broader applicability to articulated object interactions.
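Figure 4 describes how the Sparse-View Human-Object Contact Prediction module turns per-view features into per-vertex contact labels: cross-view attention, feature fusion, then a classifier. The sketch below illustrates only that fuse-and-classify step under stated assumptions. Per-view features are given as plain lists, attention uses identity query/key/value projections instead of learned ones, fusion is a mean over views, and the per-vertex logistic classifier with its weights is hypothetical.

```python
import math
from typing import List, Sequence

Vec = List[float]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def softmax(xs: Sequence[float]) -> Vec:
    m = max(xs)  # subtract the max for numerical stability
    e = [math.exp(x - m) for x in xs]
    s = sum(e)
    return [v / s for v in e]


def cross_view_attention(feats: Sequence[Vec]) -> List[Vec]:
    """Scaled dot-product attention across views (identity Q/K/V projections, a simplification)."""
    d = len(feats[0])
    out = []
    for q in feats:
        w = softmax([dot(q, k) / math.sqrt(d) for k in feats])
        out.append([sum(wi * v[j] for wi, v in zip(w, feats)) for j in range(d)])
    return out


def fuse(feats: Sequence[Vec]) -> Vec:
    """Mean-pool the attended per-view features into one sparse-view fused feature."""
    n = len(feats)
    return [sum(f[j] for f in feats) / n for j in range(len(feats[0]))]


def predict_contact(fused: Vec, weights: Sequence[Vec], biases: Sequence[float],
                    threshold: float = 0.5) -> List[bool]:
    """Hypothetical per-vertex logistic classifier: one (weight, bias) pair per body-mesh vertex."""
    probs = [1.0 / (1.0 + math.exp(-(dot(w, fused) + b))) for w, b in zip(weights, biases)]
    return [p >= threshold for p in probs]


if __name__ == "__main__":
    views = [[1.0, 0.0], [0.8, 0.2], [0.9, 0.1]]  # toy 2-D features from three views
    fused = fuse(cross_view_attention(views))
    # Two toy "vertices": the first responds to feature 0, the second is suppressed by it.
    print(predict_contact(fused, weights=[[4.0, 0.0], [-4.0, 0.0]], biases=[0.0, 0.0]))
```

Attending across views before pooling lets a view that sees the contact region clearly reweight the others, which is the motivation the figure gives for cross-view attention over naive averaging.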

Paper Structure

This paper contains 23 sections, 12 equations, 7 figures, 3 tables.

Figures (7)

  • Figure 1: Comparison of state-of-the-art sparse-view HOI rendering methods. Mesh-based methods are limited in rendering efficiency and quality; NeRF-based methods are weak in rendering efficiency and in the physical plausibility of HOIs; and existing 3DGS methods handle HOIs poorly despite decent single-human rendering performance. Our proposed HOGS simultaneously improves rendering quality, efficiency, and the physical plausibility of HOIs.
  • Figure 2: HOGS pipeline. Given sparse views of a dynamic HOI scene, HOGS first deforms the human and object representations using a Human-Object Deformation process, which applies linear blend skinning (LBS) to the human and rigid transformations to the object, along with a Human Pose Refinement module to improve target pose accuracy. The deformed human and object Gaussians are then composed into a unified 3D space to form the Composed Gaussian Splatting. Finally, this composed result is optimized with a Physics-Aware Rendering Optimization process, which incorporates a Human-Object Contact Prediction module and a physical loss to enforce physically plausible interactions.
  • Figure 3: Illustration of sparse-view human pose refinement module. Sparse-view RGB images are processed by an HMR regressor to obtain initial pose and shape estimations. These estimations are then refined through a sparse-view optimization with a dynamic view weighting approach.
  • Figure 4: The workflow of sparse-view contact prediction method. Given sparse-view images, a single-view contact prediction encoder outputs per-view features, which are then processed by cross-view attention and feature fusion to produce a sparse-view fused feature. Finally, a classifier predicts the contact regions corresponding to the SMPL-H vertices.
  • Figure 5: Qualitative evaluation of novel view synthesis for HOI rendering on the HODome dataset. HOGS demonstrates superior visual fidelity and more accurate representation of human-object interactions compared to existing methods.
  • ...and 2 more figures
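The Human-Object Deformation step in Figure 2 (LBS for the human, a rigid transformation for the object, then composition into one 3D space) can be sketched on Gaussian centers alone. This is a minimal illustration under assumptions: transforms are plain 3x4 `[R | t]` matrices, Gaussian rotations and scales are ignored, pose refinement is omitted, and all function names are hypothetical.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]
Mat34 = Sequence[Sequence[float]]  # 3x4 affine matrix [R | t]


def apply_affine(T: Mat34, p: Point) -> Point:
    """Apply a 3x4 affine transform [R | t] to a 3D point."""
    return tuple(T[i][0] * p[0] + T[i][1] * p[1] + T[i][2] * p[2] + T[i][3] for i in range(3))


def lbs(p: Point, skin_weights: Sequence[float], bone_transforms: Sequence[Mat34]) -> Point:
    """Linear blend skinning for one human Gaussian center: x' = (sum_k w_k T_k) x."""
    blended = [[sum(w * T[i][j] for w, T in zip(skin_weights, bone_transforms)) for j in range(4)]
               for i in range(3)]
    return apply_affine(blended, p)


def rigid(p: Point, R: Sequence[Sequence[float]], t: Point) -> Point:
    """Rigid motion for one object Gaussian center: x' = R x + t."""
    return apply_affine([list(R[i]) + [t[i]] for i in range(3)], p)


def compose(human: List[Point], obj: List[Point]) -> Tuple[List[Point], List[str]]:
    """Concatenate deformed centers into one scene, tagging each with its source."""
    return human + obj, ["human"] * len(human) + ["object"] * len(obj)


def translate_x(dx: float) -> List[List[float]]:
    """Helper for the demo: a pure translation along x."""
    return [[1.0, 0.0, 0.0, dx], [0.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0]]


if __name__ == "__main__":
    h = lbs((0.0, 0.0, 0.0), [0.5, 0.5], [translate_x(1.0), translate_x(3.0)])  # (2.0, 0.0, 0.0)
    rot_z90 = [[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]
    o = rigid((1.0, 0.0, 0.0), rot_z90, (0.0, 0.0, 1.0))  # (0.0, 1.0, 1.0)
    centers, labels = compose([h], [o])
    print(centers, labels)
```

Keeping a per-Gaussian source label after composition is one simple way to let a later loss (such as the physics loss) tell human Gaussians from object Gaussians in the shared scene.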