FORCE: Physics-aware Human-object Interaction

Xiaohan Zhang, Bharat Lal Bhatnagar, Sebastian Starke, Ilya Petrov, Vladimir Guzov, Helisa Dhamo, Eduardo Pérez-Pellitero, Gerard Pons-Moll

TL;DR

FORCE addresses the challenge of synthesizing realistic human-object interactions by explicitly modeling two physical attributes, object resistance and human-exerted force, through an intuitive physics encoding. The method combines MNet, a physics-aware motion predictor, with CNet, a resistance-conditioned hand-contact predictor, to autoregressively generate diverse, physically plausible motions and contact patterns. A new dataset of 450 sequences across 8 objects and multiple resistance levels provides a benchmark for evaluating HOI synthesis under varying physical properties. Empirical results show that FORCE achieves state-of-the-art diversity and realism, including high online success and realistic contact behavior, while generalizing to unseen shapes and locations thanks to object shape augmentation.

Abstract

Interactions between humans and objects are influenced not only by the object's pose and shape, but also by physical attributes such as object mass and surface friction. These attributes introduce important motion nuances that are essential for diversity and realism. Despite advancements in recent human-object interaction methods, this aspect has been overlooked. Generating nuanced human motion presents two challenges. First, it is non-trivial to learn from multi-modal human and object information derived from both physical and non-physical attributes. Second, no existing dataset captures nuanced human interactions with objects of varying physical properties, which hampers model development. This work addresses the gap by introducing the FORCE model, an approach for synthesizing diverse, nuanced human-object interactions by modeling physical attributes. Our key insight is that human motion is dictated by the interrelation between the force exerted by the human and the perceived resistance. Guided by a novel intuitive physics encoding, the model captures the interplay between human force and resistance. Experiments also demonstrate that incorporating human force facilitates learning multi-class motion. Accompanying our model, we contribute a dataset that features diverse, differently styled motion through interactions with varying resistances.

Paper Structure (14 sections, 2 equations, 9 figures, 5 tables)

This paper contains 14 sections, 2 equations, 9 figures, 5 tables.

Figures (9)

  • Figure 2: Given the input human and object poses and the object geometry, our method, which consists of MNet and CNet, autoregressively synthesizes a diverse spectrum of nuanced interactions.
  • Figure 3: The FORCE dataset accurately captures diverse, nuanced interaction motion under varying levels of resistance (N for newtons). The last column shows results from the object shape augmentation (see the dataset section), where nuanced motion details are preserved.
  • Figure 4: Comparison with baselines. Our intuitive physics encoding $\mathcal{F}$ disambiguates different multi-class motions and enables the synthesis of the desired interaction (carrying in a, pulling in b). c) and d) show that the coupled encoding of human force and resistance allows the synthesis of visually plausible interactions. In c), when the resistance is greater than the human force, FORCE synthesizes a more plausible "infeasible" interaction. In d), with low resistance, the human maintains an upright pose.
  • Figure 5: Nuanced interactions with the same object shape but different resistance. a) The human struggles with the interaction as the resistance increases from 50 N to 200 N. At 500 N, the human fails to interact with the object. b) The plot depicts the horizontal shift of the human's center of mass, estimated following IPMAN (Tripathi et al., 2023). Results are averaged over 1000 frames of the synthesized interaction.
  • Figure 6: Without physics encoding (left) vs. FORCE (right). With physics encoding, FORCE synthesizes visually plausible motion when the resistance is greater than the applied human force.
  • ...and 4 more figures
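
The center-of-mass measurement mentioned in the Figure 5 caption can be illustrated with a minimal sketch. This is not the paper's procedure: the caption only says the shift is estimated following prior work and averaged over frames. The sketch assumes a y-up coordinate system, per-point masses supplied by the caller, and a per-frame ground-plane reference point (for example, the midpoint between the feet). The names `center_of_mass`, `mean_horizontal_shift`, and `reference_xz` are hypothetical.

```python
import math
from typing import Sequence, Tuple

Vec3 = Tuple[float, float, float]
Vec2 = Tuple[float, float]


def center_of_mass(points: Sequence[Vec3], masses: Sequence[float]) -> Vec3:
    """Mass-weighted mean of 3D points (assumed body-part centers)."""
    if len(points) != len(masses):
        raise ValueError("points and masses must have the same length")
    total = sum(masses)
    if total <= 0:
        raise ValueError("total mass must be positive")
    x = sum(m * p[0] for p, m in zip(points, masses)) / total
    y = sum(m * p[1] for p, m in zip(points, masses)) / total
    z = sum(m * p[2] for p, m in zip(points, masses)) / total
    return (x, y, z)


def mean_horizontal_shift(
    frames: Sequence[Sequence[Vec3]],
    masses: Sequence[float],
    reference_xz: Sequence[Vec2],
) -> float:
    """Average ground-plane distance between the CoM and a reference point.

    Assumption: y is the vertical axis, so the horizontal plane is x-z.
    """
    if len(frames) != len(reference_xz):
        raise ValueError("need one reference point per frame")
    if not frames:
        raise ValueError("need at least one frame")
    shifts = []
    for points, (rx, rz) in zip(frames, reference_xz):
        cx, _, cz = center_of_mass(points, masses)
        shifts.append(math.hypot(cx - rx, cz - rz))
    return sum(shifts) / len(shifts)


if __name__ == "__main__":
    # Two toy frames with two equal-mass points each.
    frames = [
        [(0.0, 0.0, 0.0), (2.0, 1.0, 0.0)],
        [(0.0, 0.0, 0.0), (0.0, 2.0, 6.0)],
    ]
    print(mean_horizontal_shift(frames, [1.0, 1.0], [(0.0, 0.0), (0.0, 0.0)]))
```

Under these assumptions, a larger average shift would correspond to the body leaning further away from its support, which is the kind of trend the caption describes as resistance increases.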