Interactive Humanoid: Online Full-Body Motion Reaction Synthesis with Social Affordance Canonicalization and Forecasting

Yunze Liu, Changxi Chen, Li Yi

TL;DR

This work defines online full-body motion reaction synthesis to enable real-time humanoid responses that include hand actions and object interactions. It introduces social affordance canonicalization and forecasting, supported by two new datasets (HHI and CoChair) and a unified framework based on carrier-centric representations and a 4D Transformer. The method demonstrates superior performance over baselines on multiple benchmarks and provides ablations showing the importance of local-frame canonicalization and future-motion forecasting. The approach offers practical impact for VR/AR, humanoid robots, and collaborative tasks by delivering prompt, natural, and detailed social reactions.

Abstract

We focus on human-humanoid interaction, optionally involving an object. We propose a new task, online full-body motion reaction synthesis, which generates humanoid reactions from the human actor's motions. Previous work considers only human interaction without objects and generates body reactions without hands. It also does not treat the task as online; in practice, the reactor cannot observe information beyond the current moment. To support this task, we construct two datasets, HHI and CoChair, and propose a unified method. Specifically, we construct a social affordance representation: we first select a social affordance carrier, use SE(3)-equivariant neural networks to learn a local frame for the carrier, and then canonicalize the social affordance in that frame. We also propose a social affordance forecasting scheme that lets the reactor predict based on an imagined future. Experiments demonstrate that our approach generates high-quality reactions on HHI and CoChair. We further validate our method on the existing human interaction datasets Interhuman and Chi3D.
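
To make the canonicalization step concrete, the following is a minimal sketch, not the authors' implementation. The paper learns the carrier's local frame with SE(3)-equivariant networks; here a hand-crafted frame built from the first three carrier points stands in for it. The names `local_frame`, `canonicalize`, and `rigid_z` are hypothetical, and the sketch assumes the first three carrier points are not collinear.

```python
import math

def sub(a, b):
    return [a[i] - b[i] for i in range(3)]

def dot(a, b):
    return sum(a[i] * b[i] for i in range(3))

def cross(a, b):
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]

def normalize(a):
    n = math.sqrt(dot(a, a))
    return [c / n for c in a]

def local_frame(carrier):
    """Return (origin, axes) for a list of 3D carrier points.

    Assumption: a Gram-Schmidt frame from the first three points replaces
    the learned SE(3)-equivariant frame described in the abstract.
    """
    n = len(carrier)
    origin = [sum(p[i] for p in carrier) / n for i in range(3)]
    x = normalize(sub(carrier[1], carrier[0]))
    v = sub(carrier[2], carrier[0])
    y = normalize(sub(v, [dot(v, x) * c for c in x]))
    z = cross(x, y)  # right-handed third axis
    return origin, (x, y, z)

def canonicalize(points, origin, axes):
    """Express world-frame points in the carrier's local frame."""
    return [[dot(sub(p, origin), a) for a in axes] for p in points]

def rigid_z(points, theta, t):
    """Rotate points about the z-axis by theta, then translate by t."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c * p[0] - s * p[1] + t[0],
             s * p[0] + c * p[1] + t[1],
             p[2] + t[2]] for p in points]

# Toy data: a 4-point carrier and two actor joints.
carrier = [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
actor = [[0.5, 0.2, 1.0], [-0.3, 0.8, 0.4]]
canonical = canonicalize(actor, *local_frame(carrier))
```

Because the frame moves rigidly with the carrier, applying the same rotation and translation to both the carrier and the actor leaves `canonical` unchanged up to floating-point error. That invariance is the sense in which canonicalization simplifies the distribution the motion model has to fit.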

Paper Structure (18 sections, 10 equations, 10 figures, 5 tables)

This paper contains 18 sections, 10 equations, 10 figures, 5 tables.

Figures (10)

  • Figure 1: We propose a new task named online full-body motion reaction synthesis, optionally with an object. Left: we construct two datasets, HHI and CoChair, to support the task. Right: we propose the Social Affordance Canonicalization and Forecasting technique to generate realistic reactions and establish benchmarks.
  • Figure 2: We construct two datasets to support research on reaction synthesis. HHI (left) is the first large-scale whole-body motion reaction dataset with clear action feedback. CoChair (right) is the first large-scale dataset for multi-human and object interaction.
  • Figure 3: Social Affordance Canonicalization. Given a sequence, we first select a social affordance carrier and build the carrier-centric representation, from which we compute the social affordance representation. We propose to learn the local frame for the carrier and canonicalize the social affordance to simplify the distribution. A motion encoder and decoder are then used to generate reactions.
  • Figure 4: Social Affordance Forecasting. At the training stage, the humanoid reactor can access all motions of the actor. At the prediction stage in the real world, the humanoid reactor can only observe the past motions of the human actor. The forecasting module anticipates the human's upcoming motions.
  • Figure 5: Visualization results on CoChair. Our method can provide a more reasonable grasp and better collaboration with the human actor.
  • ...and 5 more figures
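
The online setting in the Figure 4 caption can be illustrated with a small sketch: at each time step the reactor sees only the past actor frames, and a forecaster appends an imagined future before the reaction is generated. This is an illustration under stated assumptions, not the paper's forecasting module. Constant-velocity extrapolation stands in for the learned forecaster, and the names `forecast_constant_velocity` and `online_inputs` are hypothetical.

```python
def forecast_constant_velocity(past, horizon):
    """Stand-in forecaster (assumption): extrapolate the last observed velocity.

    past: list of frames, each a list of floats; must contain at least one frame.
    """
    if len(past) < 2:
        # No velocity estimate yet, so repeat the only observed frame.
        return [list(past[-1]) for _ in range(horizon)]
    last, prev = past[-1], past[-2]
    vel = [l - p for l, p in zip(last, prev)]
    return [[l + (k + 1) * v for l, v in zip(last, vel)] for k in range(horizon)]

def online_inputs(actor_stream, horizon):
    """Yield, per time step, the observed past followed by the imagined future."""
    past = []
    for frame in actor_stream:
        past.append(list(frame))
        yield past + forecast_constant_velocity(past, horizon)

# Toy actor stream with two coordinates per frame.
stream = [[0.0, 0.0], [1.0, 2.0], [2.0, 4.0]]
steps = list(online_inputs(stream, horizon=2))
```

Each element of `steps` is what a reactor model would receive at that time step: every frame seen so far plus `horizon` forecast frames, and never a frame from the actual future.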