TeamHOI: Learning a Unified Policy for Cooperative Human-Object Interactions with Any Team Size

Stefan Lionar, Gim Hee Lee

TL;DR

This work presents TeamHOI, a framework that enables a single decentralized policy to handle cooperative HOIs across any number of cooperating agents, and introduces a masked Adversarial Motion Prior (AMP) strategy that uses single-human reference motions while masking object-interacting body parts during training.

Abstract

Physics-based humanoid control has achieved remarkable progress in enabling realistic and high-performing single-agent behaviors, yet extending these capabilities to cooperative human-object interaction (HOI) remains challenging. We present TeamHOI, a framework that enables a single decentralized policy to handle cooperative HOIs across any number of cooperating agents. Each agent operates using local observations while attending to other teammates through a Transformer-based policy network with teammate tokens, allowing scalable coordination across variable team sizes. To enforce motion realism while addressing the scarcity of cooperative HOI data, we further introduce a masked Adversarial Motion Prior (AMP) strategy that uses single-human reference motions while masking object-interacting body parts during training. The masked regions are then guided through task rewards to produce diverse and physically plausible cooperative behaviors. We evaluate TeamHOI on a challenging cooperative carrying task involving two to eight humanoid agents and varied object geometries. Finally, to promote stable carrying, we design a team-size- and shape-agnostic formation reward. TeamHOI achieves high success rates and demonstrates coherent cooperation across diverse configurations with a single policy.
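
The masked AMP idea described above can be illustrated with a minimal sketch. It is not the authors' implementation: the body-part layout, the choice of hands as the masked parts, the least-squares reward form, and the hard 0/1 blending weight are all assumptions made for illustration.

```python
from typing import Dict, List

# Hypothetical body-part layout: part name -> indices in a flat feature vector.
BODY_PARTS: Dict[str, List[int]] = {
    "torso": [0, 1],
    "legs": [2, 3],
    "left_hand": [4],
    "right_hand": [5],
}
# Assumed object-interacting parts hidden from the masked discriminator.
MASKED_PARTS = ("left_hand", "right_hand")


def mask_features(features: List[float]) -> List[float]:
    """Zero the feature slots that belong to the object-interacting parts."""
    hidden = {i for part in MASKED_PARTS for i in BODY_PARTS[part]}
    return [0.0 if i in hidden else x for i, x in enumerate(features)]


def style_reward(disc_score: float) -> float:
    """Least-squares style reward clipped at zero (assumed form)."""
    return max(0.0, 1.0 - 0.25 * (disc_score - 1.0) ** 2)


def blended_style_reward(full_score: float, masked_score: float,
                         interacting: bool) -> float:
    """Pick the masked discriminator while interacting, else the full-body one.

    A hard 0/1 weight is an assumption; a soft weight would blend the two.
    """
    weight = 1.0 if interacting else 0.0
    return ((1.0 - weight) * style_reward(full_score)
            + weight * style_reward(masked_score))
```

In this sketch the masked slots receive no style signal during interaction, which leaves the task rewards as the only guidance for those body parts, matching the division of labor the abstract describes.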

Paper Structure (31 sections, 30 equations, 10 figures, 4 tables)

Figures (10)

  • Figure 1: We present TeamHOI, a framework for learning a unified decentralized policy for cooperative human-object interactions (HOI) across varying team sizes and object configurations. Our framework enables effective cooperation where each humanoid acts independently from local observations while coordinating with others through a single shared policy. Video demonstrations are provided on our project page: https://splionar.github.io/TeamHOI.
  • Figure 2: Overview of the TeamHOI framework. A transformer-based policy network enables coordination between the observing agent (green humanoid) and its teammates (grey humanoids) through alternating self- and cross-attention layers. By training across diverse team-size environments, the framework learns a unified policy that works across different team configurations. To maintain motion realism and enhance skill diversity, a masked AMP strategy blends full-body and masked discriminators based on object interaction.
  • Figure 3: Illustration of our principal-axes coverage reward.
  • Figure 4: Qualitative comparison across 4-agent (top) and 8-agent (bottom) configurations. Our method produces synchronized and stable teamwork across both cases, whereas the CooHOI* baselines exhibit limited or ineffective cooperation. Red line indicates the table’s movement trajectory, and the black dot marks its final position at the end of each episode.
  • Figure 5: Ablation on the masked AMP strategy. Comparison between models trained with and without masked AMP, showing improved task rewards and successful hand-object interactions when masking is applied.
  • ...and 5 more figures
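
The Figure 2 caption describes an agent attending to teammate tokens so that one policy can serve any team size. The sketch below shows why attention suits this: the output has a fixed size however many teammates are present. It is a simplified, single-head illustration under stated assumptions, not the paper's network: the function name is hypothetical, teammate tokens act as both keys and values, and there are no learned projections.

```python
import math
from typing import List


def attend_to_teammates(query: List[float],
                        teammate_tokens: List[List[float]]) -> List[float]:
    """Scaled dot-product attention of one agent over its teammate tokens."""
    dim = len(query)
    if not teammate_tokens:
        # No teammates: return a zero summary of the same size (assumption).
        return [0.0] * dim

    # One similarity score per teammate, scaled by sqrt(dim).
    scores = [
        sum(q * k for q, k in zip(query, token)) / math.sqrt(dim)
        for token in teammate_tokens
    ]

    # Numerically stable softmax over the teammates.
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]

    # Weighted sum of tokens: length is dim, independent of team size.
    return [
        sum(w * token[d] for w, token in zip(weights, teammate_tokens))
        for d in range(dim)
    ]
```

Calling the function with one teammate token or with seven returns a vector of the same length, so the layers that consume the summary need no change as the team grows.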