Toward Seamless Physical Human-Humanoid Interaction: Insights from Control, Intent, and Modeling with a Vision for What Comes Next
Gustavo A. Cardona, Shubham S. Kumbhar, Panagiotis Artemiadis
TL;DR
This paper surveys Physical Human-Humanoid Interaction (pHHI) through three core pillars (humanoid modeling and control, human intent estimation, and computational human models) and argues that seamless interaction requires a unified framework that tightly integrates sensing, prediction, and control. It analyzes classical, optimization-based, and learning-based approaches across these pillars, highlighting safety, stability, and adaptability as central requirements. The authors propose a modular yet integrated architecture with defined interfaces (safety-certified foundation, dynamic human feasibility, and intent-aware planning) and outline future directions including proactive interaction, personalization, and privacy-by-design. By mapping the current landscape and proposing concrete pathways, the work lays out a roadmap for deployable, robust, human-centered humanoid collaboration in real-world settings.
Abstract
Physical Human-Humanoid Interaction (pHHI) is a rapidly advancing field with significant implications for deploying robots in unstructured, human-centric environments. In this review, we examine the current state of the art in pHHI through three core pillars: (i) humanoid modeling and control, (ii) human intent estimation, and (iii) computational human models. For each pillar, we survey representative approaches, identify open challenges, and analyze current limitations that hinder robust, scalable, and adaptive interaction. These include the need for whole-body control strategies capable of handling uncertain human dynamics, real-time intent inference under limited sensing, and modeling techniques that account for variability in human physical states. Although significant progress has been made within each domain, integration across pillars remains limited. We propose pathways for unifying methods across these areas to enable cohesive interaction frameworks. The three-pillar structure enables us not only to map the current landscape but also to propose concrete directions for future research that aim to bridge these domains. Additionally, we introduce a unified taxonomy of interaction types based on two axes: modality, which distinguishes direct interactions (e.g., physical contact) from indirect interactions (e.g., object-mediated), and level of robot engagement, which ranges from assistance to cooperation and collaboration. For each category in this taxonomy, we examine the three core pillars and highlight opportunities for cross-pillar unification. Our goal is to suggest avenues to advance robust, safe, and intuitive physical interaction, providing a roadmap for future research that will allow humanoid systems to effectively understand, anticipate, and collaborate with human partners in diverse real-world settings.
