Robustifying Long-term Human-Robot Collaboration through a Multimodal and Hierarchical Framework
Peiqi Yu, Abulikemu Abuduweili, Ruixuan Liu, Changliu Liu
TL;DR
This work tackles robust, long-horizon human-robot collaboration (HRC) by modeling tasks as a hierarchical task graph and proposing a multimodal, hierarchical framework that fuses vision and speech signals. It introduces hierarchical human pose detection and plan prediction, online adaptation, and a real-time robot controller to enable proactive, user-specific assistance across extended assembly tasks. Key contributions include a formal mutual-information justification for multimodal fusion, a plan alignment mechanism based on dynamic time warping (DTW), and extensive real-world validation showing improved task success, reduced disturbances, and higher user satisfaction. By enhancing robustness, efficiency, and user experience in long-term HRC, the approach has significant practical implications for flexible manufacturing and for assistive robotics in everyday environments.
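To give a concrete feel for what DTW-based plan alignment involves, the following is a minimal sketch of classic dynamic time warping over discrete action labels, with an open-end variant so that a partially completed task can be compared against a prefix of each candidate plan. This is an illustration under stated assumptions, not the paper's method: the function names (`dtw_distance`, `best_matching_plan`), the 0/1 label cost, and the example plans are all hypothetical, and the actual features, cost function, and use of DTW in the paper may differ.

```python
from typing import Callable, Hashable, Mapping, Sequence

Label = Hashable


def unit_cost(a: Label, b: Label) -> float:
    """0/1 cost between two discrete action labels (an assumption of this sketch)."""
    return 0.0 if a == b else 1.0


def dtw_distance(
    observed: Sequence[Label],
    reference: Sequence[Label],
    cost: Callable[[Label, Label], float] = unit_cost,
    open_end: bool = False,
) -> float:
    """Dynamic time warping cost between an observed and a reference action sequence.

    With open_end=True the observed sequence only needs to align with some
    prefix of the reference, which suits a task that is still in progress.
    """
    n, m = len(observed), len(reference)
    if n == 0 or m == 0:
        raise ValueError("both sequences must be non-empty")
    inf = float("inf")
    # D[i][j]: minimal cumulative cost of aligning observed[:i] with reference[:j].
    D = [[inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = cost(observed[i - 1], reference[j - 1])
            # Moves: several observations map to one reference step, one observation
            # maps to several reference steps, or both sequences advance together.
            D[i][j] = step + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    # Open end: the best alignment may stop at any reference prefix.
    return min(D[n][1:]) if open_end else D[n][m]


def best_matching_plan(observed: Sequence[Label], plans: Mapping[str, Sequence[Label]]) -> str:
    """Name of the (hypothetical) plan whose prefix best aligns with the observed actions."""
    return min(plans, key=lambda name: dtw_distance(observed, plans[name], open_end=True))


# Hypothetical assembly plans and a noisy partial observation (duplicated "pick_leg").
plans = {
    "legs_first": ["pick_leg", "attach_leg", "pick_leg", "attach_leg", "pick_top", "attach_top"],
    "top_first": ["pick_top", "place_top", "pick_leg", "attach_leg", "pick_leg", "attach_leg"],
}
observed = ["pick_leg", "pick_leg", "attach_leg"]
print(best_matching_plan(observed, plans))  # legs_first
```

The open-end variant is chosen here because a long-horizon task is usually observed midway; classic DTW would force the partial observation to cover the entire plan. A real system would more likely use a learned or feature-based cost than exact label matching.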
Abstract
Long-term Human-Robot Collaboration (HRC) is crucial for enabling flexible manufacturing systems and for integrating companion robots into daily human environments over extended periods. This paper identifies several key challenges for such collaboration, including accurate recognition of human plans, robustness to disturbances, operational efficiency, adaptability to diverse user behaviors, and sustained human satisfaction. To address these challenges, we model the long-term HRC task as a hierarchical task graph and present a novel multimodal and hierarchical framework that enables robots to better assist humans in advancing through the task graph. In particular, the proposed multimodal framework integrates visual observations with speech commands to facilitate intuitive and flexible human-robot interaction. Additionally, our hierarchical designs for both human pose detection and plan prediction allow a better understanding of human behaviors and significantly enhance system accuracy, robustness, and flexibility. Moreover, an online adaptation mechanism enables real-time adjustment to diverse user behaviors. We deploy the proposed framework on a KINOVA GEN3 robot and conduct extensive user studies on real-world long-term HRC assembly scenarios. Experimental results show that our approach reduces task completion time by 15.9%, achieves an average task success rate of 91.8%, and attains an overall user satisfaction score of 84% in long-term HRC tasks, demonstrating its applicability to real-world long-term HRC.
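Why fusing speech with vision should not hurt plan recognition can be stated information-theoretically. The paper's exact formal statement is not reproduced here; the block below is one standard argument of this kind, based on the chain rule of mutual information. The symbols are introduced for illustration only: $Y$ is the human's plan (or next subtask), $V$ the visual observation, and $S$ the speech command.

```latex
\begin{aligned}
I(Y; V, S) &= I(Y; V) + I(Y; S \mid V) \\
           &\ge I(Y; V), \qquad \text{since } I(Y; S \mid V) \ge 0 .
\end{aligned}
```

By symmetry, $I(Y; V, S) \ge I(Y; S)$ as well. Equality with the vision-only term holds exactly when $Y$ and $S$ are conditionally independent given $V$, that is, when speech adds no information about the plan beyond what vision already provides; otherwise the fused observation is strictly more informative.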
