A Framework for Adapting Human-Robot Interaction to Diverse User Groups
Theresa Pekarek Rosin, Vanessa Hassouna, Xiaowen Sun, Luca Krohm, Henri-Leon Kordt, Michael Beetz, Stefan Wermter
TL;DR
The paper addresses the challenge of making human-robot interaction usable across diverse user groups by proposing an adaptive, ROS-based HRI framework that supports real-time interruptions and uses a large language model as a dialogue bridge. It integrates age-aware speech recognition, an LLM-driven command translator, and a PyCRAM-based planner with an Interrupt Client to enable minor and major plan changes in a kitchen-scene simulation. Module-level and system-level evaluations show strong binary age recognition, reasonable performance at finer age granularity, and substantial, though imperfect, success in handling interruptions; frequent noise-driven failures are identified as an area for improvement. The work contributes an open-source framework that combines voice-based usability, adaptive feedback, and robust planning to advance practical, personalized HRI in real-world environments.
Abstract
To facilitate natural and intuitive interactions with diverse user groups in real-world settings, social robots must be capable of addressing the varying requirements and expectations of these groups while adapting their behavior based on user feedback. Whereas previous research often focuses on specific demographics, we present a novel framework for adaptive Human-Robot Interaction (HRI) that tailors interactions to different user groups and enables individual users to modulate interactions through both minor and major interruptions. Our primary contribution is an adaptive, ROS-based HRI framework with an open-source code base. This framework supports natural interactions through advanced speech recognition and voice activity detection, and leverages a large language model (LLM) as a dialogue bridge. We validate the efficiency of our framework through module tests and system trials, demonstrating its high accuracy in age recognition and its robustness to repeated user inputs and plan changes.
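The distinction between minor and major interruptions can be illustrated with a small sketch. This is not the framework's code and does not use the PyCRAM or ROS APIs; `Step`, `Interruption`, and `apply_interruption` are hypothetical names. The summary above does not state where the boundary between the two kinds lies, so the sketch assumes that a minor interruption only overwrites parameters of the remaining steps, while a major interruption replaces the remaining steps entirely.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Step:
    action: str                                   # e.g. "fetch", "place" (illustrative)
    params: Dict[str, str] = field(default_factory=dict)


@dataclass
class Interruption:
    kind: str                                     # "minor" or "major" (assumed labels)
    changes: Dict[str, str] = field(default_factory=dict)   # used by minor
    new_steps: List[Step] = field(default_factory=list)     # used by major


def apply_interruption(plan: List[Step], current: int,
                       intr: Interruption) -> List[Step]:
    """Return a revised plan; steps before index `current` are already done."""
    done, remaining = plan[:current], plan[current:]
    if intr.kind == "minor":
        # Assumption: keep the plan structure and overwrite only those
        # parameters that a remaining step already defines.
        revised = []
        for step in remaining:
            updates = {k: v for k, v in intr.changes.items() if k in step.params}
            revised.append(Step(step.action, {**step.params, **updates}))
        return done + revised
    if intr.kind == "major":
        # Assumption: discard the remaining steps and adopt the new ones.
        return done + list(intr.new_steps)
    raise ValueError(f"unknown interruption kind: {intr.kind!r}")


plan = [
    Step("fetch", {"object": "bowl", "color": "blue"}),
    Step("place", {"object": "bowl", "location": "table"}),
]

# Minor: the user changes an attribute before anything has been executed.
minor_plan = apply_interruption(plan, 0, Interruption("minor", changes={"color": "red"}))

# Major: after the first step, the user asks for a different task.
major_plan = apply_interruption(
    plan, 1, Interruption("major", new_steps=[Step("fetch", {"object": "spoon"})])
)
```

In this sketch the original plan is never mutated, so an interruption that turns out to be a misrecognition (for example, one triggered by noise) could be dropped without any rollback.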
