A Framework for Adapting Human-Robot Interaction to Diverse User Groups

Theresa Pekarek Rosin, Vanessa Hassouna, Xiaowen Sun, Luca Krohm, Henri-Leon Kordt, Michael Beetz, Stefan Wermter

TL;DR

The paper addresses the challenge of making human–robot interaction robustly usable for diverse user groups by proposing an adaptive, ROS-based HRI framework that supports real-time interruptions and uses a large language model as a dialogue bridge. It integrates age-aware speech recognition, an LLM-driven command translator, and a PyCRAM-based planner with an Interrupt Client to enable minor and major plan changes in a kitchen-scene simulation. Module-level and system-level evaluations show strong binary age recognition, reasonable multi-age granularity, and substantial, though imperfect, success in handling interruptions, with frequent noise-driven failures identified as areas for improvement. The work contributes an open-source framework that combines voice-based usability, adaptive feedback, and robust planning to advance practical, personalized HRI in real-world environments.

Abstract

To facilitate natural and intuitive interactions with diverse user groups in real-world settings, social robots must be capable of addressing the varying requirements and expectations of these groups while adapting their behavior based on user feedback. While previous research often focuses on specific demographics, we present a novel framework for adaptive Human-Robot Interaction (HRI) that tailors interactions to different user groups and enables individual users to modulate interactions through both minor and major interruptions. Our primary contributions include the development of an adaptive, ROS-based HRI framework with an open-source code base. This framework supports natural interactions through advanced speech recognition and voice activity detection, and leverages a large language model (LLM) as a dialogue bridge. We validate the efficiency of our framework through module tests and system trials, demonstrating its high accuracy in age recognition and its robustness to repeated user inputs and plan changes.

Paper Structure

This paper contains 15 sections, 1 equation, 4 figures, and 2 tables.

Figures (4)

  • Figure 1: The simulation environment with the kitchen scenario. Left: The robot moves around and interacts with the environment to search for the object. Right: It then places the object in front of the user on either the table or the counter.
  • Figure 2: Our architecture and the ROS communication processes. The user interacts with the system using natural language and receives vocal feedback. The user's speech is processed by an age and speech recognition model which transcribes the speech and detects the age group. This information is sent to the dialogue bridge, where commands and parameters are extracted and forwarded to the robotic agent, which executes the actions. The user can interrupt the robot at any time.
  • Figure 3: The concept of the Dialogue Bridge. The LLM connects the user and the robot by processing the user's utterances (U), turning them into a command (C) with extracted target properties (P) for the robot, as well as monitoring the internal state (S) of the robot and generating an appropriate response (R) to the user.
  • Figure 4: The confusion matrix for the Age Recognition model. The matrix shows that the model predicts either the correct age group or one of the two adjacent groups.
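The data flow described for Figure 3 can be illustrated with a minimal Python sketch. It is not the authors' implementation: a keyword lookup stands in for the LLM, and the names `UserInput`, `RobotCommand`, `VOCAB`, `extract_command`, and `generate_response`, along with the vocabularies, state labels, and the age-based phrasing rule, are all assumptions made for illustration. It shows only the two mappings named in the caption: utterance (U) to command (C) with target properties (P), and internal state (S) to response (R).

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class UserInput:
    text: str       # transcribed utterance U
    age_group: str  # label from the age recognition model


@dataclass
class RobotCommand:
    action: str                                               # command C
    properties: Dict[str, str] = field(default_factory=dict)  # target properties P


# Hypothetical vocabularies; a keyword lookup stands in for the LLM here.
VOCAB = {
    "object": ("cup", "bowl", "spoon"),
    "color": ("red", "blue", "green"),
    "location": ("table", "counter"),
}


def extract_command(user_input: UserInput) -> Optional[RobotCommand]:
    """Map an utterance U to a command C with properties P (None if unclear)."""
    words = [w.strip(".,!?") for w in user_input.text.lower().split()]
    if "stop" in words:
        return RobotCommand("stop")
    # Keep the first matching word for each property that is mentioned.
    props = {key: next(w for w in words if w in vocab)
             for key, vocab in VOCAB.items()
             if any(w in vocab for w in words)}
    if "object" in props:
        return RobotCommand("fetch", props)
    return None


def generate_response(state: str, age_group: str) -> str:
    """Turn the robot's internal state S into a spoken response R."""
    messages = {
        "searching": "I am searching for the object.",
        "delivering": "I am bringing the object to you.",
        "done": "The task is complete.",
    }
    reply = messages.get(state, "I am not sure what I am doing right now.")
    # Assumed adaptation rule: a friendlier prefix for younger users.
    if age_group == "child":
        reply = "Okay! " + reply
    return reply


if __name__ == "__main__":
    command = extract_command(UserInput("Please bring me the red cup.", "adult"))
    print(command)
    print(generate_response("searching", "child"))
```

In the framework itself, these two functions would presumably be replaced by LLM calls and connected to the speech recognition and planning modules over ROS. The sketch keeps them as plain functions so the U → (C, P) and S → R interfaces are easy to see in isolation.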