Multimodal Interaction and Intention Communication for Industrial Robots

Tim Schreiter, Andrey Rudenko, Jens V. Rüppel, Martin Magnusson, Achim J. Lilienthal

TL;DR

Industrial robots must operate safely and intuitively in human-centric environments. The paper develops an expressive, multimodal HRI framework anchored by the Anthropomorphic Robotic Mock Driver (ARMoD) as a communication proxy for non-humanoid hosts, and augments it with Large Language Models (LLMs) to interpret context and generate responses. Through controlled lab studies employing gaze tracking and motion capture, the work demonstrates that ARMoD with multimodal cues can accelerate task localization and concentrate user attention, though LLM-enhanced responses do not always outperform scripted interactions. The work also proposes a transferable, quantifiable evaluation pipeline for HRI design in industry, with potential applicability to domains beyond manufacturing.

Abstract

Successful adoption of industrial robots will strongly depend on their ability to safely and efficiently operate in human environments, engage in natural communication, understand their users, and express intentions intuitively while avoiding unnecessary distractions. To achieve this advanced level of Human-Robot Interaction (HRI), robots need to acquire and incorporate knowledge of their users' tasks and environment and adopt multimodal communication approaches with expressive cues that combine speech, movement, gazes, and other modalities. This paper presents several methods to design, enhance, and evaluate expressive HRI systems for non-humanoid industrial robots. We present the concept of a small anthropomorphic robot communicating as a proxy for its non-humanoid host, such as a forklift. We developed a multimodal and LLM-enhanced communication framework for this robot and evaluated it in several lab experiments, using gaze tracking and motion capture to quantify how users perceive the robot and measure the task progress.

Paper Structure

This paper contains 7 sections and 2 figures.

Figures (2)

  • Figure 1: Focus points and methods in our HRI studies: (1) anthropomorphic communication proxy for non-humanoid platforms; (2) multimodal and LLM-enhanced communication; (3) gaze tracking and motion capture; (4) controlled user studies.
  • Figure 2: Heatmaps showing participant gaze distribution on two robot platforms (including the ARMoD) for two interaction styles (verbal-only and multimodal). In the multimodal style, eye fixations are more concentrated on the ARMoD humanoid robot.
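
The gaze heatmaps described for Figure 2 can be illustrated with a minimal sketch. It is not the authors' pipeline: the function names (`gaze_heatmap`, `aoi_share`), the rectangular area of interest standing in for the ARMoD, the image size, and the sample coordinates are all assumptions made up for illustration. The sketch bins 2D gaze samples into a grid and reports the share of samples that land inside the area of interest, which is one simple way to put a number on how concentrated the gaze is.

```python
from collections import Counter


def gaze_heatmap(points, width, height, bins_x, bins_y):
    """Count gaze samples per grid cell.

    points are (x, y) pixel coordinates; samples outside the image are ignored.
    Returns a list of rows, each a list of per-cell counts.
    """
    counts = Counter()
    for x, y in points:
        if 0 <= x < width and 0 <= y < height:
            col = int(x * bins_x / width)
            row = int(y * bins_y / height)
            counts[(row, col)] += 1
    return [[counts[(r, c)] for c in range(bins_x)] for r in range(bins_y)]


def aoi_share(points, aoi):
    """Fraction of gaze samples inside a rectangular area of interest.

    aoi is (x_min, y_min, x_max, y_max); the upper bounds are exclusive.
    """
    if not points:
        return 0.0
    x0, y0, x1, y1 = aoi
    inside = sum(1 for x, y in points if x0 <= x < x1 and y0 <= y < y1)
    return inside / len(points)


# Made-up gaze samples on a hypothetical 100x100 pixel scene image.
verbal_only = [(10, 10), (90, 20), (50, 50), (20, 80)]
multimodal = [(48, 52), (51, 49), (55, 45), (90, 90)]

# Hypothetical bounding box of the ARMoD in the scene image.
armod_aoi = (40, 40, 60, 60)

grid = gaze_heatmap(multimodal, 100, 100, bins_x=2, bins_y=2)
share_verbal = aoi_share(verbal_only, armod_aoi)
share_multimodal = aoi_share(multimodal, armod_aoi)
```

With these invented samples, a larger share for the multimodal style would correspond to the more concentrated fixations the caption describes; real eye-tracking data would first need fixation detection and a mapping from the tracker's coordinates to the scene.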