Table of Contents
Fetching ...

AI-Powered Immersive Assistance for Interactive Task Execution in Industrial Environments

Tomislav Duricic, Peter Müllner, Nicole Weidinger, Neven ElSayed, Dominik Kowald, Eduardo Veas

TL;DR

This work tackles the challenge of training industrial operators to perform complex tasks safely and efficiently when expert availability is limited. It proposes a VR-based digital twin (juice mixer) and an LLM-powered immersive assistant that leverages transcripts from expert performances to deliver context-aware, step-by-step guidance in real time. The approach demonstrates how multimodal inputs (text, audio, video) processed by a GPT-4-based assistant can reduce cognitive load and boost learning efficacy in a controlled industrial setting. The work lays groundwork for scalable, immersive training across industrial environments and highlights future avenues like multimodal embeddings, physiology-informed interfaces, and hybrid theory-data driven guidance.

Abstract

Many industrial sectors rely on well-trained employees that are able to operate complex machinery. In this work, we demonstrate an AI-powered immersive assistance system that supports users in performing complex tasks in industrial environments. Specifically, our system leverages a VR environment that resembles a juice mixer setup. This digital twin of a physical setup simulates complex industrial machinery used to mix preparations or liquids (e.g., similar to the pharmaceutical industry) and includes various containers, sensors, pumps, and flow controllers. This setup demonstrates our system's capabilities in a controlled environment while acting as a proof-of-concept for broader industrial applications. The core components of our multimodal AI assistant are a large language model and a speech-to-text model that process a video and audio recording of an expert performing the task in a VR environment. The video and speech input extracted from the expert's video enables it to provide step-by-step guidance to support users in executing complex tasks. This demonstration showcases the potential of our AI-powered assistant to reduce cognitive load, increase productivity, and enhance safety in industrial environments.

AI-Powered Immersive Assistance for Interactive Task Execution in Industrial Environments

TL;DR

This work tackles the challenge of training industrial operators to perform complex tasks safely and efficiently when expert availability is limited. It proposes a VR-based digital twin (juice mixer) and an LLM-powered immersive assistant that leverages transcripts from expert performances to deliver context-aware, step-by-step guidance in real time. The approach demonstrates how multimodal inputs (text, audio, video) processed by a GPT-4-based assistant can reduce cognitive load and boost learning efficacy in a controlled industrial setting. The work lays groundwork for scalable, immersive training across industrial environments and highlights future avenues like multimodal embeddings, physiology-informed interfaces, and hybrid theory-data driven guidance.

Abstract

Many industrial sectors rely on well-trained employees that are able to operate complex machinery. In this work, we demonstrate an AI-powered immersive assistance system that supports users in performing complex tasks in industrial environments. Specifically, our system leverages a VR environment that resembles a juice mixer setup. This digital twin of a physical setup simulates complex industrial machinery used to mix preparations or liquids (e.g., similar to the pharmaceutical industry) and includes various containers, sensors, pumps, and flow controllers. This setup demonstrates our system's capabilities in a controlled environment while acting as a proof-of-concept for broader industrial applications. The core components of our multimodal AI assistant are a large language model and a speech-to-text model that process a video and audio recording of an expert performing the task in a VR environment. The video and speech input extracted from the expert's video enables it to provide step-by-step guidance to support users in executing complex tasks. This demonstration showcases the potential of our AI-powered assistant to reduce cognitive load, increase productivity, and enhance safety in industrial environments.
Paper Structure (4 sections, 2 figures)

This paper contains 4 sections, 2 figures.

Figures (2)

  • Figure 1: Overview of the virtual juice mixing setup in VR. Key components are highlighted: (1) Juice Mixer, (2) Juice Station, (3) Spare Part Station, and (4) Controller/Hands as input, which illustrates the user interaction within the immersive environment.
  • Figure 2: System-level (left) and user-level (right) perspective of the immersive AI assistant. The assistant needs an expert to perform the task, and the expert's narration is transcribed to text, which serves as context for the LLM. Given this context and text or speech input from the user, the LLM generates multimodal instructions that guide the user through the task. These instructions are presented to the user within a VR environment with media controls, text command input, and voice interaction to facilitate user engagement with the AI Assistant.