Design and Implementation of the Transparent, Interpretable, and Multimodal (TIM) AR Personal Assistant

Erin McGowan; Joao Rulff; Sonia Castelo; Guande Wu; Shaoyu Chen; Roque Lopez; Bea Steers; Iran R. Roman; Fabio F. Dias; Jing Qian; Parikshit Solunke; Michael Middleton; Ryan McKendrick; Claudio T. Silva

TL;DR

TIM is a transparent, multimodal AR personal assistant that integrates perception, memory, reasoning, and an adaptive UI to deliver just-in-time task guidance while recording comprehensive data provenance for post-hoc analysis. The system combines egocentric perception, 3D memory, and two reasoning approaches (a dependency-graph model and a random-forest model using EgoHOS features) to produce interpretable instructions anchored in the 3D environment. Real-time analytics and visualization tools support debugging and retrospective evaluation of model behavior and human performance across temporal, spatial, and physiological data streams. Domain-calibrated deployments in tactical field care and copilot monitoring demonstrate TIM's adaptability and show how customized components can address task-specific challenges. Current limitations concern the focus on physical tasks, sensitivity to lighting, and multi-performer collaboration; future work aims at broader generalization, workload modeling, and expansion to layperson use cases.
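To make the dependency-graph idea concrete, the sketch below shows one simple way such a reasoning model could track task progress: each step lists the steps it depends on, and a step becomes available only once all of its prerequisites are complete. This is an illustrative assumption, not TIM's actual implementation; the `TaskGraph` class, its methods, and the step names are all hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class TaskGraph:
    """Hypothetical task model: maps each step to the steps it depends on."""

    prerequisites: dict[str, set[str]]
    completed: set[str] = field(default_factory=set)

    def available_steps(self) -> list[str]:
        """Return unfinished steps whose prerequisites are all complete."""
        return sorted(
            step
            for step, deps in self.prerequisites.items()
            if step not in self.completed and deps <= self.completed
        )

    def mark_done(self, step: str) -> None:
        """Record a step as complete, rejecting out-of-order completion."""
        if step not in self.prerequisites:
            raise KeyError(f"unknown step: {step}")
        missing = self.prerequisites[step] - self.completed
        if missing:
            raise ValueError(f"{step} is blocked by {sorted(missing)}")
        self.completed.add(step)


# Hypothetical four-step procedure, used only for illustration.
graph = TaskGraph(
    {
        "gather_supplies": set(),
        "prepare_area": {"gather_supplies"},
        "apply_device": {"prepare_area"},
        "record_time": {"apply_device"},
    }
)
```

Calling `graph.available_steps()` on the fresh graph yields only `gather_supplies`; after `graph.mark_done("gather_supplies")`, the next candidate is `prepare_area`. Because the set of blocking prerequisites is explicit, a rejected step can be explained to the performer in terms of what is still missing, which is the kind of interpretability a graph-based approach lends itself to.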

Abstract

The concept of an AI assistant for task guidance is rapidly shifting from a science fiction staple to an impending reality. Such a system is inherently complex: it requires models for perceptual grounding, attention, and reasoning; an intuitive interface that adapts to the performer's needs; and the orchestration of data streams from many sensors. Moreover, all data acquired by the system must be readily available for post-hoc analysis so that developers can understand performer behavior and quickly detect failures. We introduce TIM, the first end-to-end AI-enabled task guidance system in augmented reality that is capable of detecting both the user and the scene and of providing adaptable, just-in-time feedback. We discuss the system challenges and propose design solutions. We also demonstrate how TIM adapts to domain applications with varying needs, highlighting how the system components can be customized for each scenario.
