Satori: Towards Proactive AR Assistant with Belief-Desire-Intention User Modeling

Chenyi Li, Guande Wu, Gromit Yeuk-Yin Chan, Dishita G Turakhia, Sonia Castelo Quispe, Dong Li, Leslie Welch, Claudio Silva, Jing Qian

TL;DR

Satori presents a proactive AR assistant that combines Belief-Desire-Intention (BDI) user modeling with a multimodal large language model to infer user state and environmental context for timely guidance. Derived from two formative studies with 12 experts, Satori features a BDI-informed reasoning pipeline, timing prediction, and dynamic multimodal content generation, validated through a 16-participant study showing non-inferiority to designer-crafted Wizard-of-Oz baselines. Results indicate Satori can deliver comparable guidance in timeliness, comprehensibility, usefulness, and efficacy while improving generalizability and scalability by avoiding extensive domain-specific configurations. The work demonstrates the viability of integrating BDI-based user modeling with LLM-assisted perception and planning to enable scalable, transparent human–AI collaboration in everyday AR tasks.

Abstract

Augmented Reality (AR) assistance is increasingly used to support users in physical tasks such as assembly and cooking. However, most systems rely on reactive responses triggered by user input, overlooking rich contextual and user-specific information. To address this, we present Satori, a novel AR system that proactively guides users by modeling both their mental states and their environmental contexts. Satori integrates the Belief-Desire-Intention (BDI) framework with a state-of-the-art multimodal large language model (LLM) to deliver contextually appropriate guidance. The system design is based on two formative studies involving twelve experts. We evaluated the system in a within-subject study with sixteen participants and found that Satori matches the performance of designer-created Wizard-of-Oz (WoZ) systems without manual configurations or heuristics, thereby improving generalizability and reusability and expanding the potential of AR assistance.

Paper Structure

This paper contains 91 sections, 2 equations, 7 figures, 10 tables.

Figures (7)

  • Figure 1: During the first session (participatory design), experts collaborated to create an ideal assistant framework based on the presented diagram and modules. (a) The system components for perception are available to the experts at the bottom of the diagram. (b) A result based on the original diagram, as illustrated by one expert group.
  • Figure 2: The figure is a system overview of the BDI user model. The system processes inputs from the camera's view, dialogue (voice communication between the user and the GPT model), and the historical logger (records of prior assistance). These inputs are sent to different BDI components for analysis and inference using a combination of local models and LLMs to generate proactive guidance and determine the appropriate modality and assistance timing. To ensure assistance appears and disappears at the right time, a task planner LLM generates a step-by-step task plan based on the inferred desire, with multiple checkpoints assigned to each step. These checkpoints are monitored by the action finish detection module, which determines task completion by verifying checkpoint progress. In addition, the system employs an early forecasting module to minimize latency.
  • Figure 3: Comparison of the naively generated images from the GPT model (i.e., Naive) with our proposed prompts (i.e., Satori). (a) "One hand presses a white button on a white espresso machine. A large red arrow points to the button. No background, in the style of flat, instructional illustrations. Accurate, concise, comfortable color style." (b) "One hand presses a white button on a white espresso machine." (c) "Cut stem of a red flower up from bottom, with white scissors at 45 degrees. One big red arrow pointing to bottom of the flower stem. In the style of flat, instructional illustrations. No background. Accurate, concise, comfortable color style." (d) "Cut stem of a red flower up from bottom with white scissors at 45 degrees."
  • Figure 4: (a) In this example, the user is grinding the coffee beans. The interface shows the task goal as "Making Coffee" and the upcoming action or step as "Grind coffee beans into powder." Action checkpoints marked with a green check are sub-steps that have been completed, and those marked with a blue circle are sub-steps in progress. Once all sub-steps are checked, the current step is considered complete. (b) A task assistance confirmation appears when the system detects step completion. The confirmation asks the user whether they are about to use a coffee filter and whether they need assistance.
  • Figure 5: Evaluation tasks using either Satori or a Wizard-of-Oz baseline. (a) The participant is assembling a mop during the room-cleaning task. (b) The participant is connecting an HDMI cable to a Nintendo Switch dock during the Nintendo Switch connection task. (c) The participant is preparing a filter during the coffee-making task. (d) The participant is trimming flower stems during the flower-arranging task.
  • ...and 2 more figures
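
The captions for Figures 2 and 4 describe a step as complete once every one of its checkpoints (sub-steps) has been verified. The sketch below illustrates only that completion rule. It is not the authors' implementation: the `Step` class, its method names, and the three checkpoint labels are hypothetical, and the real system decides checkpoint progress with its action finish detection module rather than by manual calls.

```python
from dataclasses import dataclass, field


@dataclass
class Step:
    """One step of a task plan with its checkpoints (hypothetical structure)."""
    description: str
    checkpoints: list[str]
    done: set[str] = field(default_factory=set)

    def mark_done(self, checkpoint: str) -> None:
        # Reject labels that are not part of this step's plan.
        if checkpoint not in self.checkpoints:
            raise ValueError(f"unknown checkpoint: {checkpoint!r}")
        self.done.add(checkpoint)

    def progress(self) -> tuple[int, int]:
        # (number of verified checkpoints, total checkpoints)
        return len(self.done), len(self.checkpoints)

    def is_complete(self) -> bool:
        # The step counts as finished only when every checkpoint is verified.
        return all(c in self.done for c in self.checkpoints)


# Step description taken from Figure 4; checkpoint labels are made up.
step = Step(
    description="Grind coffee beans into powder",
    checkpoints=["beans in grinder", "grinder running", "powder collected"],
)
step.mark_done("beans in grinder")
step.mark_done("grinder running")
partially_done = step.is_complete()   # one checkpoint still open
step.mark_done("powder collected")
fully_done = step.is_complete()       # all checkpoints verified
```

In this sketch, a confirmation prompt like the one in Figure 4(b) would be triggered at the point where `is_complete()` first returns true.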