CARING-AI: Towards Authoring Context-aware Augmented Reality INstruction through Generative Artificial Intelligence

Jingyu Shi, Rahul Jain, Seungguen Chi, Hyungjun Doh, Hyunggun Chi, Alexander J. Quinn, Karthik Ramani

TL;DR

CARING-AI is a context-aware AR authoring system that uses Generative AI to create humanoid-avatar instructions grounded in real environments. It combines a two-dimensional design space (context: spatial or temporal; content: local or global) with a three-stage workflow: textual instruction refinement, environment grounding, and motion generation via diffusion models. A quantitative baseline evaluation and two user studies indicate improved temporal continuity, spatial accuracy, and usability compared with a programming-by-demonstration (PbD) approach, while also exposing limitations in hand-object interactions and generalizability. The system supports remote and ad hoc content creation with reduced hardware demands and points toward broader AI-generated modalities in AR guidance.
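The two-dimensional design space summarized above can be written down as a small data structure. The following is a minimal sketch, not the authors' implementation: the names `Context`, `Content`, `InstructionStep`, `design_space`, and `capture_method` are hypothetical and chosen only for illustration. The mapping from content scale to capture method follows the Figure 4 caption (walking trajectories for global content, screenshots of objects for local content).

```python
from dataclasses import dataclass
from enum import Enum
from itertools import product


class Context(Enum):
    """Kind of contextual information an instruction conveys."""
    SPATIAL = "spatial"
    TEMPORAL = "temporal"


class Content(Enum):
    """Scale of the content an instruction contains."""
    LOCAL = "local"
    GLOBAL = "global"


@dataclass(frozen=True)
class InstructionStep:
    """One textual instruction tagged with its place in the design space.

    Hypothetical container; the paper does not specify this structure.
    """
    text: str
    context: Context
    content: Content


def design_space():
    """Enumerate every (context, content) cell of the 2 x 2 design space."""
    return list(product(Context, Content))


def capture_method(step):
    """Pick how the user supplies context for a step.

    Assumption based on the Figure 4 caption: global content is grounded
    by a walked trajectory, local content by a screenshot of the object.
    """
    if step.content is Content.GLOBAL:
        return "trajectory"
    return "screenshot"


if __name__ == "__main__":
    step = InstructionStep("Walk to the sink", Context.SPATIAL, Content.GLOBAL)
    print(len(design_space()), capture_method(step))
```

Tagging each step this way keeps the later stages (environment grounding and motion generation) free to branch on the content scale without re-parsing the instruction text.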

Abstract

Context-aware AR instruction enables adaptive and in-situ learning experiences. However, hardware limitations and expertise requirements constrain the creation of such instructions. With recent developments in Generative Artificial Intelligence (GenAI), current research tries to tackle these constraints by deploying AI-generated content (AIGC) in AR applications. However, our preliminary study with six AR practitioners revealed that current AIGC lacks the contextual information needed to adapt to varying application scenarios and is therefore of limited use in authoring. To utilize the strong generative power of GenAI to ease the authoring of AR instructions while capturing the context, we developed CARING-AI, an AR system to author context-aware humanoid-avatar-based instructions with GenAI. By navigating in the environment, users naturally provide contextual information to generate humanoid-avatar animations as AR instructions that blend into the context spatially and temporally. We showcased three application scenarios of CARING-AI: Asynchronous Instructions, Remote Instructions, and Ad Hoc Instructions, based on a design space of AIGC in AR instructions. With two user studies (N=12), we assessed the system usability of CARING-AI and demonstrated the ease and effectiveness of authoring with GenAI.

Paper Structure

This paper contains 51 sections, 4 equations, 21 figures, 2 tables, 1 algorithm.

Figures (21)

  • Figure 1: Problems of AI-generated humanoid avatar animation identified in the preliminary study: (a) the offset between the generated content and the context, i.e., the interaction is not spatially aligned with the object; (b) the temporal inconsistency, i.e., the generated motion is not temporally connected; and (c) the unfitting visualization extent, i.e., the generated avatars are not at the best scale to convey the instructions (full-body vs. half-body vs. hand-only).
  • Figure 2: Our design space of AIGC in AR instructions is composed of two dimensions: context and content. An AR instruction can be either temporal or spatial, based on the contextual information it conveys, and either local or global, based on the scale of the content it contains.
  • Figure 3: The overall pipeline of the CARING-AI system. Users start by generating textual instructions by speech or text. These instructions are then grounded in the user's context by scanning the environment. With this context, the instructions are used to generate humanoid avatar motion that demonstrates them, blended into AR.
  • Figure 4: Our methodology for obtaining the contextual information. For global information, users walk from one location to another to provide trajectories (a). For spatial information, users look at the local objects and take screenshots (b, c). This contextual information will be used to generate humanoid avatar motions that are aware of the spatial context for global and local content.
  • Figure 5: Some examples of our motion generation models. The motion can be local (a) or global (b, c, d), i.e., from one place to another.
  • ...and 16 more figures