
Investigating the Design Considerations for Integrating Text-to-Image Generative AI within Augmented Reality Environments

Yongquan Hu, Dawen Zhang, Mingyue Yuan, Kaiqi Xian, Don Samitha Elvitigala, June Kim, Gelareh Mohammadi, Zhenchang Xing, Xiwei Xu, Aaron Quigley

TL;DR

This paper investigates how to design and integrate generative AI into augmented reality by building a prototype (GenerativeAIR) that combines two multimodal AI models with three AR display modalities. Through focus-group interviews with ten experts, the authors derive a design-space framework organized around user, function, and environment to guide AIGC+AR development. They report qualitative insights comparing AR displays and AI content generation, highlighting trade-offs in fidelity, privacy, and usability, and propose live-use scenarios and potential applications such as real-time personalized content and multi-user collaboration. The work advances practical guidelines for deploying GenAI in AR, addressing design considerations, interaction modalities, and privacy/permission concerns critical for adoption in real-world settings.

Abstract

Generative Artificial Intelligence (GenAI) has emerged as a fundamental component of intelligent interactive systems, enabling the automatic generation of multimodal media content. The continuous enhancement in the quality of Artificial Intelligence-Generated Content (AIGC), including but not limited to images and text, is forging new paradigms for its application, particularly within the domain of Augmented Reality (AR). Nevertheless, the application of GenAI within the AR design process remains opaque. This paper aims to articulate a design space encapsulating a series of criteria and a prototypical process to aid practitioners in assessing the aptness of adopting pertinent technologies. The proposed model has been formulated based on a synthesis of design insights garnered from ten experts, obtained through focus group interviews. Leveraging these initial insights, we delineate potential applications of GenAI in AR.

Paper Structure (21 sections, 3 figures)

Figures (3)

  • Figure 1: The workflow and display effect of our GenerativeAIR prototype: (a) The user's speech into the microphone is converted into text, which is then fed into an AI model for generating artistic images and more text; (b) Generated content in Spatial Augmented Reality (SAR): an example of Samsung Freestyle projector; (c) Generated content in Head-Mounted Display (HMD): an example of Microsoft HoloLens 2; (d) Generated content in Hand-Held Display (HHD): an example of OnePlus 10 Pro.
  • Figure 2: Comparison of AR display + generative AI with related techniques: (a) display performance comparison of AR, VR, and a normal monitor; (b) content-generation performance comparison of generative AI (machine), AI assist (machine + human), and human.
  • Figure 3: Potential Applications of GenerativeAIR: (a) Boosting real-time creative media generation; (b) Smoothing interactions with surroundings and environment; (c) Facilitating multi-user collaboration.
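
The workflow described in the Figure 1 caption (speech is converted to text, the text drives generation of an image and further text, and the result is shown on one of three AR display types) can be sketched roughly as follows. This is a minimal illustration only, not the authors' implementation: the names `run_pipeline`, `GeneratedContent`, `Display`, and the `fake_*` stand-in models are all hypothetical, and the page does not say which speech-recognition or generative models the prototype uses.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Tuple


class Display(Enum):
    """The three AR display modalities named in the Figure 1 caption."""
    SAR = "spatial augmented reality"
    HMD = "head-mounted display"
    HHD = "hand-held display"


@dataclass
class GeneratedContent:
    prompt: str   # text recognized from the user's speech
    image: bytes  # output of a text-to-image model (hypothetical)
    text: str     # output of a text-generation model (hypothetical)


def run_pipeline(
    audio: bytes,
    transcribe: Callable[[bytes], str],
    generate_image: Callable[[str], bytes],
    generate_text: Callable[[str], str],
    display: Display,
) -> Tuple[Display, GeneratedContent]:
    """Speech -> text -> generated image and text -> chosen AR display."""
    prompt = transcribe(audio).strip()
    if not prompt:
        raise ValueError("no speech recognized")
    content = GeneratedContent(
        prompt=prompt,
        image=generate_image(prompt),
        text=generate_text(prompt),
    )
    # A real system would hand `content` to a renderer for `display`;
    # here the pair is simply returned.
    return display, content


# Stand-in models (assumptions) so the sketch runs without external services.
def fake_transcribe(audio: bytes) -> str:
    return audio.decode("utf-8")


def fake_image(prompt: str) -> bytes:
    return f"<image for: {prompt}>".encode("utf-8")


def fake_text(prompt: str) -> str:
    return f"Description of {prompt}"


target, content = run_pipeline(
    b"a watercolor fox", fake_transcribe, fake_image, fake_text, Display.HMD
)
```

Passing the models in as callables keeps the sketch independent of any particular speech or image-generation service, so the same flow could target any of the three display types.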