Generative Augmented Reality: Paradigms, Technologies, and Future Applications

Chen Liang, Jiawen Zheng, Yufeng Zeng, Yi Tan, Hengye Lyu, Yuhui Zheng, Zisu Li, Yueting Weng, Jiaxin Shi, Hanwang Zhang

TL;DR

Generative Augmented Reality (GAR) reframes augmentation from a multi-stage, asset-driven AR pipeline to a unified world re-synthesis process powered by a single generative backbone. By conditioning a streaming video model on environmental observations and interaction signals, GAR achieves temporally coherent augmentation with implicit memory and memory-dependent assets, enabling higher fidelity, richer interaction, and co-adaptive mediation. The paper surveys the technical foundations, ranging from variational and adversarial latent models to diffusion/flow matching, autoregressive streaming, and efficiency techniques, and outlines a landscape of prospective applications across tools, commerce, culture, gaming, and education, while addressing societal and ethical considerations. Overall, it argues that GAR can transform AR into a living, co-evolving medium where perception, action, and environment are continuously authored by humans and generative systems alike, with open challenges in scalability, control, and content ecosystems.

Abstract

This paper introduces Generative Augmented Reality (GAR) as a next-generation paradigm that reframes augmentation as a process of world re-synthesis rather than world composition by a conventional AR engine. GAR replaces the conventional AR engine's multi-stage modules with a unified generative backbone, where environmental sensing, virtual content, and interaction signals are jointly encoded as conditioning inputs for continuous video generation. We formalize the computational correspondence between AR and GAR, survey the technical foundations that make real-time generative augmentation feasible, and outline prospective applications that leverage its unified inference model. We envision GAR as a future AR paradigm that delivers high-fidelity experiences in terms of realism, interactivity, and immersion, while eliciting new research challenges on technologies, content ecosystems, and the ethical and societal implications.

Paper Structure

This paper contains 52 sections, 8 equations, 3 figures.

Figures (3)

  • Figure 1: Conceptual illustration of Generative Augmented Reality (GAR). Traditional AR relies on rule-based spatial overlays and predefined assets, whereas GAR integrates environmental cues into a unified generative process that re-synthesizes the visual scene in real time. This enables adaptive narratives and continuous, interactive visual augmentation aligned with the physical world.
  • Figure 2: GAR technical foundations, mapped as an evolutionary tree from 2022 to 2025. Five branches (Autoregression, Efficiency, Infinite Length, Multimodal Control, and Scene/Asset) organize prior work and its lineage.
  • Figure 3: AR Application Landscape. The application landscape of Augmented Reality extends across five domains—Tool, Commerce, Lifestyle, Gaming, and Education—organized along a vertical axis of representational depth ranging from textual representation to generative adaptation.