Generative Augmented Reality: Paradigms, Technologies, and Future Applications
Chen Liang, Jiawen Zheng, Yufeng Zeng, Yi Tan, Hengye Lyu, Yuhui Zheng, Zisu Li, Yueting Weng, Jiaxin Shi, Hanwang Zhang
TL;DR
Generative Augmented Reality (GAR) reframes augmentation from a multi-stage, asset-driven AR pipeline into a unified world re-synthesis process powered by a single generative backbone. By conditioning a streaming video model on environmental observations and interaction signals, GAR aims for temporally coherent augmentation with implicit memory and memory-dependent assets, enabling higher fidelity, richer interaction, and co-adaptive mediation. The paper surveys the technical foundations (variational and adversarial latent models, diffusion and flow matching, autoregressive streaming, and efficiency techniques) and outlines prospective applications across tools, commerce, culture, gaming, and education, while addressing societal and ethical considerations. Overall, it argues that GAR can transform AR into a living, co-evolving medium in which perception, action, and environment are continuously authored by humans and generative systems alike, and it identifies open challenges in scalability, control, and content ecosystems.
Abstract
This paper introduces Generative Augmented Reality (GAR) as a next-generation paradigm that reframes augmentation as a process of world re-synthesis rather than world composition by a conventional AR engine. GAR replaces the engine's multi-stage modules with a unified generative backbone, in which environmental sensing, virtual content, and interaction signals are jointly encoded as conditioning inputs for continuous video generation. We formalize the computational correspondence between AR and GAR, survey the technical foundations that make real-time generative augmentation feasible, and outline prospective applications that leverage its unified inference model. We envision GAR as a future AR paradigm that delivers high-fidelity experiences in terms of realism, interactivity, and immersion, while raising new research challenges in technology, content ecosystems, and ethical and societal implications.
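To make the idea of conditioning continuous video generation on sensing and interaction signals concrete, the following is a minimal sketch, not the paper's implementation. It assumes a toy setting in which frames are short lists of floats. The names `Conditioning`, `toy_backbone`, and `stream` are hypothetical, as are the blending weights and the window size. The only point illustrated is the control flow: each output frame is produced from a bounded history of previously generated frames plus the current conditioning inputs, rather than by compositing separately rendered assets.

```python
from collections import deque
from dataclasses import dataclass
from typing import Callable, Deque, Iterable, List, Sequence

Frame = List[float]  # toy stand-in for an image or latent (assumption)


@dataclass
class Conditioning:
    """Per-step conditioning inputs (hypothetical structure)."""
    observation: Frame  # environmental sensing, e.g. a camera frame
    interaction: Frame  # interaction signal, e.g. a gesture encoding


def toy_backbone(history: Sequence[Frame], cond: Conditioning) -> Frame:
    """Hypothetical placeholder for a generative backbone.

    A real system would run a learned video model here; this toy version
    blends the previous output with the conditioning so that the result
    depends on both history and current inputs.
    """
    prev = history[-1] if history else [0.0] * len(cond.observation)
    return [
        0.5 * p + 0.4 * o + 0.1 * a
        for p, o, a in zip(prev, cond.observation, cond.interaction)
    ]


def stream(
    backbone: Callable[[Sequence[Frame], Conditioning], Frame],
    conds: Iterable[Conditioning],
    window: int = 4,
) -> List[Frame]:
    """Autoregressive streaming loop with a bounded frame history."""
    history: Deque[Frame] = deque(maxlen=window)  # oldest frames drop off
    outputs: List[Frame] = []
    for cond in conds:
        frame = backbone(list(history), cond)
        history.append(frame)  # generated frame becomes future context
        outputs.append(frame)
    return outputs


if __name__ == "__main__":
    steps = [Conditioning([1.0, 1.0], [0.0, 0.0]) for _ in range(3)]
    for i, frame in enumerate(stream(toy_backbone, steps)):
        print(i, frame)
```

In this sketch the bounded history plays the role of short-term context. How a real backbone would encode the conditioning signals or retain longer-term memory is not specified here.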
