Generative Artificial Intelligence in Robotic Manipulation: A Survey

Kun Zhang; Peng Yun; Jun Cen; Junhao Cai; Didi Zhu; Hangjie Yuan; Chao Zhao; Tao Feng; Michael Yu Wang; Qifeng Chen; Jia Pan; Wei Zhang; Bo Yang; Hua Chen

TL;DR

The survey addresses data efficiency, long-horizon planning, and cross-environment generalization in robotic manipulation by systematically reviewing how generative learning models can generate data, model world dynamics, and synthesize policies. It introduces a three-layer taxonomy, consisting of Foundation (data and reward generation), Intermediate (language, code, visual, and state generation), and Policy (grasp and trajectory generation) layers, to organize a broad spectrum of methods including GANs, VAEs, diffusion models, probabilistic flow models, and autoregressive models. The work compiles representative methods and discusses challenges such as data scarcity, sim-to-real transfer, benchmark fragmentation, and physical-law awareness, while outlining concrete directions such as domain grounding, unified benchmarks, and physics-informed learning. The practical impact lies in guiding researchers toward scalable data pipelines, multi-modal policy learning, and robust, generalizable robotic manipulation systems across real-world environments.

Abstract

This survey provides a comprehensive review of recent advances in generative learning models for robotic manipulation, addressing key challenges in the field. Robotic manipulation faces critical bottlenecks: insufficient data and inefficient data acquisition, long-horizon and complex task planning, and the multi-modal reasoning needed for robust policy learning across diverse environments. To tackle these challenges, this survey introduces several generative model paradigms, including Generative Adversarial Networks (GANs), Variational Autoencoders (VAEs), diffusion models, probabilistic flow models, and autoregressive models, highlighting their strengths and limitations. The applications of these models are categorized into three hierarchical layers: the Foundation Layer, focusing on data generation and reward generation; the Intermediate Layer, covering language, code, visual, and state generation; and the Policy Layer, emphasizing grasp generation and trajectory generation. Each layer is explored in detail, along with notable works that have advanced the state of the art. Finally, the survey outlines future research directions and challenges, emphasizing the need for more efficient data utilization, better handling of long-horizon tasks, and enhanced generalization across diverse robotic scenarios. All related resources, including research papers, open-source data, and projects, are collected for the community at https://github.com/GAI4Manipulation/AwesomeGAIManipulation
