A Survey of Embodied Learning for Object-Centric Robotic Manipulation
Ying Zheng, Lei Yao, Yuejiao Su, Yi Zhang, Yi Wang, Sicheng Zhao, Yiyi Zhang, Lap-Pui Chau
TL;DR
This survey addresses the problem of enabling robots to manipulate objects through embodied learning and organizes existing work into three interconnected domains: embodied perceptual learning, embodied policy learning, and embodied task-oriented learning. It provides a structured taxonomy covering three areas. The first is data representations (image-based, 3D-aware, and tactile). The second is object pose estimation at the instance level (ILOPE), at the category level (CLOPE), and for novel objects (NOPE). The third is affordance learning. It then surveys policy representations (explicit, implicit, and diffusion-based) and policy learning methods (reinforcement learning (RL), imitation learning (IL), and hybrids of the two), before detailing object grasping and manipulation tasks, datasets, and evaluation metrics. The paper also surveys applications across industrial, agricultural, domestic, and surgical domains. It discusses open challenges such as sim-to-real generalization, multimodal embodied LLMs, human-robot collaboration, model compression, and safety, and outlines future research directions. Overall, the work consolidates recent developments, highlights practical datasets and benchmarks, and provides a roadmap toward robust, generalizable embodied robotic manipulation. A linked repository at https://github.com/RayYoh/OCRM_survey accompanies the survey for reproducibility and community engagement.
Abstract
Embodied learning for object-centric robotic manipulation is a rapidly developing and challenging area in embodied AI. It is crucial for advancing next-generation intelligent robots and has recently attracted significant interest. Unlike purely data-driven machine learning methods, embodied learning focuses on robot learning through physical interaction with the environment and perceptual feedback, which makes it especially suitable for robotic manipulation. In this paper, we provide a comprehensive survey of the latest advancements in this field and categorize the existing work into three main branches: 1) Embodied perceptual learning, which aims to predict object pose and affordance from various data representations; 2) Embodied policy learning, which focuses on generating optimal robotic decisions using methods such as reinforcement learning and imitation learning; 3) Embodied task-oriented learning, which is designed to optimize the robot's performance based on the characteristics of different object grasping and manipulation tasks. In addition, we offer an overview and discussion of public datasets, evaluation metrics, representative applications, current challenges, and potential future research directions. A project associated with this survey has been established at https://github.com/RayYoh/OCRM_survey.
