Growing from Exploration: A self-exploring framework for robots based on foundation models
Shoujie Li, Ran Yu, Tong Wu, JunWen Zhong, Xiao-Ping Zhang, Wenbo Ding
TL;DR
GExp presents an infant-inspired framework that enables robots to autonomously explore and learn in unseen environments by leveraging foundation models, notably vision-language models for perception and large language models for planning and control. The approach builds a growing skill library through self-generated tasks, verification, and reflection, while a backtracking control loop enhances robustness during deployment. Experimental validation spans BLOCKS WORLD, RLBench everyday tasks, and real-world deployment, showing improved task success and the ability to transfer skills to new objects and instructions. This framework advances robotic autonomy by reducing human input and enabling continual skill acquisition, with implications for embodied assistants and adaptive manipulation in diverse settings.
Abstract
Intelligent robots are the ultimate goal of the robotics field. Existing works leverage learning-based or optimization-based methods to accomplish human-defined tasks. However, the challenge of enabling robots to explore various environments autonomously remains unresolved. In this work, we propose a framework named GExp, which enables robots to explore and learn autonomously without human intervention. To achieve this goal, we devise modules for self-exploration, knowledge-base building, and closed-loop feedback, all based on foundation models. Inspired by the way that infants interact with the world, GExp encourages robots to understand and explore the environment through a series of self-generated tasks. During exploration, the robot acquires skills from beneficial experiences that will be useful in the future. GExp thus gives robots the ability to solve complex tasks through self-exploration. Unlike previous studies that provide in-context examples for few-shot learning, GExp does not depend on prior interactive knowledge or human intervention, which allows it to adapt directly to different scenarios. In addition, we propose a workflow for deploying a real-world robot system with self-learned skills as an embodied assistant.
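
The exploration cycle described above (self-generated tasks, verification, reflection, and a growing skill library) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the names `SkillLibrary`, `explore`, `propose_task`, `plan_task`, `execute_and_verify`, and `reflect` are hypothetical, and the foundation-model calls are replaced by plain callables so the control flow can be read on its own.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class SkillLibrary:
    """Hypothetical store mapping a task description to the plan that solved it."""
    skills: Dict[str, List[str]] = field(default_factory=dict)

    def add(self, task: str, plan: List[str]) -> None:
        self.skills[task] = plan


def explore(
    propose_task: Callable[[SkillLibrary], str],
    plan_task: Callable[[str, Optional[str]], List[str]],
    execute_and_verify: Callable[[List[str]], bool],
    reflect: Callable[[str, List[str]], str],
    library: SkillLibrary,
    n_tasks: int = 3,
    max_attempts: int = 2,
) -> SkillLibrary:
    """Assumed loop: propose a task, plan, verify, reflect on failure, keep successes."""
    for _ in range(n_tasks):
        task = propose_task(library)           # self-generated task
        feedback: Optional[str] = None
        for _ in range(max_attempts):
            plan = plan_task(task, feedback)   # planning, optionally using feedback
            if execute_and_verify(plan):       # verification of the outcome
                library.add(task, plan)        # beneficial experience becomes a skill
                break
            feedback = reflect(task, plan)     # reflection feeds the next attempt
    return library


def demo() -> SkillLibrary:
    """Toy stand-ins for the model calls; all behavior here is invented for illustration."""
    tasks = iter(["pick up block", "stack two blocks"])

    def propose(lib: SkillLibrary) -> str:
        return next(tasks)

    def plan(task: str, feedback: Optional[str]) -> List[str]:
        steps = [f"do: {task}"]
        if feedback:
            steps.insert(0, "open gripper")
        return steps

    def verify(steps: List[str]) -> bool:
        return steps[0] == "open gripper" or "pick" in steps[-1]

    def reflect_fn(task: str, steps: List[str]) -> str:
        return "gripper was closed"

    return explore(propose, plan, verify, reflect_fn, SkillLibrary(), n_tasks=2)
```

In this toy run the second task fails on its first attempt, and the reflection message changes the next plan, which then passes verification and is stored. A real system would replace each callable with a model query or a robot execution step.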
