A Robotic Skill Learning System Built Upon Diffusion Policies and Foundation Models
Nils Ingelhag, Jesper Munkeby, Jonne van Haastregt, Anastasia Varava, Michael C. Welle, Danica Kragic
TL;DR
The paper tackles scalable robotic skill learning for long-tail manipulation tasks by presenting the Robotic Skill Learning System (RSLS), which combines diffusion-based visuomotor policies for skill execution with foundation-model-driven skill selection and precondition validation. New skills are acquired through teleoperated demonstrations (approximately 50–150 per skill) and added to a growing skill library, so the system can be expanded continuously. The authors evaluate RSLS in both simulated and real-world food-serving scenarios, compare two foundation models (GPT-4 and Gemini) for skill matching and precondition checks, and report substantial performance gains when the LLM and VLM components are integrated. The practical result is that robots can learn new tasks from modest amounts of demonstration data, and the end-to-end framework is validated across multiple environments, with results and videos publicly available online.
Abstract
In this paper, we build upon two major recent developments in the field, Diffusion Policies for visuomotor manipulation and large pre-trained multimodal foundation models, to obtain a robotic skill learning system. The system acquires new skills from teleoperated demonstrations via behavioral cloning with visuomotor diffusion policies. Foundation models perform skill selection given the user's natural-language prompt. Before a skill is executed, the foundation model performs a precondition check given an observation of the workspace. We compare the performance of different foundation models on these two tasks and give a detailed experimental evaluation of the skills taught by the user in simulation and the real world. Finally, we showcase the combined system on a challenging food-serving scenario in the real world. Videos of all experimental executions, as well as the process of teaching new skills in simulation and the real world, are available on the project's website.
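
The control flow described in the abstract (select a skill from the user's prompt, check its precondition against a workspace observation, then execute the learned policy) can be illustrated with a minimal sketch. This is not the authors' implementation: every name here (`Skill`, `SkillLibrary`, `run_request`, `keyword_selector`, `dict_precondition`) is hypothetical, the foundation-model calls are replaced by toy stand-in functions, and the diffusion policy is reduced to a plain callable.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class Skill:
    """Hypothetical skill record; field names are assumptions, not from the paper."""
    name: str
    description: str                 # text a foundation model could match against
    precondition: str                # condition to verify before execution
    policy: Callable[[dict], str]    # stand-in for a trained diffusion policy rollout


class SkillLibrary:
    """Growing collection of skills, keyed by name."""

    def __init__(self) -> None:
        self._skills: Dict[str, Skill] = {}

    def add(self, skill: Skill) -> None:
        self._skills[skill.name] = skill

    def names(self) -> List[str]:
        return list(self._skills)

    def get(self, name: str) -> Optional[Skill]:
        return self._skills.get(name)


def keyword_selector(prompt: str, library: SkillLibrary) -> Optional[str]:
    """Toy stand-in for LLM skill selection: first skill name found in the prompt."""
    lowered = prompt.lower()
    for name in library.names():
        if name in lowered:
            return name
    return None


def dict_precondition(precondition: str, observation: dict) -> bool:
    """Toy stand-in for a VLM check: the 'observation' is a dict of boolean flags."""
    return bool(observation.get(precondition, False))


def run_request(
    prompt: str,
    observation: dict,
    library: SkillLibrary,
    select_skill: Callable[[str, SkillLibrary], Optional[str]],
    check_precondition: Callable[[str, dict], bool],
) -> str:
    """Select a skill, verify its precondition, then run its policy."""
    name = select_skill(prompt, library)
    skill = library.get(name) if name is not None else None
    if skill is None:
        return "no matching skill"
    if not check_precondition(skill.precondition, observation):
        return f"precondition failed: {skill.name}"
    return skill.policy(observation)


# Usage with an invented "scoop" skill (illustrative only).
library = SkillLibrary()
library.add(Skill("scoop", "scoop food onto a plate", "bowl_visible",
                  lambda obs: "executed scoop"))
result = run_request("Please scoop some rice", {"bowl_visible": True},
                     library, keyword_selector, dict_precondition)
```

Passing the selector and the precondition checker as arguments keeps the sketch independent of any particular foundation model, which mirrors the comparison of different models described in the abstract; in a real system these two callables would wrap model queries rather than keyword and dictionary lookups.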
