Sample Efficient Robot Learning in Supervised Effect Prediction Tasks
Mehmet Arda Eren, Erhan Oztop
TL;DR
Robots that learn action-effect models in self-supervised settings incur high data collection costs. This work addresses the problem with MUSEL, a model-uncertainty-driven active learning (AL) framework for regression in continuous spaces. MUSEL uses a Stochastic Variational Deep Kernel Learning (SVDKL) backbone to obtain total predictive uncertainty, which it combines with learning progress (LP) and input diversity to estimate model uncertainty for sample selection. The main contributions are: (i) a concrete AL algorithm for continuous state/action regression, (ii) a model uncertainty estimator that uses LP and minimum distance, and (iii) experimental validation, with ablations, in one- and two-sphere tabletop tasks showing improved learning accuracy and sample efficiency. The results indicate practical gains for sample-efficient world-model learning in robotics and suggest broader applicability to self-supervised policy and control tasks.
Abstract
In self-supervised robotic learning, agents acquire data through active interaction with their environment, incurring costs such as energy use, human oversight, and experimental time. To mitigate these, sample-efficient exploration is essential. While intrinsic motivation (IM) methods like learning progress (LP) are widely used in robotics, and active learning (AL) is well established for classification in machine learning, few frameworks address continuous, high-dimensional regression tasks typical of world model learning. We propose MUSEL (Model Uncertainty for Sample-Efficient Learning), a novel AL framework tailored for regression tasks in robotics, such as action-effect prediction. MUSEL introduces a model uncertainty metric that combines total predictive uncertainty, learning progress, and input diversity to guide data acquisition. We validate our approach using a Stochastic Variational Deep Kernel Learning (SVDKL) model in two robotic tabletop tasks. Experimental results demonstrate that MUSEL improves both learning accuracy and sample efficiency, validating its effectiveness in learning action effects and selecting informative samples.
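To make the sample-selection idea concrete, the following is a minimal sketch of how the three signals named above (total predictive uncertainty, learning progress, and input diversity) could be combined to pick the next sample from a candidate pool. It is not the MUSEL formula, which the abstract does not state: the min-max rescaling, the weighted sum, and all names (`select_next_sample`, `total_std`, `learning_progress`, `weights`) are assumptions made for illustration. Input diversity is approximated here by the minimum Euclidean distance from each candidate to the inputs already collected.

```python
import numpy as np


def select_next_sample(candidates, train_inputs, total_std, learning_progress,
                       weights=(1.0, 1.0, 1.0)):
    """Pick the candidate with the highest illustrative acquisition score.

    Hypothetical sketch: the weighted sum below is an assumption,
    not the combination rule used by MUSEL.
    """
    candidates = np.asarray(candidates, dtype=float)      # shape (n_candidates, d)
    train_inputs = np.asarray(train_inputs, dtype=float)  # shape (n_train, d)

    # Input diversity: minimum Euclidean distance from each candidate
    # to any input that is already in the training set.
    diffs = candidates[:, None, :] - train_inputs[None, :, :]
    min_dist = np.linalg.norm(diffs, axis=-1).min(axis=1)

    def rescale(x):
        # Min-max rescale to [0, 1] so the three terms are comparable.
        x = np.asarray(x, dtype=float)
        span = x.max() - x.min()
        return np.zeros_like(x) if span == 0 else (x - x.min()) / span

    w_unc, w_lp, w_div = weights
    score = (w_unc * rescale(total_std)
             + w_lp * rescale(learning_progress)
             + w_div * rescale(min_dist))
    return int(np.argmax(score)), score
```

In an active learning loop, `total_std` would come from the predictive distribution of the learned model (an SVDKL model in this work) and `learning_progress` from the change in prediction error over recent updates; the selected candidate would then be executed on the robot and added to the training set.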
