Semi-Supervised Neural Processes for Articulated Object Interactions
Emily Liu, Michael Noseworthy, Nicholas Roy
TL;DR
This work tackles the data efficiency challenge in adaptive robotic manipulation by proposing Semi-Supervised Neural Processes (SSNP), which jointly learn from abundant unlabeled visual context and a limited set of labeled interactions. By integrating a context-learner inspired by the Neural Statistician with a Neural Process, SSNP builds object-level and action-specific latent representations to predict interaction rewards while adapting to new objects in a few shots. The approach achieves lower prediction error and faster adaptation than fully supervised or pretrained baselines on a door-opening task, even when only a small fraction of objects carry labels, thereby reducing labeling and computational costs. Practically, SSNP enables robots to leverage passive observations to guide manipulation policies with minimal retraining, improving robustness and data efficiency in real-world settings.
Abstract
The scarcity of labeled action data poses a considerable challenge for developing machine learning algorithms for robotic object manipulation. It is expensive and often infeasible for a robot to interact with many objects. Conversely, visual data of objects, without interaction, is abundantly available and can be leveraged for pretraining and feature extraction. However, current methods that rely on image data for pretraining do not easily adapt to task-specific predictions, since the learned features are not guaranteed to be relevant. This paper introduces the Semi-Supervised Neural Process (SSNP): an adaptive reward-prediction model designed for scenarios in which only a small subset of objects have labeled interaction data. In addition to predicting reward labels, the latent-space of the SSNP is jointly trained with an autoencoding objective using passive data from a much larger set of objects. Jointly training with both types of data allows the model to focus more effectively on generalizable features and minimizes the need for extensive retraining, thereby reducing computational demands. The efficacy of SSNP is demonstrated through a door-opening task, leading to better performance than other semi-supervised methods, and only using a fraction of the data compared to other adaptive models.
