SpawnNet: Learning Generalizable Visuomotor Skills from Pre-trained Networks
Xingyu Lin, John So, Sashwat Mahalingam, Fangchen Liu, Pieter Abbeel
TL;DR
SpawnNet tackles generalization in visuomotor skills by leveraging pre-trained vision representations through a two-stream architecture that fuses multi-layer ViT features into a learnable perception stream via adapters. The approach alleviates the bottleneck of frozen backbones, enabling robust policy learning across diverse object instances in both simulation and real-world tasks. Across the simulated Open Door and Open Drawer tasks and three real-world manipulation tasks, SpawnNet consistently outperforms frozen and from-scratch baselines, with notable gains from dense spatial features and depth augmentation. The work demonstrates that adaptive fusion of pre-trained features, rather than mere freezing, yields stronger cross-instance generalization and offers a practical path toward deploying generalizable robotic manipulation policies.
Abstract
Existing internet-scale image and video datasets cover a wide range of everyday objects and tasks, raising the prospect of learning policies that generalize across diverse scenarios. Prior works have explored visual pre-training with different self-supervised objectives. Still, prior studies leave unclear how well the learned policies generalize and what advantage they hold over well-tuned baselines. In this work, we present a focused study of the generalization capabilities of pre-trained visual representations at the categorical level. We identify the key bottleneck in using a frozen pre-trained visual backbone for policy learning and then propose SpawnNet, a novel two-stream architecture that learns to fuse pre-trained multi-layer representations into a separate network to learn a robust policy. Through extensive simulated and real-world experiments, we show significantly better categorical generalization than prior approaches in imitation learning settings. Code and videos are available on our website: https://xingyu-lin.github.io/spawnnet.
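The two-stream idea can be illustrated with a minimal NumPy sketch. Everything in it is an assumption for illustration, not the authors' implementation. The "frozen backbone" is a stand-in made of fixed random layers rather than a real pre-trained ViT. Features are treated as token sequences on a shared grid, so no spatial resizing is needed. Each hypothetical `Adapter` fuses by a linear projection, concatenation, and a second linear map. What the sketch does show is the structural point: features from every backbone layer are read but never updated, while only the adapters, the learnable stream, and the policy head are trainable.

```python
import numpy as np

rng = np.random.default_rng(0)


def relu(x):
    return np.maximum(x, 0.0)


def frozen_backbone(obs_tokens, weights):
    """Hypothetical stand-in for a frozen pre-trained ViT.

    Returns the features of every layer. The weights are fixed and are never
    part of the trainable parameters.
    """
    feats, h = [], obs_tokens
    for w in weights:
        h = np.tanh(h @ w)
        feats.append(h)
    return feats


class Adapter:
    """Hypothetical adapter: project frozen features, then fuse with the stream."""

    def __init__(self, d_pre, d_stream, rng):
        self.proj = rng.normal(0.0, 0.02, size=(d_pre, d_stream))         # trainable
        self.fuse = rng.normal(0.0, 0.02, size=(2 * d_stream, d_stream))  # trainable

    def __call__(self, pre_feat, stream_feat):
        # pre_feat: (tokens, d_pre) from one frozen layer
        # stream_feat: (tokens, d_stream) from the learnable stream
        projected = relu(pre_feat @ self.proj)
        fused = np.concatenate([projected, stream_feat], axis=-1)
        return relu(fused @ self.fuse)


class TwoStreamPolicy:
    """Learnable stream that receives one adapter-fused backbone layer per block."""

    def __init__(self, d_obs, d_pre, d_stream, n_layers, action_dim, rng):
        self.stream_layers = [
            rng.normal(0.0, 0.02, size=(d_obs if i == 0 else d_stream, d_stream))
            for i in range(n_layers)
        ]
        self.adapters = [Adapter(d_pre, d_stream, rng) for _ in range(n_layers)]
        self.head = rng.normal(0.0, 0.02, size=(d_stream, action_dim))

    def forward(self, obs_tokens, frozen_feats):
        h = obs_tokens
        for w, adapter, pre in zip(self.stream_layers, self.adapters, frozen_feats):
            h = relu(h @ w)      # learnable-stream block
            h = adapter(pre, h)  # inject the matching frozen layer
        pooled = h.mean(axis=0)  # average over tokens
        return np.tanh(pooled @ self.head)  # bounded action

    def trainable_parameters(self):
        params = list(self.stream_layers) + [self.head]
        for a in self.adapters:
            params += [a.proj, a.fuse]
        return params


# Usage with made-up sizes (all hypothetical).
d_obs, d_pre, d_stream, n_layers, action_dim, tokens = 16, 32, 8, 3, 7, 49
backbone_w = [
    rng.normal(0.0, 0.3, size=(d_obs if i == 0 else d_pre, d_pre))
    for i in range(n_layers)
]
policy = TwoStreamPolicy(d_obs, d_pre, d_stream, n_layers, action_dim, rng)
obs = rng.normal(size=(tokens, d_obs))
action = policy.forward(obs, frozen_backbone(obs, backbone_w))
print(action.shape)  # (7,)
```

Because the backbone weights never appear in `trainable_parameters()`, a training loop built on this sketch would update only the adapters, the learnable stream, and the head. The policy can still draw on every pre-trained layer rather than only the final one. The paper's actual adapter and fusion design may differ from this simplification.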
