On Time-Indexing as Inductive Bias in Deep RL for Sequential Manipulation Tasks
M. Nomaan Qureshi, Ben Eisner, David Held
TL;DR
The paper addresses multimodal skill learning in robotic manipulation by proposing a simple time-indexed, multi-head policy in which $k$ heads are activated sequentially for fixed durations $T$, enabling explicit learning of primitive skills such as reaching and grasping. This scheduling-based architecture provides an inductive bias that circumvents the instability of learning multiple sub-skills and switching policies, and is compatible with standard RL algorithms like PPO and SAC. Empirical results on four MetaWorld tasks show improved performance and stability, with notable gains in push-v2, box-close-v2, and bin-picking-v2 where traditional baselines struggle. Overall, the work demonstrates that explicit time-based skill decomposition via neural heads can enhance data efficiency and skill acquisition in sequential manipulation tasks, motivating further exploration of structured policy designs in robotics.
Abstract
To solve complex manipulation tasks, policies often need to learn a set of diverse skills. This set of skills is often multimodal: each skill may have a distinct distribution of actions and states. Standard deep policy-learning algorithms typically model policies as deep neural networks with a single output head (deterministic or stochastic). This structure requires the network to learn to switch between modes internally, which can lead to lower sample efficiency and poor performance. In this paper we explore a simple structure that is conducive to the skill learning required by many manipulation tasks. Specifically, we propose a policy architecture that sequentially executes different action heads for fixed durations, enabling the learning of primitive skills such as reaching and grasping. Our empirical evaluation on the MetaWorld tasks shows that this simple structure outperforms standard policy learning methods, highlighting its potential for improved skill acquisition.
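The head-scheduling idea described above can be illustrated with a minimal sketch. It is not the authors' implementation. The class name, the shared trunk, the layer sizes, the deterministic tanh output, and the choice to keep the last head active once all $k \cdot T$ steps have elapsed are all assumptions made for illustration. Only the core rule, that head $i$ is selected by the timestep and held for a fixed duration $T$, comes from the text.

```python
import numpy as np


class TimeIndexedMultiHeadPolicy:
    """Illustrative policy with k output heads, each active for T consecutive steps.

    Hypothetical sketch: the shared trunk and the clamping to the last head
    are assumptions, not details taken from the paper.
    """

    def __init__(self, obs_dim, act_dim, num_heads, head_duration,
                 hidden_dim=64, seed=0):
        rng = np.random.default_rng(seed)
        self.num_heads = num_heads          # k in the text
        self.head_duration = head_duration  # T in the text
        # Shared feature trunk (assumed).
        self.W_trunk = rng.normal(0.0, 0.1, (obs_dim, hidden_dim))
        self.b_trunk = np.zeros(hidden_dim)
        # One linear output head per time segment.
        self.W_heads = rng.normal(0.0, 0.1, (num_heads, hidden_dim, act_dim))
        self.b_heads = np.zeros((num_heads, act_dim))

    def active_head(self, t):
        # Head index is fixed by the timestep alone; clamp so the last head
        # stays active if the episode runs past k * T steps (assumed).
        return min(t // self.head_duration, self.num_heads - 1)

    def act(self, obs, t):
        features = np.tanh(obs @ self.W_trunk + self.b_trunk)
        i = self.active_head(t)
        # Only the scheduled head produces the action at this timestep.
        return np.tanh(features @ self.W_heads[i] + self.b_heads[i])


policy = TimeIndexedMultiHeadPolicy(obs_dim=4, act_dim=2,
                                    num_heads=3, head_duration=5)
action = policy.act(np.ones(4), t=7)  # t=7 falls in the second segment (head 1)
```

Because the active head depends only on the timestep, no switching policy has to be learned. In an actual PPO or SAC setup the heads would presumably parameterize action distributions rather than return deterministic actions as they do here.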
