Chain-of-Thought Predictive Control
Zhiwei Jia, Vineet Thumuluri, Fangchen Liu, Linghao Chen, Zhiao Huang, Hao Su
TL;DR
CoTPC tackles the challenge of learning generalizable policies for low-level robotic control from sub-optimal demonstrations. It combines unsupervised subskill discovery, which extracts chain-of-thought (CoT) sequences from the demos, with a Transformer that uses learnable CoT prompts to jointly predict subskills and actions and a hybrid masking scheme to update the guidance dynamically. Across Moving Maze, Franka Kitchen, and ManiSkill2, CoTPC consistently outperforms strong baselines, and ablations validate the benefits of coupled subskill-action predictions and CoT supervision. This work advances offline imitation learning by leveraging hierarchical planning signals without requiring optimal demos, enabling better transfer to varied tasks and environments.
Abstract
We study generalizable policy learning from demonstrations for complex low-level control (e.g., contact-rich object manipulations). We propose a novel hierarchical imitation learning method that utilizes sub-optimal demos. First, we propose an observation space-agnostic approach that efficiently discovers the multi-step subskill decomposition of the demos in an unsupervised manner. We group temporally close and functionally similar actions into subskill-level demo segments; the observations at the segment boundaries then constitute a chain of planning steps for the task, which we refer to as the chain-of-thought (CoT). Next, we propose a Transformer-based design that effectively learns to predict the CoT as the subskill-level guidance. We couple action and subskill predictions via learnable prompt tokens and a hybrid masking strategy, which enable dynamically updated guidance at test time and improve feature representation of the trajectory for generalizable policy learning. Our method, Chain-of-Thought Predictive Control (CoTPC), consistently surpasses existing strong baselines on challenging manipulation tasks with sub-optimal demos.
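To make the notion of a CoT concrete, the following is a minimal illustrative sketch, not the discovery procedure used by CoTPC. It assumes that "functionally similar" can be approximated by the cosine similarity between consecutive action vectors, and that the observation at the end of each segment serves as the boundary observation. The function names and the `threshold` parameter are hypothetical and chosen only for illustration.

```python
import numpy as np


def segment_demo(actions, threshold=0.5):
    """Split a demo into contiguous segments of similar actions.

    ASSUMPTION (not from the paper): a new segment starts whenever the
    cosine similarity between consecutive actions drops below `threshold`.
    Returns a list of (start, end) index pairs, both inclusive.
    """
    actions = np.asarray(actions, dtype=float)
    starts = [0]
    for t in range(1, len(actions)):
        prev, curr = actions[t - 1], actions[t]
        denom = np.linalg.norm(prev) * np.linalg.norm(curr)
        # Treat a zero-norm action as similar to its neighbor (no split).
        similarity = float(prev @ curr / denom) if denom > 0 else 1.0
        if similarity < threshold:
            starts.append(t)
    ends = [s - 1 for s in starts[1:]] + [len(actions) - 1]
    return list(zip(starts, ends))


def chain_of_thought(observations, segments):
    """Collect the observation at the end of each segment (hypothetical choice)."""
    return [observations[end] for _, end in segments]


# Toy demo: 2-D actions that change direction twice.
toy_actions = [[1.0, 0.0], [1.0, 0.1], [0.0, 1.0], [0.0, 1.0], [-1.0, 0.0]]
toy_observations = ["o0", "o1", "o2", "o3", "o4"]

toy_segments = segment_demo(toy_actions)
toy_cot = chain_of_thought(toy_observations, toy_segments)
```

On this toy demo the actions change direction at steps 2 and 4, so the sketch produces three segments and a three-step CoT. Because the grouping here depends only on the actions, it does not depend on the form of the observations, which is one way to read "observation space-agnostic" in the abstract.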
