BPP: Long-Context Robot Imitation Learning by Focusing on Key History Frames
Max Sobol Mark, Jacky Liang, Maria Attarian, Chuyuan Fu, Debidatta Dwibedi, Dhruv Shah, Aviral Kumar
TL;DR
This work tackles long-horizon, history-dependent imitation learning in robotics and identifies limited coverage of the space of possible histories as the core bottleneck. It introduces Big Picture Policies (BPP), which compress histories into a small set of semantically meaningful keyframes detected by a vision-language model (VLM), thereby reducing distribution shift between training and deployment. Across four real-world manipulation tasks and three simulation tasks, BPP outperforms memoryless and prior history-conditioned baselines, achieving 70% higher success rates than the best comparison on the real-world evaluations, and demonstrates improved data efficiency and robust long-horizon tracking. Limitations include dependence on VLM latency and detection accuracy, suggesting future directions toward automatic keyframe generation and event-based learning extensions.
Abstract
Many robot tasks require attending to the history of past observations. For example, finding an item in a room requires remembering which places have already been searched. However, the best-performing robot policies typically condition only on the current observation, limiting their applicability to such tasks. Naively conditioning on past observations often fails due to spurious correlations: policies latch onto incidental features of training histories that do not generalize to out-of-distribution trajectories upon deployment. We analyze why policies latch onto these spurious correlations and find that this problem stems from limited coverage over the space of possible histories during training, which grows exponentially with horizon. Existing regularization techniques provide inconsistent benefits across tasks, as they do not fundamentally address this coverage problem. Motivated by these findings, we propose Big Picture Policies (BPP), an approach that conditions on a minimal set of meaningful keyframes detected by a vision-language model. By projecting diverse rollouts onto a compact set of task-relevant events, BPP substantially reduces distribution shift between training and deployment, without sacrificing expressivity. We evaluate BPP on four challenging real-world manipulation tasks and three simulation tasks, all requiring history conditioning. BPP achieves 70% higher success rates than the best comparison on real-world evaluations. Videos are available at https://bigpicturepolicies.github.io/
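To make the idea of projecting diverse rollouts onto a compact set of task-relevant events concrete, the following is a minimal sketch and not the authors' implementation. The function `compress_history`, the `is_key_event` callable (a stand-in for the VLM keyframe detector), the `max_keyframes` parameter, and the string-valued toy frames are all hypothetical names introduced here for illustration.

```python
from typing import Callable, List, Sequence, TypeVar

Frame = TypeVar("Frame")


def compress_history(
    frames: Sequence[Frame],
    is_key_event: Callable[[Frame], bool],
    max_keyframes: int = 4,
) -> List[Frame]:
    """Return the most recent keyframes followed by the current frame.

    `is_key_event` is a hypothetical placeholder for a VLM-based detector;
    here it is any predicate over a single frame.
    """
    if not frames:
        raise ValueError("need at least one observation")
    *past, current = frames
    # Keep only past frames flagged as task-relevant events.
    keyframes = [f for f in past if is_key_event(f)]
    # Retain at most `max_keyframes` of the most recent keyframes.
    start = max(0, len(keyframes) - max_keyframes)
    return keyframes[start:] + [current]


# Toy "frames": strings, where a searched location counts as a key event.
def toy_detector(frame: str) -> bool:
    return frame.startswith("searched:")


rollout_a = ["idle", "searched:drawer", "idle", "idle", "searched:shelf", "idle"]
rollout_b = ["searched:drawer", "idle", "searched:shelf", "idle", "idle", "idle", "idle"]

compressed_a = compress_history(rollout_a, toy_detector)
compressed_b = compress_history(rollout_b, toy_detector)
```

In this toy setting the two rollouts differ in length and timing but contain the same events, so both compress to the same policy input. That is the sense in which conditioning on keyframes can shrink the gap between training and deployment histories.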
