Policy Adaptation via Language Optimization: Decomposing Tasks for Few-Shot Imitation
Vivek Myers, Bill Chunyuan Zheng, Oier Mees, Sergey Levine, Kuan Fang
TL;DR
This work tackles the problem of adapting language-conditioned robot policies to unseen, long-horizon tasks from only a few demonstrations. It introduces Policy Adaptation via Language Optimization (PALO), which uses vision-language models (VLMs) to decompose high-level instructions into subtasks. It then jointly optimizes the decomposition and the partition of each demonstration trajectory into subtask segments, enabling rapid nonparametric adaptation without a large fine-tuning dataset. The approach is supported by a regret analysis that decomposes out-of-distribution performance into the pretrained policy's in-distribution error, the VLM's decomposition accuracy, and sampling-related terms. Empirically, PALO achieves strong performance on real-world BridgeDataV2 tasks. It outperforms zero-shot and fine-tuned baselines across multiple scenes and shows robust long-horizon behavior with as few as five demonstrations, underscoring the practical value of semantic task structure for robotic adaptation.
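The joint optimization over decompositions and trajectory partitions can be illustrated with a small sketch. It assumes a hypothetical per-step cost `cost(instruction, t)` standing in for the pretrained policy's imitation loss on a demonstrated action. It also assumes each candidate decomposition (e.g., sampled from a VLM) is a list of subtask instructions executed in order over contiguous segments. Where the paper's analysis involves sampling, this sketch swaps in an exact dynamic program over contiguous partitions. The helper names `best_partition` and `select_decomposition` are illustrative, not the authors' code.

```python
import math
from typing import Callable, List, Sequence, Tuple

# Hypothetical per-step cost for one demonstration: cost(instruction, t) measures
# how poorly the pretrained policy, conditioned on `instruction`, explains the
# demonstrated action at timestep t (for example, an action negative log-likelihood).
StepCost = Callable[[str, int], float]
Demo = Tuple[int, StepCost]  # (number of timesteps, per-step cost)


def best_partition(num_steps: int, subtasks: Sequence[str],
                   cost: StepCost) -> Tuple[float, List[Tuple[int, int]]]:
    """Split timesteps [0, num_steps) into len(subtasks) contiguous, non-empty
    segments (segment k follows subtasks[k]) minimizing total cost, via dynamic
    programming in O(K * T^2)."""
    K = len(subtasks)
    if K == 0 or K > num_steps:
        return math.inf, []
    # prefix[k][t] = total cost of steps [0, t) under subtask k.
    prefix = [[0.0] * (num_steps + 1) for _ in range(K)]
    for k, instr in enumerate(subtasks):
        for t in range(num_steps):
            prefix[k][t + 1] = prefix[k][t] + cost(instr, t)
    # dp[k][t] = min cost of covering steps [0, t) with the first k subtasks.
    dp = [[math.inf] * (num_steps + 1) for _ in range(K + 1)]
    back = [[-1] * (num_steps + 1) for _ in range(K + 1)]
    dp[0][0] = 0.0
    for k in range(1, K + 1):
        for t in range(k, num_steps + 1):
            for s in range(k - 1, t):  # subtask k-1 covers steps [s, t)
                c = dp[k - 1][s] + prefix[k - 1][t] - prefix[k - 1][s]
                if c < dp[k][t]:
                    dp[k][t], back[k][t] = c, s
    # Walk the back-pointers to recover segment boundaries.
    bounds, t = [], num_steps
    for k in range(K, 0, -1):
        s = back[k][t]
        bounds.append((s, t))
        t = s
    return dp[K][num_steps], bounds[::-1]


def select_decomposition(candidates: Sequence[Sequence[str]],
                         demos: Sequence[Demo]) -> Tuple[Sequence[str], float]:
    """Return the candidate decomposition with the lowest mean best-partition
    cost across the few-shot demonstrations."""
    def score(subtasks: Sequence[str]) -> float:
        return sum(best_partition(n, c_fn and subtasks, c_fn)[0]
                   for n, c_fn in demos) / len(demos)
    best = min(candidates, key=score)
    return best, score(best)


# Toy usage: one 4-step demo whose steps are labeled with the subtask they serve.
labels = ["open drawer", "open drawer", "put cup in drawer", "put cup in drawer"]

def toy_cost(instr: str, t: int) -> float:
    return 0.0 if labels[t] == instr else 1.0

candidates = [["open drawer", "put cup in drawer"],
              ["put cup in drawer", "open drawer"],
              ["pick up cup"]]
best, s = select_decomposition(candidates, [(len(labels), toy_cost)])
print(best, s)  # ['open drawer', 'put cup in drawer'] 0.0
print(best_partition(len(labels), best, toy_cost))  # (0.0, [(0, 2), (2, 4)])
```

In this toy case the correctly ordered decomposition explains the demo at zero cost with a switch at step 2. The reversed order scores 3.0 and the single-subtask candidate scores 4.0. Exact search over contiguous partitions is tractable here because segments are ordered, but a real system could instead sample partitions as the paper's analysis suggests.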
Abstract
Learned language-conditioned robot policies often struggle to adapt effectively to new real-world tasks, even when pre-trained across a diverse set of instructions. We propose a novel approach for few-shot adaptation to unseen tasks that exploits the semantic understanding of task decomposition provided by vision-language models (VLMs). Our method, Policy Adaptation via Language Optimization (PALO), combines a handful of demonstrations of a task with proposed language decompositions sampled from a VLM to enable rapid nonparametric adaptation, avoiding the need for a larger fine-tuning dataset. We evaluate PALO in extensive real-world experiments on challenging, unseen, long-horizon robot manipulation tasks. We find that PALO consistently completes long-horizon, multi-tier tasks in the real world, outperforming state-of-the-art pre-trained generalist policies as well as methods that have access to the same demonstrations.
