Towards Interpretable Foundation Models of Robot Behavior: A Task Specific Policy Generation Approach
Isaac Sheidlower, Reuben Aronson, Elaine Schaertl Short
TL;DR
Generalist robot policies lack modularity across tasks, so an update made for one task can interfere with behavior on others, which hinders interpretability and user personalization. The authors propose Diffusion for Policy Parameters (DPP), a conditional diffusion model that, given a task specification, generates a standalone task-specific policy directly in parameter space. They demonstrate a grid-world proof of concept trained on language-conditioned tasks and a large corpus of policies. The results show that a single diffusion sample can produce a meaningful policy and that mixing samples can improve performance, thereby decoupling task learning from the foundation model. The work also discusses limitations: scaling to real robots, the need for both policy and demonstration data, and extending the approach to continuous actions. Overall, DPP offers a pathway toward more interpretable and usable foundation models by enabling user-controlled, task-local updates without unintended cross-task effects.
Abstract
Foundation models are a promising path toward general-purpose and user-friendly robots. The prevalent approach involves training a generalist policy that, like a reinforcement learning policy, maps observations to actions. Although this approach has seen much success, several concerns arise when considering deployment and end-user interaction with these systems. In particular, the lack of modularity between tasks means that when model weights are updated (e.g., when a user provides feedback), the behavior in other, unrelated tasks may be affected. This can negatively impact the system's interpretability and usability. We present an alternative approach to the design of robot foundation models, Diffusion for Policy Parameters (DPP), which generates stand-alone, task-specific policies. Because these policies are detached from the foundation model, they are updated only when the user chooses to update them, whether through feedback or personalization, which allows the user to become highly familiar with each policy. We demonstrate a proof of concept of DPP in simulation and then discuss its limitations and the future of interpretable foundation models.
