Sample-Efficient Behavior Cloning Using General Domain Knowledge
Feiyu Zhu, Jean Oh, Reid Simmons
TL;DR
The paper tackles the sample inefficiency of behavior cloning (BC) by embedding expert domain knowledge into the policy itself. It introduces Knowledge Informed Models (KIM), in which an LLM generates a task-specific, semantically meaningful policy structure from knowledge expressed in natural language, and the parameters of that structure are then tuned on demonstrations using BC. On Lunar Lander and Car Racing, KIM achieves strong performance with very few demonstrations and is robust to action noise, outperforming unstructured baselines that have far more parameters. The work demonstrates that using domain knowledge to shape model structure can markedly improve data efficiency and resilience in sequential decision-making tasks.
Abstract
Behavior cloning has shown success in many sequential decision-making tasks by learning from expert demonstrations, yet it can be very sample inefficient and can fail to generalize to unseen scenarios. One approach to these problems is to introduce general domain knowledge, so that the policy can focus on the essential features and may generalize to unseen states by applying that knowledge. Although this knowledge is easy to acquire from experts, it is hard to combine with learning from individual examples because neural networks lack semantic structure and feature engineering is time-consuming. To enable learning from both general knowledge and specific demonstration trajectories, we use a large language model's coding capability to instantiate a policy structure based on expert domain knowledge expressed in natural language, and we tune the parameters of the policy with demonstrations. We name this approach the Knowledge Informed Model (KIM), as the structure reflects the semantics of expert knowledge. In our experiments with lunar lander and car racing tasks, our approach learns to solve the tasks with as few as 5 demonstrations and is robust to action noise, outperforming the baseline model without domain knowledge. This indicates that, with the help of large language models, we can incorporate domain knowledge into the structure of the policy, increasing sample efficiency for behavior cloning.
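To make the idea of a structured policy with tunable parameters concrete, the following is a minimal sketch, not the authors' implementation. It assumes a hypothetical piece of landing knowledge ("fire the engine when falling faster than a limit that grows with altitude"), a hand-written rule standing in for the LLM-generated structure, and a simple grid search standing in for whatever parameter tuning the paper uses. All names (`structured_policy`, `bc_loss`, `fit`) and the toy demonstrations are illustrative assumptions.

```python
from itertools import product


def structured_policy(state, params):
    """Hypothetical rule: fire the engine (action 1) when the descent
    speed exceeds an altitude-dependent safe limit, else do nothing (0)."""
    altitude, vertical_speed = state
    slope, offset = params
    safe_speed = slope * altitude + offset  # allowed descent speed grows with altitude
    return 1 if -vertical_speed > safe_speed else 0


def bc_loss(params, demos):
    """Behavior cloning objective for this sketch: the fraction of
    demonstration steps where the policy disagrees with the expert."""
    errors = sum(structured_policy(state, params) != action for state, action in demos)
    return errors / len(demos)


def fit(demos, slopes, offsets):
    """Grid search over the two parameters (an assumed stand-in for
    gradient-based tuning); returns the pair with the lowest loss."""
    return min(product(slopes, offsets), key=lambda p: bc_loss(p, demos))


# Toy demonstrations (made up for illustration):
# ((altitude, vertical_speed), expert_action)
demos = [
    ((10.0, -1.0), 0),
    ((10.0, -6.0), 1),
    ((2.0, -0.5), 0),
    ((2.0, -2.0), 1),
    ((5.0, -1.5), 0),
    ((5.0, -4.0), 1),
]

grid = [i / 10 for i in range(11)]  # candidate values 0.0, 0.1, ..., 1.0
params = fit(demos, grid, grid)
print("fitted params:", params, "loss:", bc_loss(params, demos))
```

Because the rule has only two parameters, each with a clear meaning, a handful of demonstrations is enough to pin them down in this toy setting, which is the intuition behind the sample-efficiency claim.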
