Improving Knowledge Extraction from LLMs for Task Learning through Agent Analysis
James R. Kirk, Robert E. Wray, Peter Lindes, John E. Laird
TL;DR
The paper tackles the challenge of extracting grounded, task-relevant knowledge from LLMs for embodied agents. It introduces STARS, a cognitive-agent framework that extends template-based prompting with three components (Search Tree, Analyze and Repair, and Selection), optionally complemented by user oversight. By generating a breadth of candidate goals, proactively diagnosing and repairing mismatches, and using the LLM to select among viable options, STARS achieves high one-shot task completion (77-94% without oversight) and reaches 100% with minimal human input. The approach demonstrates that an LLM can be effectively leveraged as one component within a broader task-learning system, reducing user burden while grounding goals to the agent's embodiment, environment, and user preferences, with potential for further context-informed improvements (STARS*).
Abstract
Large language models (LLMs) offer significant promise as a knowledge source for task learning. Prompt engineering has been shown to be effective for eliciting knowledge from an LLM, but alone it is insufficient for acquiring relevant, situationally grounded knowledge for an embodied agent learning novel tasks. We describe a cognitive-agent approach, STARS, that extends and complements prompt engineering, mitigating its limitations and thus enabling an agent to acquire new task knowledge matched to its native language capabilities, embodiment, environment, and user preferences. The STARS approach is to increase the response space of LLMs and deploy general strategies, embedded within the autonomous agent, to evaluate, repair, and select among candidate responses produced by the LLM. We describe the approach and experiments that show how an agent, by retrieving and evaluating a breadth of responses from the LLM, can achieve 77-94% task completion in one-shot learning without user oversight. The approach achieves 100% task completion when human oversight (such as an indication of preference) is provided. Further, the type of oversight largely shifts from explicit, natural language instruction to simple confirmation/disconfirmation of high-quality responses that have been vetted by the agent before presentation to a user.
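The generate, evaluate, repair, and select loop described above can be illustrated with a minimal sketch. This is not the authors' implementation: the function `stars_sketch`, its parameters, and the toy stand-ins for the LLM and the agent's analysis (`toy_generate`, `toy_analyze`, `toy_repair`, `toy_select`, `KNOWN_WORDS`) are all hypothetical names introduced here for illustration, and the real system uses an LLM and a cognitive agent where this sketch uses simple string operations.

```python
from typing import Callable, List, Optional

def stars_sketch(
    generate: Callable[[str, int], List[str]],
    analyze: Callable[[str], List[str]],
    repair: Callable[[str, List[str]], str],
    select: Callable[[str, List[str]], str],
    task: str,
    n_candidates: int = 4,
    max_repairs: int = 1,
) -> Optional[str]:
    """Hypothetical sketch: widen the response space, vet each candidate,
    attempt a bounded number of repairs, then choose among the survivors."""
    viable: List[str] = []
    for text in generate(task, n_candidates):
        issues = analyze(text)
        attempts = 0
        # Repair only while problems remain and the repair budget allows.
        while issues and attempts < max_repairs:
            text = repair(text, issues)
            issues = analyze(text)
            attempts += 1
        if not issues:
            viable.append(text)
    if not viable:
        return None  # nothing survived; a real agent might ask the user here
    return select(task, viable)

# --- Toy stand-ins (assumptions, not part of the paper) ---
KNOWN_WORDS = {"mug", "in", "dishwasher", "cupboard"}

def toy_generate(task: str, n: int) -> List[str]:
    # Stands in for sampling several goal descriptions from an LLM.
    pool = ["mug in dishwasher", "mug in sink", "mug in cupboard"]
    return pool[:n]

def toy_analyze(text: str) -> List[str]:
    # Stands in for the agent checking a response against what it can ground.
    return [word for word in text.split() if word not in KNOWN_WORDS]

def toy_repair(text: str, issues: List[str]) -> str:
    # Stands in for re-prompting the LLM with a description of the mismatch.
    for word in issues:
        text = text.replace(word, "dishwasher")
    return text

def toy_select(task: str, viable: List[str]) -> str:
    # Stands in for asking the LLM to pick among vetted candidates.
    return viable[0]
```

Calling `stars_sketch(toy_generate, toy_analyze, toy_repair, toy_select, "tidy the mug")` runs each toy candidate through analysis and, where needed, one repair attempt before selection. Returning `None` when no candidate survives marks the point at which user oversight would naturally enter.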
