VAL: Interactive Task Learning with GPT Dialog Parsing

Lane Lawley, Christopher J. MacLellan

TL;DR

VAL tackles the brittleness of natural language interfaces in interactive task learning by combining a neuro-symbolic planning framework based on hierarchical task networks (HTNs) with narrowly scoped GPT subroutines. Its core procedure, the VALgorithm, grounds natural language in symbolic actions, so that task knowledge is acquired incrementally, remains interpretable, and generalizes to new tasks from few examples. A user study in a video game environment evaluates the approach, reporting success rates for the individual GPT subroutines and examining feedback mechanisms such as confirmatory dialogs and an undo feature. The work suggests that human-centered AI can learn from limited natural language guidance while remaining reliable and interpretable, and it points to future expansion across modalities and deployment on open models.

Abstract

Machine learning often requires millions of examples to produce static, black-box models. In contrast, interactive task learning (ITL) emphasizes incremental knowledge acquisition from limited instruction provided by humans in modalities such as natural language. However, ITL systems often suffer from brittle, error-prone language parsing, which limits their usability. Large language models (LLMs) are resistant to brittleness but are not interpretable and cannot learn incrementally. We present VAL, an ITL system with a new philosophy for LLM/symbolic integration. By using LLMs only for specific tasks--such as predicate and argument selection--within an algorithmic framework, VAL reaps the benefits of LLMs to support interactive learning of hierarchical task knowledge from natural language. Acquired knowledge is human interpretable and generalizes to support execution of novel tasks without additional training. We studied users' interactions with VAL in a video game setting, finding that most users could successfully teach VAL using language they felt was natural.

Paper Structure (49 sections, 21 figures, 1 table, 1 algorithm)

Figures (21)

  • Figure 1: An example dialog with VAL. The current dialog state is a confirmatory prompt for the text segmentation step performed by one of VAL's GPT subroutines. This step performs action discretization, anaphora resolution (of "in there"), and temporal ordering.
  • Figure 2: A conversation with the Rosie system \cite{rosie2} demonstrating the rigid nature of the interaction. M denotes a user message, and R denotes a message from Rosie.
  • Figure 3: An excerpt of an interaction with VAL to teach the plan shown in Figure \ref{fig:val_dialog}. U denotes a user message, and V denotes a VAL message.
  • Figure 4: A high-level diagram of VAL's components: the GPT subroutines (Section \ref{sec:gpt_subroutine_arch}), the main "VALgorithm" (Section \ref{sec:valgorithm}), the HTN knowledge base (Section \ref{sec:htns}), and, outside of VAL, an example environment from our user study (Section \ref{sec:overcooked}).
  • Figure 5: An example HTN plan for cooking, learned by VAL in the Overcooked-AI environment. The learned task cook decomposes into other learned tasks, which themselves decompose into primitive actions.
  • ...and 16 more figures
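
The hierarchical decomposition described in the Figure 5 caption, where a learned task such as cook breaks down into other learned tasks and finally into primitive actions, can be illustrated with a minimal sketch. This is not VAL's implementation: the `Task` class, the `flatten` helper, and every task and action name other than `cook` are hypothetical, chosen only to show how an HTN plan expands into an ordered sequence of primitives.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Task:
    """A node in a hierarchical task network (hypothetical structure).

    A task with no subtasks is treated as a primitive action.
    """
    name: str
    subtasks: List["Task"] = field(default_factory=list)

    @property
    def is_primitive(self) -> bool:
        return not self.subtasks


def flatten(task: Task) -> List[str]:
    """Expand a task depth-first into its ordered primitive action names."""
    if task.is_primitive:
        return [task.name]
    actions: List[str] = []
    for sub in task.subtasks:
        actions.extend(flatten(sub))
    return actions


# Hypothetical learned tasks; only the name "cook" comes from the caption.
get_onion = Task("get_onion", [Task("move_to_onion"), Task("pick_up")])
fill_pot = Task("fill_pot", [Task("move_to_pot"), Task("place_in_pot")])
cook = Task("cook", [get_onion, fill_pot, Task("turn_on_pot")])

plan = flatten(cook)
print(plan)
```

Because learned tasks are stored as named nodes, a task taught once (here, the hypothetical `get_onion`) can be reused as a subtask of later tasks without being taught again, which is the kind of reuse the caption's nested decomposition implies.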