Nebula: A discourse aware Minecraft Builder

Akshay Chaturvedi, Kate Thompson, Nicholas Asher

TL;DR

The paper tackles language-to-action prediction in collaborative tasks by incorporating discourse structure and nonlinguistic context. It proposes Nebula, a Llama-based model fine-tuned on the Minecraft Dialogue Corpus (MDC), which achieves a net-action F1 of about $0.39$ on the MDC test set, roughly doubling the prior baseline of $0.20$. Using Narrative arcs from the Minecraft Structured Dialogue Corpus (MSDC), the authors show that the context within an arc can be sufficient for accurate action prediction inside that arc, and they expose limitations of the existing net-action F1 metric when instructions are underspecified. To address these issues, they introduce synthetic datasets (level-1 and level-2), show that targeted fine-tuning improves the model's understanding of shapes and locations, and propose a more realistic evaluation approach. Overall, the work indicates that discourse-aware LLMs can better map complex, underspecified instructions to action sequences in embodied environments, and it offers guidance on metric design for such evaluations.

Abstract

When engaging in collaborative tasks, humans efficiently exploit the semantic structure of a conversation to optimize verbal and nonverbal interactions. But in recent "language to code" or "language to action" models, this information is lacking. We show how incorporating the prior discourse and nonlinguistic context of a conversation situated in a nonlinguistic environment can improve the "language to action" component of such interactions. We finetune an LLM to predict actions based on prior context; our model, Nebula, doubles the net-action F1 score over the baseline on this task of Jayannavar et al. (2020). We also investigate our model's ability to construct shapes and understand location descriptions using a synthetic dataset.
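The net-action F1 score mentioned above is not defined in this summary. As a rough illustration only, the sketch below assumes that a net action is what remains of a Builder's action sequence once a placement and a later removal of the same block cancel out, and that F1 is computed over the overlap between predicted and reference net actions. The tuple layout `(kind, color, position)` and the function names `net_actions` and `net_action_f1` are hypothetical and are not taken from the paper or its code.

```python
from collections import Counter

def net_actions(actions):
    """Collapse an action sequence into its net effect (assumed definition).

    Each action is a hypothetical tuple (kind, color, position), where kind is
    "place" or "remove". A placement followed by a removal of the same block
    cancels out and leaves no net action.
    """
    balance = Counter()
    for kind, color, position in actions:
        balance[(color, position)] += 1 if kind == "place" else -1
    return Counter({
        ("place" if n > 0 else "remove", color, position): abs(n)
        for (color, position), n in balance.items()
        if n != 0
    })

def net_action_f1(predicted, reference):
    """F1 over the multiset overlap of predicted and reference net actions."""
    pred, ref = net_actions(predicted), net_actions(reference)
    overlap = sum((pred & ref).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(pred.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)

# Illustrative data only: two reference blocks, one of which is predicted
# with the wrong color; the place/remove pair at (2, 0, 0) cancels out.
reference = [("place", "red", (0, 0, 0)), ("place", "red", (1, 0, 0))]
predicted = [
    ("place", "red", (0, 0, 0)),
    ("place", "blue", (1, 0, 0)),
    ("place", "blue", (2, 0, 0)),
    ("remove", "blue", (2, 0, 0)),
]
score = net_action_f1(predicted, reference)
```

Under these assumptions, one of the two predicted net actions matches one of the two reference net actions, so precision and recall are both 0.5. A metric of this exact-match kind also hints at the limitation the TL;DR raises: when an instruction is underspecified, a prediction that differs from the reference in color or position may still be a reasonable reading of the instruction, yet it earns no credit.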

Paper Structure (9 sections, 3 figures, 10 tables)

Figures (3)

  • Figure 1: The Neural Builder (Jayannavar et al., 2020) takes as input the sequence $i_n$, $a_n$, $i_{n+1}$ and the world state to predict the subsequent action sequence.
  • Figure 2: An excerpt from MDC. The Builder interrupts the action sequence by asking a question.
  • Figure 3: Excerpt of a Narrative arc from the MSDC. Here the arc is purple and connects the instruction in Architect turn one to the following instruction in turn five.