Nebula: A discourse aware Minecraft Builder

Akshay Chaturvedi; Kate Thompson; Nicholas Asher

TL;DR

The paper tackles language-to-action mapping in collaborative tasks by incorporating discourse structure and nonlinguistic context. It proposes Nebula, a Llama-based model fine-tuned on the Minecraft Dialogue Corpus (MDC). Nebula achieves a net-action F1 of about $0.39$ on the MDC test set, roughly double the prior baseline of $0.20$. Using Narrative arcs from the MSDC, the authors show that the context of an arc can be sufficient for accurate action prediction within that arc. They also expose limitations of the existing net-action F1 metric when instructions are underspecified. To address these issues, they introduce synthetic datasets (level-1 and level-2), show that targeted fine-tuning improves shape and location understanding, and propose a more realistic evaluation approach. Overall, the work shows that discourse-aware LLMs map complex, underspecified instructions to action sequences in embodied environments more effectively, and it offers guidance on metric design for such evaluations.
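Since the headline result is stated in terms of net-action F1, a small sketch of that kind of metric may help. It is a simplification of our own, not the exact definition of Jayannavar et al. (2020). It makes three assumptions: each action is a (kind, x, y, z, color) tuple; a place and a pick at the same cell cancel out; and F1 is computed over the remaining net actions. The names `net_actions` and `net_action_f1` are hypothetical.

```python
from typing import Iterable, Set, Tuple

# Hypothetical action encoding: (kind, x, y, z, color), kind is "place" or "pick".
Action = Tuple[str, int, int, int, str]


def net_actions(actions: Iterable[Action]) -> Set[Action]:
    """Reduce an action sequence to its net effect on the grid (simplified)."""
    net = {}
    for kind, x, y, z, color in actions:
        cell = (x, y, z)
        prev = net.get(cell)
        if prev is not None and prev[0] != kind:
            # A place followed by a pick at the same cell (or vice versa) cancels.
            del net[cell]
        else:
            # Otherwise the latest action at this cell is the one that counts.
            net[cell] = (kind, x, y, z, color)
    return set(net.values())


def net_action_f1(predicted: Iterable[Action], gold: Iterable[Action]) -> float:
    """F1 between predicted and gold net actions; 0.0 if nothing matches."""
    pred, ref = net_actions(predicted), net_actions(gold)
    tp = len(pred & ref)
    if tp == 0:
        return 0.0
    precision = tp / len(pred)
    recall = tp / len(ref)
    return 2 * precision * recall / (precision + recall)


gold = [("place", 0, 1, 0, "red"), ("place", 1, 1, 0, "red")]
pred = [
    ("place", 0, 1, 0, "red"),
    ("place", 5, 1, 0, "blue"),  # placed by mistake ...
    ("pick", 5, 1, 0, "blue"),   # ... and removed again, so it cancels
    ("place", 2, 1, 0, "red"),
]
print(net_action_f1(pred, gold))  # 0.5
```

Under this sketch, the stray blue block contributes nothing, and precision and recall are both 1/2. The example also hints at the limitation the TL;DR mentions. If the instruction were only "add a red block next to it", placing it at (2, 1, 0) instead of (1, 1, 0) could be equally valid, yet exact matching still scores it as an error.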

Abstract

When engaging in collaborative tasks, humans efficiently exploit the semantic structure of a conversation to optimize verbal and nonverbal interactions. But in recent "language to code" or "language to action" models, this information is lacking. We show how incorporating the prior discourse and nonlinguistic context of a conversation situated in a nonlinguistic environment can improve the "language to action" component of such interactions. We fine-tune an LLM to predict actions based on prior context; our model, Nebula, doubles the net-action F1 score of the baseline of Jayannavar et al. (2020) on this task. We also investigate our model's ability to construct shapes and understand location descriptions using a synthetic dataset.

Paper Structure

This paper contains 9 sections, 3 figures, and 10 tables.

Introduction
Related Work
Nebula: an LLM for Predicting Action Sequences
Using Discourse Structure to Improve Nebula
Problems with the MDC
Evaluating Nebula on Synthetic Data
Finetuning Nebula on Shapes and Locations
Conclusions and Future Work
Appendix

Figures (3)

Figure 1: The Neural Builder (Jayannavar et al., 2020) takes as input the sequence $i_n\, a_n\, i_{n+1}$ and the world state to predict the subsequent action sequence.
Figure 2: An excerpt from MDC. The Builder interrupts the action sequence by asking a question.
Figure 3: Excerpt of a Narrative arc from the MSDC. Here the arc is purple and connects the instruction in Architect turn one to the following instruction in turn five.
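To make the Figure 1 input concrete, the following sketch serializes an instruction $i_n$, the Builder's actions $a_n$, the next instruction $i_{n+1}$, and a world state into a single text input. Several choices here are hypothetical and our own: the tags `[WORLD]`, `[ARCHITECT]`, and `[BUILDER]`; the coordinate format; and the function names `format_actions` and `build_input`. None of them is the encoding used by the Neural Builder or by Nebula.

```python
from typing import List, Set, Tuple

# Hypothetical encodings: an action is (kind, x, y, z, color); a block is (x, y, z, color).
Action = Tuple[str, int, int, int, str]
Block = Tuple[int, int, int, str]


def format_actions(actions: List[Action]) -> str:
    """Serialize a Builder action sequence as plain text."""
    return " ".join(f"{kind} {color} {x} {y} {z}" for kind, x, y, z, color in actions)


def build_input(instructions: List[str], action_seqs: List[List[Action]],
                n: int, world: Set[Block]) -> str:
    """Concatenate the world state with i_n, a_n and i_{n+1}; a model would then predict a_{n+1}."""
    world_str = " ".join(f"{c}@{x},{y},{z}" for x, y, z, c in sorted(world))
    return "\n".join([
        f"[WORLD] {world_str}",
        f"[ARCHITECT] {instructions[n]}",
        f"[BUILDER] {format_actions(action_seqs[n])}",
        f"[ARCHITECT] {instructions[n + 1]}",
        "[BUILDER]",  # left open: the target action sequence would follow here
    ])


prompt = build_input(
    ["place a red block in the middle", "now put one on top of it"],
    [[("place", 0, 1, 0, "red")]],
    0,
    {(0, 1, 0, "red")},
)
print(prompt)
```

According to the abstract, Nebula conditions on the prior discourse and nonlinguistic context rather than on this three-part window. Under the same assumptions, the list could be extended to cover all earlier turns, or only the turns in the current Narrative arc.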
