TEACh: Task-driven Embodied Agents that Chat
Aishwarya Padmakumar, Jesse Thomason, Ayush Shrivastava, Patrick Lange, Anjali Narayan-Chen, Spandana Gella, Robinson Piramuthu, Gokhan Tur, Dilek Hakkani-Tur
TL;DR
TEACh introduces a large-scale dataset of human–human, dialogue-guided embodied interactions in AI2-THOR to study how natural language can ground perception and actions for household tasks. It presents an extensible Task Definition Language and three benchmarks, Execution from Dialogue History (EDH), Trajectory from Dialogue (TfD), and Two-Agent Task Completion (TATC), to evaluate Follower-only and two-agent systems. Baseline experiments using an adapted Episodic Transformer show strong gains over simple baselines on EDH, but also highlight the significant challenges posed by long-horizon dialogue grounding and two-agent coordination; end-to-end task success remains difficult. The work provides a foundation for future research on few-shot generalization, improved grounding, and human-in-the-loop evaluation for conversationally guided embodied AI in realistic home settings.
Abstract
Robots operating in human spaces must be able to engage in natural language interaction with people, both understanding and executing instructions, and using conversation to resolve ambiguity and recover from mistakes. To study this, we introduce TEACh, a dataset of over 3,000 human–human, interactive dialogues to complete household tasks in simulation. A Commander with access to oracle information about a task communicates in natural language with a Follower. The Follower navigates through and interacts with the environment to complete tasks varying in complexity from "Make Coffee" to "Prepare Breakfast", asking questions and getting additional information from the Commander. We propose three benchmarks using TEACh to study embodied intelligence challenges, and we evaluate initial models' abilities in dialogue understanding, language grounding, and task execution.
