
A Survey of Robotic Language Grounding: Tradeoffs between Symbols and Embeddings

Vanya Cohen, Jason Xinyu Liu, Raymond Mooney, Stefanie Tellex, David Watkins

TL;DR

The paper addresses robotic language grounding by comparing formal symbol-grounding approaches with end-to-end high-dimensional embedding methods. It systematically surveys how natural language is grounded to formal representations (logics, PDDL, code, predefined skills) and to high-dimensional vectors (image/subgoal and end-effector/joint-state goals), analyzing the tradeoffs in data efficiency, interpretability, and generalization. Its contribution is to organize the literature along a spectrum, summarize methods, datasets, and safety considerations, and outline directions for combining the strengths of both ends. This synthesis helps researchers and practitioners decide when to favor structure and guarantees over data-driven flexibility, and highlights practical implications for real-world robot instruction following and safety.
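To make the "formal representation" end of the spectrum concrete, the sketch below maps a restricted English command to a linear temporal logic (LTL) formula using a hand-written template grammar. Everything here is a hypothetical illustration rather than a method from the survey: the `LANDMARKS` vocabulary, the `ground_to_ltl` function, and the two supported patterns ("go to A then B" and "avoid C") are assumptions. A learned semantic parser would replace the templates in practice.

```python
import re

# Hypothetical landmark vocabulary; a real system would ground these names
# to map regions or objects the robot actually perceives.
LANDMARKS = {"kitchen", "bedroom", "office", "hallway"}


def ground_to_ltl(command: str) -> str:
    """Map a restricted English command to an LTL formula string.

    Supports visiting landmarks in the order mentioned ("go to A then B")
    plus an optional global avoidance clause ("..., avoid C").
    This is a toy template grammar, not a learned semantic parser.
    """
    text = command.lower().strip().rstrip(".")
    avoid = []
    # Split off an optional avoidance clause.
    if "avoid" in text:
        text, avoid_part = text.split("avoid", 1)
        avoid = [w for w in re.findall(r"[a-z]+", avoid_part) if w in LANDMARKS]
    # Landmarks in the order they are mentioned define the visit sequence.
    seq = [w for w in re.findall(r"[a-z]+", text) if w in LANDMARKS]
    if not seq:
        raise ValueError("no known landmark in command")
    # Nested eventualities encode ordering: F (a & F (b & F c)).
    formula = f"F {seq[-1]}"
    for name in reversed(seq[:-1]):
        formula = f"F ({name} & {formula})"
    # "Globally not" encodes a safety constraint that holds at every step.
    for name in avoid:
        formula = f"({formula}) & G !{name}"
    return formula


print(ground_to_ltl("Go to the kitchen then the bedroom, avoid the office."))
```

Because the output is a formula with well-defined semantics, a planner or model checker can verify that a plan satisfies it, which is the interpretability and safety benefit attributed to this pole. The cost is visible too: any phrasing outside the grammar simply fails.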

Abstract

With large language models, robots can understand language more flexibly and capably than ever before. This survey reviews and situates recent literature along a spectrum with two poles: 1) mapping between language and some manually defined formal representation of meaning, and 2) mapping between language and high-dimensional vector spaces that translate directly to low-level robot policy. Using a formal representation allows the meaning of the language to be precisely represented, limits the size of the learning problem, and provides a framework for interpretability and formal safety guarantees. Methods that embed language and perceptual data into high-dimensional spaces avoid this manually specified symbolic structure and thus have the potential to be more general when given enough data, but they require more data and compute to train. We discuss the benefits and tradeoffs of each approach and finish by providing directions for future work that achieves the best of both worlds.
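For contrast, the embedding end of the spectrum can be sketched as encoding an instruction into a vector and selecting the candidate goal whose embedding is most similar. This is only a toy stand-in under stated assumptions: the bag-of-words `embed` function replaces what would in practice be a learned text or vision-language encoder producing dense vectors, and the function names and goal descriptions are hypothetical.

```python
import math
from collections import Counter


def tokenize(text: str) -> list[str]:
    return [t for t in text.lower().replace(",", " ").split() if t]


def build_vocab(texts: list[str]) -> dict[str, int]:
    # Deterministic token -> index map (a learned encoder would not need this).
    tokens = sorted({t for text in texts for t in tokenize(text)})
    return {tok: i for i, tok in enumerate(tokens)}


def embed(text: str, vocab: dict[str, int]) -> list[float]:
    """Unit-normalized bag-of-words vector; a stand-in for a learned encoder."""
    vec = [0.0] * len(vocab)
    for tok, count in Counter(tokenize(text)).items():
        if tok in vocab:
            vec[vocab[tok]] += count
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm > 0 else vec


def cosine(a: list[float], b: list[float]) -> float:
    # Inputs are already unit length, so the dot product is the cosine.
    return sum(x * y for x, y in zip(a, b))


def select_goal(instruction: str, goals: list[str]) -> tuple[int, list[float]]:
    """Return the index of the most similar goal and all similarity scores."""
    vocab = build_vocab([instruction, *goals])
    z = embed(instruction, vocab)
    scores = [cosine(z, embed(g, vocab)) for g in goals]
    best = max(range(len(goals)), key=lambda i: scores[i])
    return best, scores


goals = ["drawer is open", "red block lifted in gripper", "blue cup on the table"]
print(select_goal("pick up the red block", goals))
```

No symbolic structure is specified by hand here: similarity in vector space does all the work, which is what makes this pole flexible given enough data, but the resulting scores carry no formal meaning that a verifier could check.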

Paper Structure

This paper contains 17 sections, 1 figure, 2 tables.

Figures (1)

  • Figure 1: Approaches to representing natural language for robotics fall along a spectrum from more symbol-like representations to more continuous embedding-like representations. However, most approaches use a mixture of both. SayCan uses a fixed ontology of predefined skills but implements these as neural value functions conditioned on language.
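The SayCan mixture described in the caption can be sketched as a scoring rule: each predefined skill is scored by the language model's probability that the skill is a useful next step, multiplied by a language-conditioned value function's estimate that the skill can succeed in the current state, and the highest-scoring skill is executed. In the sketch below, `saycan_select` is a hypothetical name, and the log-probabilities and affordance values in the usage example are made-up illustrative inputs standing in for real LLM outputs and learned value functions.

```python
import math


def saycan_select(llm_logprobs: dict[str, float],
                  affordances: dict[str, float]) -> tuple[str, dict[str, float]]:
    """Pick the skill maximizing p_LLM(skill | instruction) * value(skill, state).

    llm_logprobs: log-probability the LLM assigns to each skill's text description.
    affordances:  value-function estimate (0..1) that the skill succeeds now.
    """
    combined = {}
    for skill, logp in llm_logprobs.items():
        value = affordances.get(skill, 0.0)  # skills with no estimate count as infeasible
        combined[skill] = math.exp(logp) * value
    best = max(combined, key=combined.get)
    return best, combined


# Hypothetical inputs for the instruction "bring me a sponge".
llm = {
    "pick up the sponge": math.log(0.6),
    "go to the sink": math.log(0.3),
    "pick up the apple": math.log(0.1),
}
values = {"pick up the sponge": 0.1, "go to the sink": 0.9, "pick up the apple": 0.8}
print(saycan_select(llm, values))
```

With these inputs the LLM alone would prefer picking up the sponge, but its low affordance (for example, the sponge is out of reach) lowers its combined score to 0.06, so going to the sink (0.27) wins. The skill ontology is fixed and symbolic, while the feasibility judgment comes from neural value functions.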