Preference-Conditioned Language-Guided Abstraction

Andi Peng; Andreea Bobu; Belinda Z. Li; Theodore R. Sumers; Ilia Sucholutsky; Nishanth Kumar; Thomas L. Griffiths; Julie A. Shah

TL;DR

PLGA (Preference-Conditioned Language-Guided Abstraction) addresses a limitation of language-only task specifications: a user's preferences about what matters in a task can be hard to describe or infeasible to specify exhaustively. It uses language models to infer these latent preferences from changes in behavior across demonstrations, then conditions the state abstraction on the inferred preference to improve generalization and learning efficiency. An active variant queries the human directly to learn user-specific preferences. In simulated experiments, a user study, and experiments on a real Spot robot, PLGA improves downstream policy performance and offers a more natural interaction experience than prior methods based on LGA (Language-Guided Abstraction).

Abstract

Learning from demonstrations is a common way for users to teach robots, but it is prone to spurious feature correlations. Recent work constructs state abstractions, i.e. visual representations containing task-relevant features, from language as a way to perform more generalizable learning. However, these abstractions also depend on a user's preference for what matters in a task, which may be hard to describe or infeasible to exhaustively specify using language alone. How do we construct abstractions to capture these latent preferences? We observe that how humans behave reveals how they see the world. Our key insight is that changes in human behavior inform us that there are differences in preferences for how humans see the world, i.e. their state abstractions. In this work, we propose using language models (LMs) to query for those preferences directly given knowledge that a change in behavior has occurred. In our framework, we use the LM in two ways: first, given a text description of the task and knowledge of behavioral change between states, we query the LM for possible hidden preferences; second, given the most likely preference, we query the LM to construct the state abstraction. In this framework, the LM is also able to ask the human directly when uncertain about its own estimate. We demonstrate our framework's ability to construct effective preference-conditioned abstractions in simulated experiments, a user study, as well as on a real Spot robot performing mobile manipulation tasks.
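The two LM queries and the fallback to asking the human can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation. The function names (`infer_preference`, `build_abstraction`), the callables standing in for the LM and the human, and the entropy threshold used to decide when to ask are all hypothetical. The abstract says only that the LM asks when uncertain about its estimate; the exact criterion is not given here.

```python
import math
from typing import Callable, Dict, List


def normalize(scores: Dict[str, float]) -> Dict[str, float]:
    """Turn non-negative candidate scores into a probability distribution."""
    total = sum(scores.values())
    if total <= 0:
        raise ValueError("scores must sum to a positive value")
    return {name: value / total for name, value in scores.items()}


def entropy(probs: Dict[str, float]) -> float:
    """Shannon entropy in nats; higher means more preference ambiguity."""
    return -sum(p * math.log(p) for p in probs.values() if p > 0)


def infer_preference(
    task: str,
    behavior_change: str,
    propose_preferences: Callable[[str, str], Dict[str, float]],  # stand-in for LM query 1
    ask_human: Callable[[List[str]], str],                        # stand-in for a human query
    entropy_threshold: float = 0.9,                               # assumed value, not from the paper
) -> str:
    """Pick the most likely hidden preference, or ask the human if too uncertain."""
    probs = normalize(propose_preferences(task, behavior_change))
    if entropy(probs) > entropy_threshold:
        return ask_human(sorted(probs))
    return max(probs, key=probs.get)


def build_abstraction(
    task: str,
    preference: str,
    features: List[str],
    is_relevant: Callable[[str, str, str], bool],  # stand-in for LM query 2
) -> List[str]:
    """Keep only the features judged relevant given the task and the preference."""
    return [f for f in features if is_relevant(task, preference, f)]


# Toy stand-ins so the sketch runs without a language model.
def stub_propose(task: str, change: str) -> Dict[str, float]:
    return {"avoid electronics": 8.0, "prefer shortest path": 1.0, "avoid fragile objects": 1.0}


def stub_is_relevant(task: str, preference: str, feature: str) -> bool:
    if feature in task:
        return True
    return preference == "avoid electronics" and feature == "laptop"


task = "bring the apple to the bowl"
change = "the demonstrator detoured around the laptop"
preference = infer_preference(task, change, stub_propose, ask_human=lambda options: options[0])
abstraction = build_abstraction(
    task, preference, ["apple", "bowl", "laptop", "table", "mug"], stub_is_relevant
)
```

With the toy scores above, the distribution is peaked enough that no human query is triggered. Replacing `stub_propose` with uniform scores pushes the entropy over the assumed threshold and routes the decision to `ask_human` instead.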

Paper Structure (21 sections, 3 equations, 6 figures, 1 algorithm)

Introduction
Problem Formulation
Preliminaries
Problem Statement
Method: Preference-conditioned Language-Guided Abstraction
LMs as Models of State Abstraction
LMs as Models of Preference
Querying Preferences with Language
Policy Learning with PLGA
Investigating Passive PLGA as a Prior for General Human Preferences
Investigating Active PLGA for Learning User-Specific Preferences
Experimental Setup
Subjective Results: PLGA Enables More Natural and Easy User Interaction
Objective Results: Active PLGA Successfully Learns from Human Preference Queries
Investigating PLGA on a Spot Robot
...and 6 more sections

Figures (6)

Figure 1: Preference-Conditioned Language-Guided Abstraction (PLGA). (Left) The robot uses the demonstration pair to identify a behavior change not captured by the language specification. Given this information, we query the LM for potential preferences that could explain this change. Finally, the robot uses its best preference estimate to query the LM for state abstractions and train a policy. (Right) At test time, the robot generalizes to new states and language specifications using its preference-conditioned abstractions.
Figure 2: We evaluate on three tabletop manipulation tasks: pick, place, and sweep.
Figure 3: Policy success rate (with standard error) on simulated experiments. PLGA outperforms both LGA and GCBC on task performance, indicating that its preference-conditioned abstractions better support downstream task learning.
Figure 4: Entropy values show PLGA can model its own uncertainty under preference ambiguity.
Figure 5: User study interaction results (lower is better for all but perceived performance). The interaction experience with Active PLGA is rated more favorably by users than with Active LGA.
...and 1 more figure
