Adaptive Language-Guided Abstraction from Contrastive Explanations

Andi Peng, Belinda Z. Li, Ilia Sucholutsky, Nishanth Kumar, Julie A. Shah, Jacob Andreas, Andreea Bobu

TL;DR

The paper tackles sample-efficient reward learning for robotics by addressing feature under-specification with Adaptive Language-Guided Abstraction from Contrastive Explanations (ALGAE). ALGAE alternates between two steps: a language model identifies missing, human-meaningful features, and maximum-entropy IRL assigns reward weights to the expanded feature set. The result is a richer, more generalizable reward representation learned from few demonstrations. Across simulated 2D navigation, 7DoF JACO manipulation, and real-world Spot experiments, ALGAE outperforms prompting-only language-model baselines and recovers missing features without human input, avoiding the over-parameterized rewards those baselines produce in cluttered scenes. The approach uses semantic priors from language to align the reward with human intent, which supports faster and more robust learning from demonstrations in complex environments.

Abstract

Many approaches to robot learning begin by inferring a reward function from a set of human demonstrations. To learn a good reward, it is necessary to determine which features of the environment are relevant before determining how these features should be used to compute reward. End-to-end methods for joint feature and reward learning (e.g., using deep networks or program synthesis techniques) often yield brittle reward functions that are sensitive to spurious state features. By contrast, humans can often generalizably learn from a small number of demonstrations by incorporating strong priors about what features of a demonstration are likely meaningful for a task of interest. How do we build robots that leverage this kind of background knowledge when learning from new demonstrations? This paper describes a method named ALGAE (Adaptive Language-Guided Abstraction from [Contrastive] Explanations) which alternates between using language models to iteratively identify human-meaningful features needed to explain demonstrated behavior, then standard inverse reinforcement learning techniques to assign weights to these features. Experiments across a variety of both simulated and real-world robot environments show that ALGAE learns generalizable reward functions defined on interpretable features using only small numbers of demonstrations. Importantly, ALGAE can recognize when features are missing, then extract and define those features without any human input -- making it possible to quickly and efficiently acquire rich representations of user behavior.
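
The alternation described above can be illustrated with a minimal sketch. It rests on several assumptions that are not in the paper summary: trajectories are small dictionaries of named quantities, the reward is linear in the features, maximum-entropy IRL is run over a finite set of candidate trajectories, and `propose_feature` is a hypothetical stand-in for the language-model query (it simply picks an unused feature from a fixed pool on which the demonstration and a rival trajectory differ). All function and feature names are illustrative.

```python
import math
from typing import Callable, Dict, List, Optional, Tuple

Trajectory = Dict[str, float]            # assumption: a trajectory is a dict of named quantities
Feature = Callable[[Trajectory], float]


def reward(traj: Trajectory, feats: List[Feature], w: List[float]) -> float:
    """Linear reward: weighted sum of feature values."""
    return sum(wi * f(traj) for wi, f in zip(w, feats))


def maxent_weights(demos, candidates, feats, lr=0.1, steps=2000) -> List[float]:
    """Max-entropy IRL over a finite candidate set (a simplification).

    Gradient ascent on the demo log-likelihood under p(traj) ~ exp(reward):
    gradient = demo feature mean - expected features under the model.
    """
    w = [0.0] * len(feats)
    cand_phi = [[f(t) for f in feats] for t in candidates]
    demo_mean = [sum(f(t) for t in demos) / len(demos) for f in feats]
    for _ in range(steps):
        scores = [sum(wi * x for wi, x in zip(w, phi)) for phi in cand_phi]
        top = max(scores)                                  # subtract max for stability
        exps = [math.exp(s - top) for s in scores]
        z = sum(exps)
        expected = [sum(e / z * phi[k] for e, phi in zip(exps, cand_phi))
                    for k in range(len(w))]
        w = [wi + lr * (d - e) for wi, d, e in zip(w, demo_mean, expected)]
    return w


def find_counterexample(demos, candidates, feats, w,
                        margin=1e-3) -> Optional[Tuple[Trajectory, Trajectory]]:
    """Reward validation: return (demo, rival) if some rival scores at least as well."""
    for demo in demos:
        for cand in candidates:
            if cand != demo and reward(cand, feats, w) + margin >= reward(demo, feats, w):
                return demo, cand
    return None


def propose_feature(demo, rival, used, pool) -> Optional[str]:
    """Hypothetical stand-in for the language-model step: pick an unused
    feature on which the demonstration and the rival trajectory differ."""
    for name, f in pool.items():
        if name not in used and f(demo) != f(rival):
            return name
    return None


def algae_sketch(demos, candidates, pool, initial):
    """Alternate reward validation and feature specification until the
    demonstrations are explained or no new feature can be proposed."""
    names = list(initial)
    while True:
        feats = [pool[n] for n in names]
        w = maxent_weights(demos, candidates, feats)
        contrast = find_counterexample(demos, candidates, feats, w)
        if contrast is None:
            return names, w
        new = propose_feature(contrast[0], contrast[1], names, pool)
        if new is None:
            return names, w
        names.append(new)


# Toy data (made up for illustration): the demo avoids a vase; a shorter path does not.
pool = {"length": lambda t: t["length"], "near_vase": lambda t: t["near_vase"]}
demo = {"length": 5.0, "near_vase": 0.0}
candidates = [demo, {"length": 3.0, "near_vase": 1.0}, {"length": 8.0, "near_vase": 0.0}]
names, w = algae_sketch([demo], candidates, pool, initial=["length"])
print(names, [round(x, 2) for x in w])
```

With only `length` available, the fitted reward prefers the shorter path past the vase over the demonstration, so validation fails and the stand-in adds `near_vase`. After refitting, both weights are negative and the demonstration scores highest among the candidates.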

Paper Structure (12 sections, 6 equations, 3 figures, 1 table, 1 algorithm)

This paper contains 12 sections, 6 equations, 3 figures, 1 table, 1 algorithm.

Figures (3)

  • Figure 1: Adaptive Language-Guided Abstraction from Contrastive Explanations (ALGAE) alternates between two main stages. In feature specification, ALGAE expands the current feature set by identifying under-specified features of the current reward. In reward validation, ALGAE learns an updated reward function on top of the new feature set and validates that it can explain the user demonstrations. ALGAE yields more generalizable learned reward functions than baselines without manual feature specification, and can iteratively improve its own reward estimate when multiple features are under-specified.
  • Figure 2: We evaluate on both simulated and real-world domains with a variety of missing features. A: 2D maze navigation, where the robot must navigate to a goal while interacting with other objects. B: 7DoF JACO manipulation, where the JACO arm must manipulate a held coffee mug while respecting features like end-effector orientation. C: Spot mobile manipulation tasks, where the robot must complete tasks like WaterPlant while respecting the height of the pot.
  • Figure 3: A: Normalized reward across multiple iterations in simulated domains. ALGAE (pink) improves across each iteration, continuously finding under-specified features and updating its reward estimate. In contrast, prompting-only baselines such as LM-Feature (gray) and LM-Reward (black) do not iteratively improve themselves after instantiation. B: Normalized reward on Spot domains. Real-world clutter introduced in the scene leads to over-parameterized LM-only rewards, increasing the gap between ALGAE and baselines. Error bars depict standard error across three seeds.
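
For readers reproducing error bars like those in Figure 3, the standard error across seeds is the sample standard deviation divided by the square root of the number of seeds. The sketch below assumes the sample (n - 1) standard deviation; the per-seed values are placeholders, not results from the paper.

```python
import math
import statistics


def standard_error(values):
    """Standard error of the mean: sample std (n - 1 denominator) / sqrt(n)."""
    return statistics.stdev(values) / math.sqrt(len(values))


# Placeholder normalized rewards for three seeds (illustrative only, not paper data).
seed_rewards = [0.8, 0.9, 1.0]
mean_reward = statistics.mean(seed_rewards)
se = standard_error(seed_rewards)
print(f"mean={mean_reward:.3f}, standard error={se:.3f}")
```

With only three seeds the estimate is coarse, so the bars are best read as a rough indication of run-to-run variation.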