APRICOT: Active Preference Learning and Constraint-Aware Task Planning with LLMs

Huaxiaoyue Wang, Nathaniel Chin, Gonzalo Gonzalez-Pumariega, Xiangwan Sun, Neha Sunkara, Maximus Adrian Pace, Jeannette Bohg, Sanjiban Choudhury

TL;DR

APRICOT is a novel approach that merges LLM-based Bayesian active preference learning with constraint-aware task planning. It refines its generated preferences by actively querying the user and dynamically adapts its plan to respect environmental constraints.

Abstract

Home robots performing personalized tasks must adeptly balance user preferences with environmental affordances. We focus on organization tasks within constrained spaces, such as arranging items into a refrigerator, where preferences for placement collide with physical limitations. The robot must infer user preferences based on a small set of demonstrations, which is easier for users to provide than extensively defining all their requirements. While recent works use Large Language Models (LLMs) to learn preferences from user demonstrations, they encounter two fundamental challenges. First, there is inherent ambiguity in interpreting user actions, as multiple preferences can often explain a single observed behavior. Second, not all user preferences are practically feasible due to geometric constraints in the environment. To address these challenges, we introduce APRICOT, a novel approach that merges LLM-based Bayesian active preference learning with constraint-aware task planning. APRICOT refines its generated preferences by actively querying the user and dynamically adapts its plan to respect environmental constraints. We evaluate APRICOT on a dataset of diverse organization tasks and demonstrate its effectiveness in real-world scenarios, showing significant improvements in both preference satisfaction and plan feasibility. The project website is at https://portal-cornell.github.io/apricot/
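The abstract says APRICOT dynamically adapts its plan to respect environmental constraints, and the Figure 1 caption describes this step as generating a plan and refining it from world-model feedback. The Python below is a minimal sketch of such a generate, check, and refine loop under toy assumptions. It is not the paper's implementation. `propose_plan` is a hypothetical stand-in for the LLM planner that sends each item to its highest-ranked shelf not yet ruled out. `world_model_feedback` is a hypothetical stand-in for geometric reasoning that reduces every shelf to a one-dimensional capacity. The fridge data is invented for illustration.

```python
from typing import Dict, List, Optional, Set, Tuple

Plan = Dict[str, str]  # item -> shelf


def propose_plan(prefs: Dict[str, List[str]], banned: Set[Tuple[str, str]]) -> Optional[Plan]:
    """Hypothetical stand-in for the LLM planner: each item goes to its
    highest-ranked shelf that earlier feedback has not ruled out."""
    plan: Plan = {}
    for item, ranked in prefs.items():
        options = [s for s in ranked if (item, s) not in banned]
        if not options:
            return None  # no acceptable shelf is left for this item
        plan[item] = options[0]
    return plan


def world_model_feedback(
    plan: Plan, sizes: Dict[str, float], capacity: Dict[str, float]
) -> List[Tuple[str, str]]:
    """Hypothetical stand-in for geometric checks: for every over-full
    shelf, report its largest item as an (item, shelf) violation."""
    violations = []
    for shelf, cap in capacity.items():
        placed = [i for i, s in plan.items() if s == shelf]
        if placed and sum(sizes[i] for i in placed) > cap:
            violations.append((max(placed, key=lambda i: sizes[i]), shelf))
    return violations


def plan_with_refinement(prefs, sizes, capacity, max_iters: int = 10) -> Optional[Plan]:
    """Generate a plan, check it, feed violations back, and repeat."""
    banned: Set[Tuple[str, str]] = set()
    for _ in range(max_iters):
        plan = propose_plan(prefs, banned)
        if plan is None:
            return None  # some item has no feasible preferred shelf
        violations = world_model_feedback(plan, sizes, capacity)
        if not violations:
            return plan
        banned.update(violations)  # the planner must avoid these placements next time
    return None


# Toy fridge: shelf capacities and item sizes in arbitrary volume units.
example_capacity = {"door": 1.0, "top": 2.0, "bottom": 3.0}
example_sizes = {"milk": 1.5, "yogurt": 0.5, "watermelon": 2.5}
# Each item's shelves, ranked from most to least preferred.
example_prefs = {
    "milk": ["door", "top", "bottom"],
    "yogurt": ["top", "door"],
    "watermelon": ["top", "bottom"],
}

if __name__ == "__main__":
    print(plan_with_refinement(example_prefs, example_sizes, example_capacity))
```

On this toy fridge, the first proposal overfills the door (milk) and the top shelf (watermelon). After one round of feedback, the loop places milk and yogurt on the top shelf and watermelon on the bottom. The planner never reuses a banned pair, so every failing round bans at least one new (item, shelf) pair, and the loop cannot cycle.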

Paper Structure

This paper contains 45 sections, 1 theorem, 11 equations, 11 figures, 3 tables, 1 algorithm.

Key Result

Theorem 7.1

Under assumptions (eq:realization_cost) and (eq:suboptimal_plan_cost), given a problem with ground truths $\theta^*$ and $\xi^*$, APRICOT outputs $\hat{\xi}$ when it uses the terminating condition (eq:terminate_fn) $f(P(\theta)) = \exists {\xi \in \Xi} \quad \mathrm{s.t} \quad \sum_{i=1}^N P(

Figures (11)

  • Figure 1: Overview of APRICOT that (1) converts user visual demonstrations into language-based demonstrations, (2) given demonstrations, determines the preference that best approximates the ground-truth user preference by minimally querying the user, (3) generates and refines a plan based on world models' feedback to satisfy preferences and respect constraints, (4) executes the plan in a real robot system.
  • Figure 2: LLM-Based Bayesian Active Preference Learning Approach. Given a set of language-based demonstrations, APRICOT (1) proposes candidate preferences and corresponding candidate plans, (2) determines whether to terminate by evaluating whether the prior over candidate preferences $P(\theta)$ is sufficient, and (3) selects the optimal question that maximizes information gain before updating its prior based on the user's answers.
  • Figure 3: Active Preference Learning Results on Benchmark Dataset. APRICOT achieves the highest preference accuracy ($58\%$), the percentage of output preferences that are equivalent to the ground-truth preference, while asking the user the fewest questions ($2.15$ on average).
  • Figure 4: Example Queries From Each Approach. APRICOT correctly infers the ground-truth user preference with the fewest queries because it selects informative questions directly about the category with the most complex requirement. In contrast, LLM-Q/A uses up its entire query budget, while Cand+LLM-Q/A terminates early but infers the preference incorrectly. Preferences here are simplified as bullet points for readability.
  • Figure 5: Task Planner Results on Real-Robot Scenarios. Evaluated on 9 scenarios across 3 difficulty levels. The qualitative example below shows APRICOT generating a plan that satisfies preferences and respects constraints.

Theorems & Definitions (2)

  • Theorem 7.1
  • proof
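The Figure 2 caption describes a loop: propose candidate preferences, check whether the belief $P(\theta)$ is sufficient to stop, and otherwise ask the question with the largest expected information gain and update the belief with Bayes' rule. The Python below is a minimal sketch of that loop, assuming the candidate preferences, the candidate questions, and an answer-likelihood table $P(a \mid q, \theta)$ are already available. In APRICOT the candidates are proposed by an LLM. Here they are hand-written toy data. The stopping rule "highest posterior at least `threshold`" is a simplified stand-in and not the plan-based condition in Theorem 7.1. All names (`posterior`, `expected_info_gain`, `active_preference_learning`, `example_*`, `simulated_user`) are hypothetical.

```python
import math
from typing import Callable, Dict, List, Tuple

Belief = Dict[str, float]  # candidate preference -> probability
# Likelihood[q][theta][a] = P(answer a | question q, preference theta)
Likelihood = Dict[str, Dict[str, Dict[str, float]]]


def entropy(p: Belief) -> float:
    """Shannon entropy in bits."""
    return -sum(v * math.log2(v) for v in p.values() if v > 0)


def posterior(belief: Belief, lik: Likelihood, q: str, a: str) -> Belief:
    """Bayes' rule: P(theta | q, a) is proportional to P(a | q, theta) * P(theta)."""
    unnorm = {t: belief[t] * lik[q][t].get(a, 0.0) for t in belief}
    z = sum(unnorm.values())
    if z == 0:
        return dict(belief)  # answer impossible under every candidate: keep belief
    return {t: v / z for t, v in unnorm.items()}


def expected_info_gain(belief: Belief, lik: Likelihood, q: str) -> float:
    """H(theta) minus the expected posterior entropy after asking q."""
    answers = {a for t in belief for a in lik[q][t]}
    expected_h = 0.0
    for a in answers:
        p_a = sum(belief[t] * lik[q][t].get(a, 0.0) for t in belief)
        if p_a > 0:
            expected_h += p_a * entropy(posterior(belief, lik, q, a))
    return entropy(belief) - expected_h


def best_question(belief: Belief, lik: Likelihood, candidates: List[str]) -> str:
    return max(candidates, key=lambda q: expected_info_gain(belief, lik, q))


def active_preference_learning(
    prior: Belief,
    lik: Likelihood,
    ask_user: Callable[[str], str],
    threshold: float = 0.9,
    max_queries: int = 5,
) -> Tuple[str, Belief, List[str]]:
    belief, asked = dict(prior), []
    # Stand-in stopping rule: stop once one candidate is confident enough.
    while max(belief.values()) < threshold and len(asked) < max_queries:
        remaining = [q for q in lik if q not in asked]
        if not remaining:
            break
        q = best_question(belief, lik, remaining)
        belief = posterior(belief, lik, q, ask_user(q))
        asked.append(q)
    return max(belief, key=belief.get), belief, asked


# Toy data: three candidate preferences, three yes/no questions.
example_prior = {"theta_A": 1 / 3, "theta_B": 1 / 3, "theta_C": 1 / 3}
example_lik = {
    "q1": {"theta_A": {"yes": 1.0}, "theta_B": {"no": 1.0}, "theta_C": {"no": 1.0}},
    "q2": {"theta_A": {"yes": 1.0}, "theta_B": {"yes": 1.0}, "theta_C": {"yes": 1.0}},
    "q3": {"theta_A": {"yes": 0.5, "no": 0.5}, "theta_B": {"yes": 1.0}, "theta_C": {"no": 1.0}},
}
# Simulated user whose true preference is theta_B.
simulated_user = {"q1": "no", "q2": "yes", "q3": "yes"}.get

if __name__ == "__main__":
    print(active_preference_learning(example_prior, example_lik, simulated_user))
```

On this toy data the loop skips the uninformative `q2`. It first asks `q1`, which rules out `theta_A`, then asks `q3`, and ends with all probability mass on `theta_B` after two questions. Ranking questions by expected information gain is what lets a learner of this kind stop early rather than exhaust a fixed query budget.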