PREDICT: Preference Reasoning by Evaluating Decomposed preferences Inferred from Candidate Trajectories

Stephane Aroca-Ouellette; Natalie Mackraz; Barry-John Theobald; Katherine Metcalf

TL;DR

This paper introduces PREDICT, a method for inferring user preferences with greater precision and adaptability, and evaluates it in two distinct environments: a gridworld setting and PLUME, a new text-domain environment.

Abstract

Accommodating human preferences is essential for creating AI agents that deliver personalized and effective interactions. Recent work has shown that LLMs can infer preferences from user interactions, but the inferred preferences are often broad and generic and fail to capture the unique, individualized nature of human preferences. This paper introduces PREDICT, a method designed to improve the precision and adaptability of preference inference. PREDICT incorporates three key elements: (1) iterative refinement of inferred preferences, (2) decomposition of preferences into constituent components, and (3) validation of preferences across multiple trajectories. We evaluate PREDICT in two distinct environments: a gridworld setting and a new text-domain environment (PLUME). PREDICT infers nuanced human preferences more accurately, improving over existing baselines by 66.2% (gridworld environment) and 41.0% (PLUME).
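The three elements named in the abstract can be illustrated with a toy loop. This is a minimal sketch under stated assumptions, not the authors' algorithm: the function names (`refine_decompose_validate`, `toy_infer`, `toy_decompose`, `toy_is_consistent`), the representation of preferences as plain strings, and the strict-majority validation rule are all hypothetical. The `toy_*` functions stand in for the preference-inferring LLM.

```python
from typing import Callable, List, TypeVar

T = TypeVar("T")  # a trajectory; left abstract in this hypothetical sketch


def refine_decompose_validate(
    trajectories: List[T],
    infer: Callable[[T, List[str]], List[str]],
    decompose: Callable[[str], List[str]],
    is_consistent: Callable[[str, T], bool],
) -> List[str]:
    """Hypothetical illustration of refine -> decompose -> validate."""
    preferences: List[str] = []
    for i, traj in enumerate(trajectories):
        # (1) Iterative refinement: propose an updated preference list from
        #     the new trajectory and the preferences inferred so far.
        proposed = infer(traj, preferences)
        # (2) Decomposition: split each proposed preference into components.
        components = [c for p in proposed for c in decompose(p)]
        # (3) Validation (assumed rule): keep a component only if a strict
        #     majority of the trajectories seen so far are consistent with it.
        seen = trajectories[: i + 1]
        kept = [
            c for c in components
            if 2 * sum(is_consistent(c, t) for t in seen) > len(seen)
        ]
        # Deduplicate while preserving first-seen order.
        preferences = list(dict.fromkeys(kept))
    return preferences


# Toy stand-ins for LLM calls; a "trajectory" here is just a set of attributes.
def toy_infer(traj: set, current: List[str]) -> List[str]:
    # Keep current preferences and add one compound guess from this trajectory.
    return current + [" and ".join(sorted(traj))]


def toy_decompose(pref: str) -> List[str]:
    return [part.strip() for part in pref.split(" and ")]


def toy_is_consistent(component: str, traj: set) -> bool:
    return component in traj


demo = [{"concise", "formal"}, {"concise", "bullet points"}, {"concise", "formal"}]
result = refine_decompose_validate(demo, toy_infer, toy_decompose, toy_is_consistent)
print(result)  # ['concise', 'formal']
```

In this toy run, "formal" is dropped after the second trajectory (only one of two trajectories supports it) and restored after the third, which shows how validation across several trajectories filters out components that a single example happens to suggest.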

Paper Structure (27 sections, 14 figures, 5 tables, 2 algorithms)

Introduction
Related Work
PREDICT
Experimental Setup
Research Questions
Environment 1: PICK UP
Environment 2: Assistive Writing
Baselines
Results and Discussion
Limitations and Future Work
Conclusion
Acknowledgements
Algorithm
PICK UP Objects Visualization
Extended Results
...and 12 more sections

Figures (14)

Figure 1: The three PREDICT components: iterative refinement, breakdown, and validation.
Figure 2: Mean and standard deviation (5 seeds) performance for CIPHER-1, in-context learning (ICL), PREDICT, Oracle, and no preferences (NPC) for different preference-inferring LLMs.
Figure 3: PPCM mean and standard deviation (5 seeds) for PREDICT, CIPHER-1, and in-context learning (ICL) by Email (top) and Summary (bottom) sub-task type. GPT-4o is the LLM.
Figure 4: A depiction of a user example and associated language descriptions for the PICK UP task.
Figure 5: Performance for PREDICT, behavior cloning (BC), CIPHER-1, and in-context learning (ICL) given different numbers of user samples to learn from. Mean and standard deviation (5 seeds) for preference similarity (IoU in PICK UP and BScore in PLUME) and preference-conditioned generation quality (Avg. Return for PICK UP and PPCM for PLUME). GPT-4o is the LLM used.
...and 9 more figures
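Figure 5 reports preference similarity as IoU in PICK UP. Assuming the inferred and ground-truth preferences can each be represented as a set of discrete components (a hypothetical representation; the exact matching procedure is not described here), IoU is the size of their intersection divided by the size of their union. The helper name `preference_iou` and the attribute strings below are illustrative only.

```python
from typing import AbstractSet


def preference_iou(inferred: AbstractSet[str], target: AbstractSet[str]) -> float:
    """Intersection over union of two sets of preference components.

    Hypothetical helper: assumes preferences are already discretized into
    comparable string components.
    """
    union = inferred | target
    if not union:
        # Assumed convention: two empty preference sets count as a perfect match.
        return 1.0
    return len(inferred & target) / len(union)


score = preference_iou({"red", "circle"}, {"red", "square"})
print(score)  # 1 shared component out of 3 distinct ones
```

Because IoU penalizes both missing and extra components, an inferred preference that is too broad (many extra components) scores lower than one that matches the user's specific preferences.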
