
Aligning LLM Agents by Learning Latent Preference from User Edits

Ge Gao, Alexey Taymanov, Eduardo Salinas, Paul Mineiro, Dipendra Misra

TL;DR

The paper proposes CIPHER, a simple yet effective algorithm that leverages the LLM to infer the user's preference for a given context from user edits, and reports that the preferences learned by CIPHER show significant similarity to the ground-truth latent preferences.

Abstract

We study interactive learning of LLM-based language agents from user edits made to the agent's output. In a typical setting such as writing assistants, the user interacts with a language agent to generate a response given a context, and may optionally edit the agent's response to personalize it based on their latent preference, in addition to improving its correctness. The edit feedback is naturally generated, making it a suitable candidate for improving the agent's alignment with the user's preference and for reducing the cost of user edits over time. We propose a learning framework, PRELUDE, that infers a description of the user's latent preference based on historic edit data. The inferred preference descriptions are used to define prompts for generating future responses. This avoids fine-tuning the agent, which is costly, challenging to scale with the number of users, and may even degrade its performance on other tasks. Furthermore, learning descriptive preferences improves interpretability, allowing the user to view and modify the learned preference. However, user preference can be complex, subtle, and context-dependent, making it challenging to learn. To address this, we propose a simple yet effective algorithm named CIPHER that leverages the LLM to infer the user's preference for a given context based on user edits. For a new context, CIPHER retrieves the inferred preferences of the k closest contexts in the history and forms an aggregate preference for response generation. We introduce two interactive environments, summarization and email writing, and use a GPT-4 simulated user for evaluation. On both tasks, CIPHER outperforms several baselines by achieving the lowest edit distance cost while incurring only a small overhead in LLM query cost. Our analysis reports that user preferences learned by CIPHER show significant similarity to the ground-truth latent preferences.
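The retrieval step of CIPHER can be illustrated with a short Python sketch. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes each past context has already been embedded as a vector (the figure legends mention BERT and MPNET encoders, but the encoder itself is omitted here) and that "closest" means highest cosine similarity. `HistoryEntry`, `retrieve_preferences`, and `aggregate_preferences` are hypothetical names. Where CIPHER would ask the LLM to infer and aggregate preferences, this sketch substitutes plain string deduplication as a placeholder.

```python
import math
from dataclasses import dataclass


@dataclass
class HistoryEntry:
    """One past round (hypothetical structure): an embedded context and the
    preference description inferred from the user's edit in that round."""
    context_embedding: list
    inferred_preference: str


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (assumed metric)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve_preferences(query_embedding, history, k):
    """Return the inferred preferences of the k history entries whose
    context embeddings are most similar to the query embedding."""
    ranked = sorted(
        history,
        key=lambda entry: cosine_similarity(query_embedding, entry.context_embedding),
        reverse=True,
    )
    return [entry.inferred_preference for entry in ranked[:k]]


def aggregate_preferences(preferences):
    """Placeholder aggregation: drop duplicates (keeping first-seen order) and
    join. CIPHER would instead prompt the LLM to merge the descriptions."""
    unique = []
    for pref in preferences:
        if pref not in unique:
            unique.append(pref)
    return "; ".join(unique)
```

For example, with history entries embedded at `[1.0, 0.0]`, `[0.9, 0.1]`, and `[0.8, 0.6]`, a query at `[1.0, 0.0]` with `k=3` retrieves all three in that order, and duplicate descriptions collapse into one in the aggregate string that would then be placed in the generation prompt.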

Paper Structure

This paper contains 28 sections, 4 figures, 12 tables, and 3 algorithms.

Figures (4)

  • Figure 1: Illustration of interactive learning from user edits. Color coding in edits is for visualization only; our agent takes the plain revised text as feedback.
  • Figure 2: Learning curves of different methods based on cumulative cost over time (average across 3 seeds). In the legend, -k means with top $k$ retrieved examples, -B for BERT, and -M for MPNET.
  • Figure 3: Percentage of zero-cost examples of CIPHER over time, binned per 20 rounds to show the trend (average across 3 seeds). In the legend, -k means with top $k$ retrieved examples, -B for BERT, and -M for MPNET.
  • Figure 4: Normalized cost of CIPHER over time, binned per 20 rounds to show the trend (average across 3 seeds). In the legend, -k means with top $k$ retrieved examples, -B for BERT, and -M for MPNET.
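The per-round quantities behind Figures 2 and 3 can be computed from edit costs roughly as follows. This sketch assumes the cost of a round is the word-level Levenshtein distance between the agent's response and the user's edited text; the exact tokenization and the normalization used for Figure 4 are not given here and are left out. `edit_distance`, `cumulative_cost`, and `zero_cost_percentage_per_bin` are hypothetical names.

```python
from itertools import accumulate


def edit_distance(response: str, edited: str) -> int:
    """Levenshtein distance over whitespace-split words (assumed granularity)."""
    a, b = response.split(), edited.split()
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, word_a in enumerate(a, start=1):
        curr = [i]
        for j, word_b in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,                      # delete word_a
                curr[j - 1] + 1,                  # insert word_b
                prev[j - 1] + (word_a != word_b), # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


def cumulative_cost(costs):
    """Running total of per-round costs, as in a cumulative learning curve."""
    return list(accumulate(costs))


def zero_cost_percentage_per_bin(costs, bin_size=20):
    """Percentage of rounds with zero cost (no user edit) in each bin of rounds."""
    percentages = []
    for start in range(0, len(costs), bin_size):
        chunk = costs[start:start + bin_size]
        zero_rounds = sum(1 for cost in chunk if cost == 0)
        percentages.append(100.0 * zero_rounds / len(chunk))
    return percentages
```

Averaging these curves across seeds, as the captions describe, would simply mean computing them per seed and taking the elementwise mean.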