Improving Context-Aware Preference Modeling for Language Models

Silviu Pitis, Ziang Xiao, Nicolas Le Roux, Alessandro Sordoni

TL;DR

This paper considers a two-step preference modeling procedure that first resolves under-specification by selecting a context and then evaluates preference with respect to the chosen context. Decomposing reward modeling error according to these two steps suggests that supervising context, in addition to context-specific preference, may be a viable approach to aligning models with diverse human preferences.

Abstract

While finetuning language models from pairwise preferences has proven remarkably effective, the underspecified nature of natural language presents critical challenges. Direct preference feedback is uninterpretable, difficult to provide where multidimensional criteria may apply, and often inconsistent, either because it is based on incomplete instructions or provided by diverse principals. To address these challenges, we consider the two-step preference modeling procedure that first resolves the under-specification by selecting a context, and then evaluates preference with respect to the chosen context. We decompose reward modeling error according to these two steps, which suggests that supervising context in addition to context-specific preference may be a viable approach to aligning models with diverse human preferences. For this to work, the ability of models to evaluate context-specific preference is critical. To this end, we contribute context-conditioned preference datasets and accompanying experiments that investigate the ability of language models to evaluate context-specific preference. We use our datasets to (1) show that existing preference models benefit from, but fail to fully consider, added context, (2) finetune a context-aware reward model with context-specific performance exceeding that of GPT-4 and Llama 3 70B on tested datasets, and (3) investigate the value of context-aware preference modeling.
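
The two-step procedure described in the abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `context_aware_preference`, `select_context`, and `reward`, and the toy stand-ins for the context selector and the context-aware reward model, are all hypothetical.

```python
from typing import Callable, Tuple

# Hypothetical interfaces, assumed for illustration only.
ContextSelector = Callable[[str], str]                 # prompt -> context
ContextAwareReward = Callable[[str, str, str], float]  # (context, prompt, completion) -> score


def context_aware_preference(
    prompt: str,
    completion_a: str,
    completion_b: str,
    select_context: ContextSelector,
    reward: ContextAwareReward,
) -> Tuple[str, str]:
    """Return (chosen context, preferred completion label "A" or "B")."""
    # Step 1: resolve under-specification by selecting a context.
    context = select_context(prompt)
    # Step 2: score each alternative independently under that context.
    score_a = reward(context, prompt, completion_a)
    score_b = reward(context, prompt, completion_b)
    return context, ("A" if score_a >= score_b else "B")


# Toy stand-ins (assumptions): a fixed context and a length-based reward.
def toy_select_context(prompt: str) -> str:
    return "Prefer brief answers."


def toy_reward(context: str, prompt: str, completion: str) -> float:
    length = float(len(completion))
    # Shorter is better under a "brief" context, longer is better otherwise.
    return -length if "brief" in context else length


context, preferred = context_aware_preference(
    "Explain TCP.",
    "A reliable transport protocol.",
    "TCP is a connection-oriented protocol that provides ordered, reliable delivery of bytes.",
    toy_select_context,
    toy_reward,
)
print(context, preferred)
```

Keeping the two steps as separate callables mirrors the error decomposition mentioned in the abstract: a wrong preference can be traced either to the selected context or to the context-specific evaluation.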

Paper Structure (33 sections, 4 equations, 3 figures, 8 tables)


Figures (3)

  • Figure 1: Context-Aware Preference Modeling. Left: The standard approach uses a preference model (PM) to directly evaluate arbitrary and potentially ambiguous preference queries. Right: The context-aware preference modeling (CAPM) approach recognizes preference may depend on some unspecified context and makes this explicit: first identify the context, then evaluate a context-specific preference. In both cases, rather than computing preference directly, one may use a (context-aware) reward model (RM or CARM) to evaluate each alternative independently.
  • Figure 2: Effect of Context on Preference Modeling Performance. Added context improves agreement with gold labels as compared to a no context (NC) baseline. Our 7B parameter, finetuned Context-Aware Reward Model (Mistral CARM) achieves the best context-aware performance, outperforming the larger Llama 3 70B model (and GPT-4 Turbo), both on datasets where context is necessary to predict preference (RPR and Multifaceted Bench), and on the context-augmented HHH, Reward Bench and Chatbot Arena datasets. Details and additional results may be found in the empirical section of the paper.
  • Figure 3: A sample from the RPR dataset. Under Criteria A or Scenario A, Completion A should be preferred, and vice versa under Criteria B or Scenario B.
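
The sample structure described in the Figure 3 caption, together with the agreement metric mentioned for Figure 2, can be sketched as follows. This is a hedged illustration only: the class `RPRSample`, its field names, the helper functions, and the example text are all hypothetical and are not taken from the RPR dataset or the authors' code.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class RPRSample:
    """Hypothetical record layout; field names are assumptions."""
    prompt: str
    completion_a: str
    completion_b: str
    criteria_a: str
    criteria_b: str
    scenario_a: str
    scenario_b: str


def expand(sample: RPRSample) -> List[Tuple[str, str]]:
    """Pair each context with its gold label, following the Figure 3 caption."""
    return [
        (sample.criteria_a, "A"),
        (sample.scenario_a, "A"),
        (sample.criteria_b, "B"),
        (sample.scenario_b, "B"),
    ]


# (context, prompt, completion_a, completion_b) -> "A" or "B"
PreferenceFn = Callable[[str, str, str, str], str]


def agreement(sample: RPRSample, prefer: PreferenceFn) -> float:
    """Fraction of contexts on which the model matches the gold label."""
    pairs = expand(sample)
    hits = sum(
        prefer(ctx, sample.prompt, sample.completion_a, sample.completion_b) == gold
        for ctx, gold in pairs
    )
    return hits / len(pairs)


# Invented example content, for illustration only.
sample = RPRSample(
    prompt="How do I reverse a list in Python?",
    completion_a="Use reversed(xs) or xs[::-1].",
    completion_b="There are several ways; here is each one with a full explanation.",
    criteria_a="Prefers a concise answer",
    criteria_b="Prefers a detailed answer",
    scenario_a="The user is in a hurry",
    scenario_b="The user is studying for an exam and wants detail",
)


def ignores_context(ctx: str, prompt: str, a: str, b: str) -> str:
    return "A"  # same answer regardless of context


def keyword_model(ctx: str, prompt: str, a: str, b: str) -> str:
    return "B" if "detail" in ctx.lower() else "A"  # toy context reader


print(agreement(sample, ignores_context), agreement(sample, keyword_model))
```

Because the gold label flips between the A and B contexts, any model that ignores the context agrees with gold on exactly half of the contexts for such a sample, which is why context is necessary to predict preference on data built this way.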