
Evaluating the role of 'Constitutions' for learning from AI feedback

Saskia Redgate, Andrew M. Bean, Adam Mahdi

Abstract

The growing capabilities of large language models (LLMs) have led to their use as substitutes for human feedback in training and assessing other LLMs. These methods often rely on 'constitutions', written guidelines that a critic model uses to provide feedback and improve generations. We investigate how the choice of constitution affects feedback quality by using four different constitutions to improve patient-centered communication in medical interviews. In pairwise comparisons conducted by 215 human raters, we found that detailed constitutions led to better results on emotive qualities. However, none of the constitutions outperformed the baseline in learning more practically oriented skills related to information gathering and provision. Our findings indicate that while detailed constitutions should be prioritised, AI feedback may have limited effectiveness as a reward signal in some areas, such as these more practical skills.
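
The critic-feedback loop described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: `run_feedback_loop`, `doctor`, and `critic` are hypothetical names, each model is a plain callable standing in for an LLM call, and the Patient and Moderator roles from Figure 1 are folded into the `doctor` callable. Feedback is assumed to accumulate in-context, so later vignettes see all earlier critiques, which is one reading of Figure 1's "repeated for each vignette".

```python
from typing import Callable, List, Tuple

# Hypothetical interfaces: each "model" is a plain callable standing in for an LLM call.
DoctorFn = Callable[[str, List[str]], str]  # (vignette, accumulated feedback) -> dialogue
CriticFn = Callable[[str, str], str]        # (dialogue, constitution) -> feedback


def run_feedback_loop(
    vignettes: List[str],
    doctor: DoctorFn,
    critic: CriticFn,
    constitution: str,
) -> List[Tuple[str, str]]:
    """Produce one (dialogue, feedback) pair per vignette.

    The critic judges each dialogue against the constitution, and all feedback
    so far is passed back to the doctor in-context for later vignettes.
    """
    feedback_memory: List[str] = []
    results: List[Tuple[str, str]] = []
    for vignette in vignettes:
        # Pass a copy so the doctor cannot mutate the shared memory.
        dialogue = doctor(vignette, list(feedback_memory))
        feedback = critic(dialogue, constitution)
        feedback_memory.append(feedback)
        results.append((dialogue, feedback))
    return results


if __name__ == "__main__":
    # Toy stand-ins for the Doctor and Critic models (illustrative only).
    def toy_doctor(vignette: str, feedback: List[str]) -> str:
        return f"Doctor [{len(feedback)} notes]: {vignette}"

    def toy_critic(dialogue: str, constitution: str) -> str:
        return f"{constitution}: acknowledge the patient's concerns"

    for dialogue, feedback in run_feedback_loop(
        ["chest pain", "headache"], toy_doctor, toy_critic, "Be empathetic"
    ):
        print(dialogue, "|", feedback)
```

Swapping the `constitution` string is the only change needed to compare constitutions under this sketch; everything else in the loop stays fixed.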

Paper Structure

This paper contains 24 sections, 4 figures, 1 table.

Figures (4)

  • Figure 1: Dialogue generation with in-context learning. The Patient model is given a vignette, which is used to create a dialogue with a Doctor model. A Moderator model observes the conversation and intervenes when it detects a conversational cue that the interaction has ended. The conversation is then passed to a Critic model, which provides feedback based on one of the four constitutions (Sec. \ref{sec:constitutions}) and returns the feedback to the Doctor. This process is repeated for each vignette. The final conversations are collected and evaluated by 215 human raters recruited via Prolific (Sec. \ref{sec:human-eval}).
  • Figure 2: Preferred Constitutions. In each subplot, we show the percentage of respondents preferring each conversation as a heatmap, alongside the estimated coefficients from a Bradley-Terry model. The 'No Constitution' group serves as the reference point. Error bars represent 95% confidence intervals, not adjusted for multiple comparisons.
  • Figure 3: Side-by-side dialogues. The pairs of dialogues to be compared are presented side by side. The doctor is highlighted in red for visual clarity.
  • Figure 4: Preference ratings. For each aspect of the patient-centered communication framework, participants chose which dialogue they preferred.
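
Figure 2 reports Bradley-Terry estimates anchored at the 'No Constitution' group with unadjusted 95% confidence intervals. The sketch below shows one way such a fit could be computed; the paper's exact procedure is not stated here, so the following are assumptions: input is raw win counts per ordered pair (ties discarded), parameters are log-strengths with the reference fixed at 0, the maximum-likelihood estimate is found by Newton-Raphson, and intervals are Wald intervals from the inverse Fisher information. `fit_bradley_terry` and the item labels are hypothetical.

```python
import math
from typing import Dict, List, Tuple

Pair = Tuple[str, str]  # (winner, loser)


def _invert(m: List[List[float]]) -> List[List[float]]:
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(m)
    aug = [row[:] + [1.0 if r == c else 0.0 for c in range(n)] for r, row in enumerate(m)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        scale = aug[col][col]
        aug[col] = [x / scale for x in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def _grad_info(wins: Dict[Pair, int], beta: Dict[str, float], idx: Dict[str, int]):
    """Score vector and Fisher information of the Bradley-Terry log-likelihood."""
    n = len(idx)
    grad = [0.0] * n
    info = [[0.0] * n for _ in range(n)]
    for (i, j), w in wins.items():
        p = 1.0 / (1.0 + math.exp(beta[j] - beta[i]))  # P(i preferred over j)
        g, v = w * (1.0 - p), w * p * (1.0 - p)
        if i in idx:
            grad[idx[i]] += g
            info[idx[i]][idx[i]] += v
        if j in idx:
            grad[idx[j]] -= g
            info[idx[j]][idx[j]] += v
        if i in idx and j in idx:
            info[idx[i]][idx[j]] -= v
            info[idx[j]][idx[i]] -= v
    return grad, info


def fit_bradley_terry(
    wins: Dict[Pair, int], items: List[str], reference: str, iters: int = 50, z: float = 1.96
) -> Dict[str, Tuple[float, float, float]]:
    """Return {item: (log-strength, CI low, CI high)} with the reference fixed at 0."""
    free = [x for x in items if x != reference]
    idx = {x: k for k, x in enumerate(free)}
    beta = {x: 0.0 for x in items}
    for _ in range(iters):  # Newton-Raphson step: beta += I^{-1} * score
        grad, info = _grad_info(wins, beta, idx)
        cov = _invert(info)
        for x in free:
            beta[x] += sum(cov[idx[x]][k] * grad[k] for k in range(len(free)))
    _, info = _grad_info(wins, beta, idx)  # information at the final estimate
    cov = _invert(info)
    out = {reference: (0.0, 0.0, 0.0)}
    for x in free:
        se = math.sqrt(cov[idx[x]][idx[x]])
        out[x] = (beta[x], beta[x] - z * se, beta[x] + z * se)
    return out


if __name__ == "__main__":
    # Made-up counts for illustration only, not the study's data.
    toy = {("Detailed", "No Constitution"): 30, ("No Constitution", "Detailed"): 10}
    print(fit_bradley_terry(toy, ["No Constitution", "Detailed"], "No Constitution"))
```

A positive estimate means that item was preferred over the reference more often than not; an interval that excludes 0 corresponds to an error bar not crossing the reference line. The estimate exists only when the comparison graph is connected and every item has at least one win and one loss; with perfectly separated data the Newton steps diverge.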