LLMs are Superior Feedback Providers: Bootstrapping Reasoning for Lie Detection with Self-Generated Feedback

Tanushree Banerjee, Richard Zhu, Runzhe Yang, Karthik Narasimhan

TL;DR

The paper tackles lie detection in nuanced dialogue by introducing a three-stage bootstrapping framework: a base LLM generates initial predictions, an LLM (or, for comparison, human players) provides feedback on them, and a stronger LLM refines the predictions using that feedback. On Diplomacy conversations, LLM-generated feedback significantly improves zero-shot performance and can rival supervised LSTM baselines, with GPT-4 feedback outperforming human feedback in lying recall. The approach achieves a 39% gain in lying-F1 over the zero-shot baseline without any training data and is cost-effective, suggesting a scalable path for enhancing LLM reasoning on underspecified tasks. Limitations include reliance on GPT-4, which is not open-access, and a small pool of human annotators, but the results highlight the potential of self- or model-generated feedback to bootstrap advanced reasoning in deception detection.
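To make the headline metric concrete: we assume here that lying-F1 is the F1 score computed with lies treated as the positive class, and that the 39% figure is a relative (not absolute) gain over the zero-shot score. The sketch below computes both under those assumptions; the label convention (1 = lie), the function names, and all numbers are hypothetical and are not the paper's data.

```python
from typing import Sequence


def lying_f1(y_true: Sequence[int], y_pred: Sequence[int], lie_label: int = 1) -> float:
    """F1 score with the lie class treated as the positive class (assumed definition)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == lie_label and p == lie_label)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != lie_label and p == lie_label)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == lie_label and p != lie_label)
    if tp == 0:
        return 0.0  # no correctly detected lies: precision or recall is zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)  # "lying recall": share of actual lies that were flagged
    return 2 * precision * recall / (precision + recall)


def relative_gain(new: float, baseline: float) -> float:
    """Relative improvement of `new` over `baseline` (0.39 would mean a 39% gain)."""
    return (new - baseline) / baseline


y_true = [1, 0, 1, 0, 0, 1]  # made-up gold labels (1 = lie)
y_pred = [1, 0, 0, 1, 0, 1]  # made-up model predictions
print(lying_f1(y_true, y_pred))
print(relative_gain(0.5, 0.4))
```

Lying-F1 is used instead of plain accuracy because lies are the minority class in deception data, so a model that never predicts "lie" can still score high accuracy while its lying-F1 is zero.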

Abstract

Large Language Models (LLMs) excel at generating human-like dialogues and comprehending text. However, understanding the subtleties of complex exchanges in language remains a challenge. We propose a bootstrapping framework that leverages self-generated feedback to enhance LLM reasoning capabilities for lie detection. The framework consists of three stages: suggestion, feedback collection, and modification. In the suggestion stage, a cost-effective language model generates initial predictions based on game state and dialogue. The feedback-collection stage involves a language model providing feedback on these predictions. In the modification stage, a more advanced language model refines the initial predictions using the auto-generated feedback. We investigate the application of the proposed framework for detecting betrayal and deception in Diplomacy games, and compare it with feedback from professional human players. The LLM-generated feedback exhibits superior quality and significantly enhances the performance of the model. Our approach achieves a 39% improvement over the zero-shot baseline in lying-F1 without the need for any training data, rivaling state-of-the-art supervised learning results.
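The three stages can be sketched as a chain of model calls in which each stage's output is fed into the next prompt. This is a minimal illustration, assuming each model is a plain callable from a prompt string to a response string; the prompt wording, `run_bootstrap`, `BootstrapResult`, and the stub models are hypothetical and do not reproduce the authors' prompts or API usage.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical interface: a "model" maps a prompt string to a response string.
Model = Callable[[str], str]


@dataclass
class BootstrapResult:
    suggestion: str  # stage 1: initial predictions and rationales
    feedback: str    # stage 2: critique of those predictions
    final: str       # stage 3: refined predictions


def run_bootstrap(game_state: str, dialogue: str,
                  suggester: Model, critic: Model, modifier: Model) -> BootstrapResult:
    context = f"Game state:\n{game_state}\n\nMessages:\n{dialogue}"
    # Stage 1 (suggestion): a cheaper model labels each message and explains why.
    suggestion = suggester(
        f"{context}\n\nFor each message, predict whether the sender is lying "
        "and give a rationale."
    )
    # Stage 2 (feedback collection): a model (or a human) critiques the predictions.
    feedback = critic(
        f"{context}\n\nPredictions:\n{suggestion}\n\n"
        "Point out predictions that look wrong and explain why."
    )
    # Stage 3 (modification): a stronger model revises the predictions using the feedback.
    final = modifier(
        f"{context}\n\nPredictions:\n{suggestion}\n\nFeedback:\n{feedback}\n\n"
        "Revise the predictions using the feedback."
    )
    return BootstrapResult(suggestion, feedback, final)


# Usage with stub models standing in for real LLM calls.
result = run_bootstrap(
    "Spring 1901: Germany holds Munich.",
    "msg1 (France to Germany): I will support you into Burgundy.",
    suggester=lambda prompt: "msg1: truthful",
    critic=lambda prompt: "msg1 may be a lie: France's orders conflict with the promise.",
    modifier=lambda prompt: "msg1: lie",
)
print(result.final)
```

Because stage 2 accepts any callable, replacing the critic with a function that returns human-written feedback gives the human-feedback comparison while the other two stages stay unchanged.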

Paper Structure (61 sections, 18 figures)

Figures (18)

  • Figure 1: LLM-based framework for lie detection in the game of Diplomacy. The framework comprises three stages: 1) suggestion, 2) feedback collection, and 3) modification. In the suggestion stage, a language model generates predictions and rationales from a textual representation of the board state and the messages. During the feedback collection stage, a language model provides feedback on these predictions; this LLM feedback is compared with human-written feedback collected at the same stage. Finally, in the modification stage, a language model refines the initial predictions based on the feedback received.
  • Figure 2: Feedback lengths. Feedback from 3 human players and two LLMs (GPT-3.5 and GPT-4) on the suggestion-stage outputs for 102 conversations. Notches represent the median, box boundaries indicate the 25th and 75th percentiles, and circles denote outliers.
  • Figure 3: Main results. LLM feedback notably improved macro-F1 and lying-F1 scores over GPT-4 zero-shot predictions, outperforming even human feedback (H, red dashed line). Performance was on par with the best supervised learning baseline (SL, blue dashed line). Among human feedback providers, Human1 proved most effective. GPT-3 zero-shot performance in the suggestion stage is shown by the green line (G). Numbers are mean F1 scores; error bars represent 95% confidence intervals over 5 runs.
  • Figure 4: Feedback consistency. Average pairwise feedback consistency scores measured by GPT-4.
  • Figure 5: Lying-F1 by human feedback consistency. Human1 consistently provided longer feedback than the other human feedback providers. We quantified the pairwise consistency of feedback using GPT-4. Notably, Human1 substantially improved the feedback quality in cases where human feedback was contradictory. Horizontal bars indicate medians, and the shapes of the violins represent the distributions smoothed by kernel density estimation.
  • ...and 13 more figures
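Figures 3 and 4 summarize results with two statistics: a mean with a 95% confidence interval over 5 runs, and an average of pairwise consistency scores. A minimal sketch of both follows, assuming a Student-t interval for the runs (the captions do not state the interval method) and assuming consistency is averaged over every unordered pair of feedback texts. `judge` is a hypothetical stand-in for the GPT-4 consistency scorer, and all numbers are made up.

```python
import itertools
import statistics
from typing import Callable, Sequence

# Two-sided 95% Student-t critical value for 4 degrees of freedom (n = 5 runs).
T_CRIT_95_DF4 = 2.776


def mean_ci95(scores: Sequence[float], t_crit: float = T_CRIT_95_DF4) -> tuple[float, float]:
    """Return (mean, half-width) of a t-based 95% confidence interval (assumed method)."""
    mean = statistics.mean(scores)
    sem = statistics.stdev(scores) / len(scores) ** 0.5  # standard error of the mean
    return mean, t_crit * sem


def average_pairwise_consistency(feedbacks: Sequence[str],
                                 judge: Callable[[str, str], float]) -> float:
    """Average a pairwise score over every unordered pair of feedback texts."""
    scores = [judge(a, b) for a, b in itertools.combinations(feedbacks, 2)]
    return statistics.mean(scores)


runs = [0.50, 0.52, 0.48, 0.51, 0.49]  # made-up F1 scores from 5 runs
avg, half_width = mean_ci95(runs)
print(f"F1 = {avg:.3f} ± {half_width:.3f}")

# Toy judge (hypothetical): 1.0 if two feedback texts are identical, else 0.0.
exact_match = lambda a, b: 1.0 if a == b else 0.0
print(average_pairwise_consistency(["lie", "lie", "truth"], exact_match))
```

The t critical value must change with the number of runs; with only 5 runs the interval is noticeably wider than a normal-approximation interval (which would use 1.96).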