LLMs are Superior Feedback Providers: Bootstrapping Reasoning for Lie Detection with Self-Generated Feedback
Tanushree Banerjee, Richard Zhu, Runzhe Yang, Karthik Narasimhan
TL;DR
The paper tackles lie detection in nuanced dialogue with a three-stage bootstrapping framework: a cost-effective base LLM generates initial predictions, an LLM (or, for comparison, human players) provides feedback on them, and a more advanced LLM refines the predictions using that feedback. On Diplomacy conversations, LLM-generated feedback significantly improves zero-shot performance and rivals supervised LSTM baselines, with GPT-4 feedback outperforming human feedback on lying recall. The approach achieves a 39% gain in lying-F1 over the zero-shot baseline without any training data and is cost-effective, suggesting a scalable path for improving LLM reasoning on underspecified tasks. Limitations include GPT-4 not being open-access and the small number of human annotators; even so, the results highlight the potential of self-generated or model-generated feedback to bootstrap advanced reasoning in deception detection.
Abstract
Large Language Models (LLMs) excel at generating human-like dialogues and comprehending text. However, understanding the subtleties of complex exchanges in language remains a challenge. We propose a bootstrapping framework that leverages self-generated feedback to enhance LLM reasoning capabilities for lie detection. The framework consists of three stages: suggestion, feedback collection, and modification. In the suggestion stage, a cost-effective language model generates initial predictions based on game state and dialogue. The feedback-collection stage involves a language model providing feedback on these predictions. In the modification stage, a more advanced language model refines the initial predictions using the auto-generated feedback. We investigate the application of the proposed framework for detecting betrayal and deception in Diplomacy games, and compare it with feedback from professional human players. The LLM-generated feedback exhibits superior quality and significantly enhances the performance of the model. Our approach achieves a 39% improvement over the zero-shot baseline in lying-F1 without the need for any training data, rivaling state-of-the-art supervised learning results.
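The three stages described above (suggestion, feedback collection, modification) can be sketched as a simple chain of model calls. This is a minimal illustration, not the authors' implementation: the `Message` type, the function names, and the prompt wording are all assumptions, and each "model" is just a hypothetical callable that maps a prompt string to a text reply.

```python
from dataclasses import dataclass
from typing import Callable

# Assumption: a "model" is any function from a prompt string to a reply string.
Model = Callable[[str], str]


@dataclass
class Message:
    """Hypothetical container for one Diplomacy message and its context."""
    game_state: str
    dialogue: str


def suggest(model: Model, msg: Message) -> str:
    """Stage 1: a cost-effective model makes an initial lie/truth prediction."""
    prompt = (
        f"Game state:\n{msg.game_state}\n\n"
        f"Dialogue:\n{msg.dialogue}\n\n"
        "Is the last message a lie? Answer 'lie' or 'truth' with a brief reason."
    )
    return model(prompt)


def collect_feedback(model: Model, msg: Message, suggestion: str) -> str:
    """Stage 2: a model critiques the initial prediction."""
    prompt = (
        f"Game state:\n{msg.game_state}\n\n"
        f"Dialogue:\n{msg.dialogue}\n\n"
        f"Initial prediction:\n{suggestion}\n\n"
        "Give feedback on this prediction."
    )
    return model(prompt)


def modify(model: Model, msg: Message, suggestion: str, feedback: str) -> str:
    """Stage 3: a more advanced model refines the prediction using the feedback."""
    prompt = (
        f"Game state:\n{msg.game_state}\n\n"
        f"Dialogue:\n{msg.dialogue}\n\n"
        f"Initial prediction:\n{suggestion}\n\n"
        f"Feedback:\n{feedback}\n\n"
        "Give a final answer: 'lie' or 'truth' with a brief reason."
    )
    return model(prompt)


def detect_lie(msg: Message, base: Model, critic: Model, strong: Model) -> str:
    """Run suggestion, feedback collection, and modification in order."""
    suggestion = suggest(base, msg)
    feedback = collect_feedback(critic, msg, suggestion)
    return modify(strong, msg, suggestion, feedback)
```

Keeping each stage behind a plain callable means the feedback provider can be swapped (for example, an LLM versus a human annotator wrapped in a function) without touching the other two stages, which mirrors the comparison the abstract describes.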
