SteerConf: Steering LLMs for Confidence Elicitation
Ziang Zhou, Tianyuan Jin, Jieming Shi, Qing Li
TL;DR
SteerConf addresses overconfidence in LLMs by steering verbalized confidence through a prompt-based framework that collects responses across multiple steering levels, quantifies confidence consistency, and calibrates final confidence without model training. The method integrates steering prompts, steered confidence consistency, and calibrated selection to improve both confidence calibration and failure prediction, demonstrated across seven benchmarks and three state-of-the-art models. Results show substantial gains in calibration (lower ECE) and failure detection (higher AUROC and PR metrics) compared with vanilla prompts and baselines, including notable improvements under Chain-of-Thought prompting. This work offers a practical, black-box strategy to enhance the reliability of LLMs in high-stakes settings, supporting safer real-world deployments while avoiding internal model access or fine-tuning.
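For readers unfamiliar with the calibration metric mentioned above, the following is a minimal sketch of Expected Calibration Error (ECE) using the common equal-width binning formulation. The function name, the bin count of 10, and the input format are assumptions made for illustration; the summary does not state which ECE variant or binning the authors use.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Equal-width-bin ECE: weighted mean of |accuracy - mean confidence| per bin.

    confidences: list of floats in [0, 1] (the model's stated confidence).
    correct: list of bools (whether each answer was right).
    n_bins: number of equal-width bins (10 is a common default, assumed here).
    """
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    if not confidences:
        raise ValueError("at least one prediction is required")
    if any(c < 0.0 or c > 1.0 for c in confidences):
        raise ValueError("confidences must lie in [0, 1]")

    # Assign each prediction to a bin; a confidence of exactly 1.0 goes to the last bin.
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, ok))

    total = len(confidences)
    ece = 0.0
    for members in bins:
        if not members:
            continue  # empty bins contribute nothing
        avg_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(1 for _, ok in members if ok) / len(members)
        ece += (len(members) / total) * abs(accuracy - avg_conf)
    return ece
```

As an illustration, a model that reports 0.9 confidence on four answers but gets only two right has an ECE of roughly 0.4 under this formulation, which is the kind of overconfidence gap the method aims to shrink.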
Abstract
Large Language Models (LLMs) exhibit impressive performance across diverse domains but often suffer from overconfidence, limiting their reliability in critical applications. We propose SteerConf, a novel framework that systematically steers LLMs' confidence scores to improve their calibration and reliability. SteerConf introduces three key components: (1) a steering prompt strategy that guides LLMs to produce confidence scores in specified directions (e.g., conservative or optimistic) by leveraging prompts with varying steering levels; (2) a steered confidence consistency measure that quantifies alignment across multiple steered confidences to enhance calibration; and (3) a steered confidence calibration method that aggregates confidence scores using consistency measures and applies linear quantization for answer selection. SteerConf operates without additional training or fine-tuning, making it broadly applicable to existing LLMs. Experiments on seven benchmarks spanning professional knowledge, common sense, ethics, and reasoning tasks, using three advanced LLMs (GPT-3.5, LLaMA 3, GPT-4), demonstrate that SteerConf outperforms existing methods, often by a significant margin. Our findings highlight the potential of steering the confidence of LLMs to enhance their reliability for safer deployment in real-world applications.
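The three components described in the abstract can be pictured with a small sketch. Everything specific in it is an assumption made for illustration, not the authors' formulation: the steering-level wording in `STEERING_PREFIXES`, the function names, the consistency formulas (majority agreement, and one over one plus the coefficient of variation), and the multiplicative aggregation. The abstract's linear-quantization answer selection is not specified here, so a plain majority vote stands in for it.

```python
from collections import Counter
from statistics import mean, pstdev

# Hypothetical steering instructions, ordered from conservative to optimistic.
STEERING_PREFIXES = {
    "very_cautious": "Be very cautious when stating your confidence.",
    "cautious": "Be cautious when stating your confidence.",
    "vanilla": "",
    "confident": "Be confident when stating your confidence.",
    "very_confident": "Be very confident when stating your confidence.",
}


def build_steered_prompts(question):
    """Component 1 (sketch): one prompt per steering level for the same question."""
    task = f"{question}\nGive your answer and a confidence between 0 and 1."
    return {level: f"{prefix}\n{task}".strip() for level, prefix in STEERING_PREFIXES.items()}


def aggregate_steered(responses):
    """Components 2 and 3 (sketch): consistency-weighted confidence.

    responses: list of (answer, confidence in [0, 1]), one per steering level.
    Returns (selected_answer, calibrated_confidence).
    """
    if not responses:
        raise ValueError("at least one steered response is required")
    answers = [a for a, _ in responses]
    confs = [c for _, c in responses]

    # Majority vote stands in for the paper's answer-selection step (assumption).
    top_answer, top_count = Counter(answers).most_common(1)[0]
    answer_consistency = top_count / len(answers)

    # Assumed confidence-consistency score: 1 when all confidences agree,
    # shrinking toward 0 as their spread grows relative to their mean.
    mu = mean(confs)
    conf_consistency = 1.0 / (1.0 + pstdev(confs) / mu) if mu > 0 else 0.0

    # Assumed aggregation: discount the mean confidence by both consistency scores.
    return top_answer, mu * answer_consistency * conf_consistency
```

In this sketch, responses that agree on both the answer and the confidence keep their mean confidence unchanged, while disagreement across steering levels lowers the final score. The design only needs the model's text outputs, which matches the black-box, training-free setting the abstract describes.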
