BIRD: A Trustworthy Bayesian Inference Framework for Large Language Models
Yu Feng, Ben Zhou, Weidong Lin, Dan Roth
TL;DR
BIRD tackles unreliable probability estimation in large language models under partial information. It couples abductive factor generation with a Bayesian network, then refines the conditional probabilities through constrained optimization and LLM entailment to compute a reliable P(O|C). The framework produces interpretable, language-based Bayesian variables and reports probability estimates that are 30% better than those provided directly by LLM baselines, along with improved decision-making across diverse reasoning tasks. Extrinsic evaluations show that BIRD-derived probabilities can serve as supervision signals to improve smaller models, and follow-up-question generation highlights its utility for trust-aware, interactive decision support. Overall, BIRD advances trustworthy AI by providing a transparent, data-efficient approach to probabilistic inference in LLM-driven applications.
Abstract
Predictive models often need to work with incomplete information in real-world tasks. Consequently, they must provide reliable probability or confidence estimation, especially in large-scale decision-making and planning tasks. Current large language models (LLMs) are insufficient for accurate estimations, but they can generate relevant factors that may affect the probabilities, produce coarse-grained probabilities when the information is more complete, and help determine which factors are relevant to specific downstream contexts. In this paper, we make use of these capabilities of LLMs to provide a significantly more accurate probabilistic estimation. We propose BIRD, a novel probabilistic inference framework that aligns a Bayesian network with LLM abductions and then estimates more accurate probabilities in a deduction step. We show BIRD provides reliable probability estimations that are 30% better than those provided directly by LLM baselines. These estimates further contribute to better and more trustworthy decision making.
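As an illustration of the deduction step, the following minimal sketch computes an outcome probability P(O|C) by marginalizing over abduced factors. It assumes, beyond what the abstract states, that the factors are conditionally independent given the context and that P(O|C) is the sum over factor-value combinations f of P(O|f) P(f|C). The function name, the factor names, and all numbers are hypothetical and chosen only for illustration; they are not taken from the paper.

```python
from itertools import product


def outcome_probability(factor_given_context, outcome_given_factors):
    """Marginalize over factor values: P(O|C) = sum_f P(O|f) * P(f|C).

    factor_given_context: {factor: {value: P(value | C)}} (hypothetical input,
        e.g. obtained from an LLM entailment check)
    outcome_given_factors: {(value_1, ..., value_n): P(O | f)} keyed in the
        same factor order as factor_given_context
    Assumes factors are independent given the context.
    """
    names = list(factor_given_context)
    total = 0.0
    # Enumerate every combination of factor values.
    for values in product(*(factor_given_context[n] for n in names)):
        # P(f | C) as a product of per-factor probabilities (independence assumption).
        weight = 1.0
        for name, value in zip(names, values):
            weight *= factor_given_context[name][value]
        total += weight * outcome_given_factors[values]
    return total


# Hypothetical example: will a delivery arrive late (outcome O)?
factor_given_context = {
    "weather": {"rain": 0.8, "clear": 0.2},
    "traffic": {"heavy": 0.5, "light": 0.5},
}
outcome_given_factors = {
    ("rain", "heavy"): 0.9,
    ("rain", "light"): 0.6,
    ("clear", "heavy"): 0.5,
    ("clear", "light"): 0.1,
}

p_late = outcome_probability(factor_given_context, outcome_given_factors)
print(round(p_late, 2))
```

With these made-up inputs the result is approximately 0.66. The sketch shows only the final marginalization; it does not cover how the conditional probabilities themselves are refined.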
