Epistemic Filtering and Collective Hallucination: A Jury Theorem for Confidence-Calibrated Agents

Jonas Karge

TL;DR

The paper proposes a probabilistic framework in which agents go through a calibration phase, updating beliefs about their own fixed competence, before facing a final confidence gate that determines whether they vote or abstain. It also discusses how this framework can mitigate hallucinations in collective LLM decision-making.

Abstract

We investigate the collective accuracy of heterogeneous agents who learn to estimate their own reliability over time and selectively abstain from voting. While classical epistemic voting results, such as the Condorcet Jury Theorem (CJT), assume fixed participation, real-world aggregation often benefits from allowing agents to say "I don't know." We propose a probabilistic framework where agents engage in a calibration phase, updating beliefs about their own fixed competence, before facing a final confidence gate that determines whether to vote or abstain. We derive a non-asymptotic lower bound on the group's success probability and prove that this selective participation generalizes the asymptotic guarantees of the CJT to a sequential, confidence-gated setting. Empirically, we validate these bounds via Monte Carlo simulations. While our results are general, we discuss their potential application to AI safety, outlining how this framework can mitigate hallucinations in collective LLM decision-making.
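The calibrate-then-gate pipeline described in the abstract can be illustrated with a small Monte Carlo sketch. This is not the paper's algorithm; it is a minimal illustration under several assumptions: each agent sees whether its own calibration answers were correct, starts from a uniform Beta(1, 1) prior, votes only if its posterior probability of being better than chance reaches a threshold, and the group decides by strict majority of those who vote. All function and parameter names (`prob_competent`, `simulate_once`, `success_rate`, `gate`) are hypothetical.

```python
import random
from math import comb


def prob_competent(successes, failures):
    """P(p > 1/2) under a Beta(1 + successes, 1 + failures) posterior.

    For integer parameters the regularized incomplete beta function
    reduces to a binomial tail, so only the standard library is needed.
    """
    a, b = 1 + successes, 1 + failures
    n = a + b - 1
    return sum(comb(n, j) for j in range(a)) / 2 ** n


def simulate_once(competences, rounds, gate, rng):
    """One run: calibration rounds, then a confidence-gated majority vote.

    Returns True if correct votes strictly outnumber wrong votes.
    Ties and empty electorates count as failures (an assumption here).
    """
    correct_votes = wrong_votes = 0
    for p in competences:
        # Calibration: count how often the agent was right in `rounds` trials.
        successes = sum(rng.random() < p for _ in range(rounds))
        # Confidence gate: abstain unless the posterior clears the threshold.
        if prob_competent(successes, rounds - successes) >= gate:
            if rng.random() < p:
                correct_votes += 1
            else:
                wrong_votes += 1
    return correct_votes > wrong_votes


def success_rate(competences, rounds, gate, trials=2000, seed=0):
    """Fraction of simulated runs in which the group decision is correct."""
    rng = random.Random(seed)
    wins = sum(simulate_once(competences, rounds, gate, rng) for _ in range(trials))
    return wins / trials


if __name__ == "__main__":
    # Hypothetical group: 10 competent agents and 15 worse-than-chance agents.
    group = [0.7] * 10 + [0.4] * 15
    print("everyone votes:", success_rate(group, rounds=30, gate=0.0))
    print("gated at 0.9:  ", success_rate(group, rounds=30, gate=0.9))
```

Setting `gate=0.0` recovers fixed participation, so the two printed rates give a rough comparison between the classical setting and selective abstention for this particular group.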

Paper Structure (29 sections, 7 theorems, 57 equations, 3 figures, 3 algorithms)

Introduction
The Condorcet Jury Theorem.
Our Voting Framework: Epistemic Filtering.
Related Work
Generalizations of the Condorcet Jury Theorem.
Delegation and Liquid Democracy.
Contributions.
Motivational Scenario
Preliminaries
The Formal Voting Framework.
Confidence Measure and Updating.
Fundamentals of the Beta Distribution.
An Abstention Threshold.
The Regularized Incomplete Beta Function.
Framework Summary.
...and 14 more sections

Key Result

Proposition 1

$(D_k)_{k=1, \dots, NT}$ forms a martingale difference sequence with respect to the filtration $\mathcal{H}_k$.
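
For reference, the standard definition behind this statement is given below in the proposition's notation. How $D_k$ itself is constructed is not shown in this overview, so only the generic conditions are stated: each $D_k$ is $\mathcal{H}_k$-measurable, integrable, and has zero conditional mean given the past.

$$
\mathbb{E}\big[|D_k|\big] < \infty
\quad \text{and} \quad
\mathbb{E}\big[D_k \mid \mathcal{H}_{k-1}\big] = 0
\qquad \text{for all } k = 1, \dots, NT .
$$

Equivalently, the partial sums $M_k = \sum_{j \le k} D_j$ form a martingale with respect to $(\mathcal{H}_k)$. When the increments are also bounded, this is the structure that concentration inequalities such as Azuma-Hoeffding rely on.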

Figures (3)

Figure 1: Learning and Final Vote Visualization.
Figure 2: Beta Belief Visualization.
Figure 3: Empirical Simulation and Equation \ref{finalbound} visualization under two configurations.

Theorems & Definitions (19)

Example 1
Example 2
Definition 1: Private Signal Independence
Example 3
Definition 2
Definition 3: Filtration
Definition 4: Adapted Stochastic Process
Definition 5: Martingale
Definition 6
Definition 7: Doob Martingale [Doob2001]
...and 9 more
