EVINCE: Optimizing Multi-LLM Dialogues Using Conditional Statistics and Information Theory

Edward Y. Chang

TL;DR

EVINCE introduces a principled, information-theoretic framework for optimizing multi-LLM dialogues by modulating linguistic behavior through conditional statistics and dual entropy. It couples Inclusive Exploration, Information Flow Dynamics, and Reasoning Quality with a CRIT-based evaluative layer to balance exploration and convergence, guided by the Entropy Duality Theorem. The framework is instantiated via a structured two-LLM debate that uses metric-driven termination and a final weighted aggregation, with RAG as a fallback for high-uncertainty cases. Empirical validation in disease diagnosis and news debiasing demonstrates improved predictive accuracy, enhanced reasoning robustness, and practical bias mitigation, underscoring EVINCE's potential for reliable, open-domain collaborative AI in critical applications.

Abstract

EVINCE (Entropy and Variation IN Conditional Exchanges) is a novel framework for optimizing multi-LLM dialogues using conditional statistics and information theory. It addresses limitations in multi-agent debate (MAS) frameworks, where multiple LLMs "chat" without behavior modulation or mutual information quality assessment. Using dual entropy optimization to balance perspective diversity and prior knowledge, EVINCE provides quantitative tools to dynamically regulate LLM linguistic behaviors. When mutual information is low and both cross-entropy and Wasserstein distance are high, EVINCE promotes contentious dialogues to expose diverse perspectives and uncover inconsistencies. Conversely, as cross-entropy decreases and mutual information stabilizes, it transitions discussions into a conciliatory phase, encouraging compromise and acknowledgment of valid points. Using information-theoretic metrics and optimizing mutual information, EVINCE emerges as a structured and highly effective framework for multi-LLM collaboration.
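The metrics named in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: it assumes each LLM returns a probability distribution over the same ordered list of candidate labels, that a joint probability table is available for mutual information, and that the phase switch is a simple threshold test. The function names and the thresholds `mi_low`, `ce_high`, and `wd_high` are hypothetical.

```python
import math

def entropy(p):
    """Shannon entropy (bits) of a discrete distribution."""
    return -sum(x * math.log2(x) for x in p if x > 0)

def cross_entropy(p, q, eps=1e-12):
    """Cross-entropy H(p, q) in bits; eps guards against log(0)."""
    return -sum(x * math.log2(max(y, eps)) for x, y in zip(p, q) if x > 0)

def wasserstein_1d(p, q):
    """1-Wasserstein distance on the ordered support 0..n-1 (unit spacing),
    computed as the sum of absolute differences between the two CDFs.
    Assumption: the labels have a meaningful order."""
    dist, cdf_gap = 0.0, 0.0
    for x, y in zip(p, q):
        cdf_gap += x - y
        dist += abs(cdf_gap)
    return dist

def mutual_information(joint):
    """Mutual information (bits) from a joint probability table joint[i][j]."""
    px = [sum(row) for row in joint]
    py = [sum(col) for col in zip(*joint)]
    mi = 0.0
    for i, row in enumerate(joint):
        for j, pxy in enumerate(row):
            if pxy > 0:
                mi += pxy * math.log2(pxy / (px[i] * py[j]))
    return mi

def choose_phase(mi, ce, wd, mi_low=0.1, ce_high=1.5, wd_high=0.5):
    """Hypothetical rule: stay contentious while mutual information is low and
    both cross-entropy and Wasserstein distance are high; otherwise conciliate."""
    if mi < mi_low and ce > ce_high and wd > wd_high:
        return "contentious"
    return "conciliatory"
```

In practice the thresholds would need tuning per task, and the abstract's condition that mutual information "stabilizes" suggests comparing values across rounds rather than testing a single round in isolation.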

Paper Structure (44 sections, 16 equations, 9 figures, 10 tables)

Figures (9)

  • Figure 1: Specifications of Algorithm $\mathsf{EVINCE}$. Key points: 1) Asymmetric Start: In Step #1, LLM$_A$ initiates with opening arguments based solely on the given information, while LLM$_B$ starts with access to LLM$_A$'s prediction and arguments for refutation. 2) Termination Criteria: The while loop in Step #2 considers three factors: Wasserstein distance, mutual information, and argument quality. $\mathsf{EVINCE}$ terminates if the dialogue ceases to make significant progress. 3) Further Details: Maxims #1 to #4 provide additional explanations of the algorithm's principles. 4) Argument Evaluation: Step #2.2 evaluates argument quality, and the while loop examines whether argument quality continues to improve. 5) Update($\Delta$) modulates contentiousness; see Maxim #3 for its specifications.
  • Figure 2: EVINCE improves diagnosis accuracy markedly.
  • Figure 3: Confusion matrices
  • Figure 4: Entropy, WD, and normalized MI
  • Figure 5: Convergence of all information metrics.
  • ...and 4 more figures
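
The control flow described in the Figure 1 caption can be sketched as follows. This is an illustration under stated assumptions, not the algorithm as specified in the paper: the agent signature, the linear decay used for Update($\Delta$), the progress tolerance `tol`, and the quality-weighted aggregation are all hypothetical, and mutual information is omitted from the termination test for brevity.

```python
from typing import Callable, List, Tuple

Dist = List[float]
# Hypothetical agent interface: (context, contentiousness) -> (distribution, argument)
Agent = Callable[[str, float], Tuple[Dist, str]]

def wasserstein_1d(p: Dist, q: Dist) -> float:
    """1-Wasserstein distance on an ordered support with unit spacing."""
    dist, cdf_gap = 0.0, 0.0
    for x, y in zip(p, q):
        cdf_gap += x - y
        dist += abs(cdf_gap)
    return dist

def debate(llm_a: Agent, llm_b: Agent, question: str,
           score_quality: Callable[[str], float],
           delta0: float = 0.9, step: float = 0.2,
           tol: float = 0.01, max_rounds: int = 10) -> Tuple[Dist, int]:
    delta = delta0
    # Asymmetric start: A sees only the question; B also sees A's argument.
    p_a, arg_a = llm_a(question, delta)
    p_b, arg_b = llm_b(f"{question}\nOpponent: {arg_a}", delta)
    prev_wd = wasserstein_1d(p_a, p_b)
    q_a, q_b = score_quality(arg_a), score_quality(arg_b)
    prev_q = q_a + q_b
    rounds = 1
    while rounds < max_rounds:
        # Hypothetical Update(delta): lower contentiousness linearly each round.
        delta = max(0.0, delta - step)
        p_a, arg_a = llm_a(f"{question}\nOpponent: {arg_b}", delta)
        p_b, arg_b = llm_b(f"{question}\nOpponent: {arg_a}", delta)
        wd = wasserstein_1d(p_a, p_b)
        q_a, q_b = score_quality(arg_a), score_quality(arg_b)
        rounds += 1
        # Progress means the distributions moved closer or the arguments improved.
        progress = (prev_wd - wd) > tol or (q_a + q_b - prev_q) > tol
        prev_wd, prev_q = wd, q_a + q_b
        if not progress:
            break
    # Assumed aggregation: weight each final distribution by its argument quality.
    total = q_a + q_b
    w_a = q_a / total if total > 0 else 0.5
    final = [w_a * x + (1.0 - w_a) * y for x, y in zip(p_a, p_b)]
    return final, rounds
```

Here `llm_a`, `llm_b`, and `score_quality` stand in for real model calls and an argument-quality evaluator; any callable with the matching signature can be passed in for testing.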

Theorems & Definitions (1)

  • proof