Robust Knowledge Extraction from Large Language Models using Social Choice Theory

Nico Potyka; Yuqicheng Zhu; Yunjie He; Evgeny Kharlamov; Steffen Staab


TL;DR

This paper addresses the lack of robustness of large language models (LLMs) in high-stakes query answering by issuing ranking queries repeatedly and aggregating their results with methods from social choice theory, specifically Partial Borda Weighting (PBW). It formalizes a transformation framework in which a transformation $T(Q,N,t)$ generates a profile of $N$ rankings from an input query $Q$, and the aggregation function $A_{PBW}$ merges this profile into a single robust ranking. In a ranking $\succeq$, each option $o$ receives the weight $w^{PBW}_{\succeq}(o) = 2 \cdot \mathrm{Down}_{\succeq}(o) + \mathrm{Inc}_{\succeq}(o)$, where $\mathrm{Down}_{\succeq}(o)$ counts the options ranked strictly below $o$ and $\mathrm{Inc}_{\succeq}(o)$ counts the options incomparable to $o$. The weights are summed over the profile, $s^{PBW}_{p}(o) = \sum_{i=1}^N w^{PBW}_{\succeq_i}(o)$, normalized as $\overline{s}^{PBW}_{p}(o) = s^{PBW}_{p}(o) / \sum_{o'} s^{PBW}_{p}(o')$, and the choice function returns $f^{PBW}(p) = \arg\max_{o} s^{PBW}_{p}(o)$. The approach is evaluated on manufacturing, finance, and medical ranking tasks, where PBW-based aggregation improves robustness to both query uncertainty and syntax uncertainty relative to baselines that use no aggregation or simple averaging. The results indicate that aggregating even a small number of responses can substantially improve rank stability, suggesting practical utility for domain-specific LLM applications that demand high accuracy. The work offers a principled, interpretable way to quantify the uncertainty of LLM answers using established social choice mechanisms, without fine-tuning or access to proprietary model internals.
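The PBW formulas in the TL;DR can be made concrete with a small sketch. The Python below is an illustration under stated assumptions, not the authors' implementation: each answer (ballot) is assumed to be a strict partial order given as `(a, b)` pairs meaning "a is ranked above b", options a ballot never mentions are treated as incomparable to everything, and the function names (`strictly_below`, `pbw_weights`, `pbw_scores`, `normalize`, `pbw_choice`) are hypothetical. Ties in the arg max are kept as a set rather than broken arbitrarily.

```python
from fractions import Fraction


def strictly_below(options, pairs):
    """Map each option to the set of options it strictly beats.

    `pairs` holds (a, b) tuples meaning "a is ranked above b"; the result is
    the transitive closure of that relation (Warshall's algorithm).
    """
    below = {o: set() for o in options}
    for a, b in pairs:
        below[a].add(b)
    for k in options:
        for i in options:
            if k in below[i]:
                below[i] |= below[k]
    for o in options:
        if o in below[o]:
            raise ValueError(f"ballot is cyclic at option {o!r}")
    return below


def pbw_weights(options, pairs):
    """PBW weight w(o) = 2 * Down(o) + Inc(o) for a single ballot."""
    below = strictly_below(options, pairs)
    n = len(options)
    weights = {}
    for o in options:
        down = len(below[o])
        up = sum(1 for other in options if o in below[other])
        inc = n - 1 - down - up  # options neither above nor below o
        weights[o] = 2 * down + inc
    return weights


def pbw_scores(options, profile):
    """s_p(o): sum of the PBW weights of o over all ballots in the profile."""
    scores = {o: 0 for o in options}
    for pairs in profile:
        for o, w in pbw_weights(options, pairs).items():
            scores[o] += w
    return scores


def normalize(scores):
    """Normalized scores s(o) / sum_o' s(o'), as exact fractions."""
    total = sum(scores.values())
    if total == 0:
        raise ValueError("cannot normalize: all scores are zero")
    return {o: Fraction(s, total) for o, s in scores.items()}


def pbw_choice(scores):
    """Options with maximal aggregated score (ties are kept, not broken)."""
    best = max(scores.values())
    return {o for o, s in scores.items() if s == best}


options = ["a", "b", "c", "d"]
profile = [
    [("a", "b"), ("b", "c"), ("c", "d")],  # full ranking a > b > c > d
    [("a", "c"), ("b", "c"), ("c", "d")],  # a and b incomparable
    [("a", "b")],                          # only a > b stated; c, d unranked
]
scores = pbw_scores(options, profile)
shares = normalize(scores)
winners = pbw_choice(scores)
print(scores)   # {'a': 15, 'b': 11, 'c': 7, 'd': 3}
print(winners)  # {'a'}
```

Two consequences of the weighting are visible in the example. Every pair of options contributes exactly 2 points per ballot (2 to the better option, or 1 to each if they are incomparable), so each ballot distributes $n(n-1)$ points and the normalization divides by the constant $N \cdot n(n-1)$, here $3 \cdot 4 \cdot 3 = 36$. For a full linear ranking, $\mathrm{Inc}$ is always 0, so the PBW weight is exactly twice the ordinary Borda score.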

Abstract

Large language models (LLMs) can support a wide range of applications such as conversational agents, creative writing, and general query answering. However, they are ill-suited for query answering in high-stakes domains like medicine because they are typically not robust: the same query can produce different answers when it is posed multiple times. To improve the robustness of LLM queries, we propose posing ranking queries repeatedly and aggregating the results with methods from social choice theory. We study ranking queries in diagnostic settings such as medical and fault diagnosis and discuss how the Partial Borda Choice function from the literature can be applied to merge multiple query results. We discuss additional properties of the approach in our setting and evaluate its robustness empirically.
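The repeat-and-aggregate idea of the abstract can be sketched end to end. In this hypothetical Python example, `SAMPLE_ANSWERS` stands in for several LLM answers to the same ranking query (no model is called), the fault names in `OPTIONS` are invented for illustration, answers are assumed to be numbered lists, and options an answer omits are assumed to be incomparable to all others. Under that assumption, an option at 0-based position $i$ of a list naming $k$ of $n$ options has $\mathrm{Down} = k-1-i$ and $\mathrm{Inc} = n-k$, while an unlisted option has $\mathrm{Inc} = n-1$, so no transitive closure is needed.

```python
import re

# Hypothetical option set for a fault-diagnosis query (illustrative only).
OPTIONS = ["bearing wear", "misalignment", "imbalance", "lubrication"]

# Stand-in for N answers to the same ranking query; a real pipeline would
# obtain these strings from repeated (possibly paraphrased) LLM calls.
SAMPLE_ANSWERS = [
    "1. Bearing wear\n2. Misalignment\n3. Imbalance",
    "1. Misalignment\n2. Bearing wear",
    "Likely causes:\n1) bearing wear\n2) lubrication\n3) misalignment\n4) imbalance",
]

# A numbered list item such as "2. Misalignment" or "2) misalignment".
ITEM = re.compile(r"^\s*\d+[.)]\s*(.+?)\s*$")


def parse_ranking(answer, options):
    """Return the known options in the order the answer lists them."""
    known = {o.lower(): o for o in options}
    ranking = []
    for line in answer.splitlines():
        match = ITEM.match(line)
        if match:
            option = known.get(match.group(1).lower())
            if option is not None and option not in ranking:  # skip unknowns/repeats
                ranking.append(option)
    return ranking


def pbw_weights_top_k(ranking, options):
    """PBW weights 2*Down + Inc for a top-k list.

    Assumption: listed options form a chain, and unlisted options are
    incomparable to every other option.
    """
    n, k = len(options), len(ranking)
    weights = {o: n - 1 for o in options}  # unlisted: Down = 0, Inc = n - 1
    for i, option in enumerate(ranking):
        down = k - 1 - i  # listed options after position i
        inc = n - k       # all unlisted options
        weights[option] = 2 * down + inc
    return weights


def aggregate(answers, options):
    """Sum PBW weights over all parsed answers (the profile)."""
    totals = {o: 0 for o in options}
    for answer in answers:
        weights = pbw_weights_top_k(parse_ranking(answer, options), options)
        for option, w in weights.items():
            totals[option] += w
    return totals


totals = aggregate(SAMPLE_ANSWERS, OPTIONS)
for option, score in sorted(totals.items(), key=lambda kv: -kv[1]):
    print(option, score)
# bearing wear 13
# lubrication 10
# misalignment 9
# imbalance 4
```

The treatment of omitted options is a design choice of this sketch: reading them as ranked below every listed option instead would change the closed-form weights, and the aggregated ranking may change with it.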


Paper Structure (26 sections, 3 theorems, 13 equations, 9 figures, 6 tables, 2 algorithms)


Introduction
Related Work
Social Choice Theory Background
Improving the Robustness of LLM Queries with PBW
From Queries to Rankings
Answer Aggregation
Properties
Experiments
Generation of Ranking Queries
Generate Symptom-Cause Matrices
Sample Symptom Sets
From Symptom Sets to Ranking Queries
Evaluation Protocol
Baselines
Evaluation Metrics
...and 11 more sections

Key Result

Theorem 1

Figures (9)

Figure 1: Query templates for evaluating query uncertainty
Figure 2: Syntactic variants of the manufacturing query.
Figure 3: Robustness with respect to the number of answers used for aggregation.
...and 6 more figures

Theorems & Definitions (5)

Definition 1: PBW Weighting
Theorem 1 (Cullinan et al., 2014)
Theorem 2 (Cullinan et al., 2014)
Definition 2
Proposition 1
