CSP4SDG: Constraint and Information-Theory Based Role Identification in Social Deduction Games with LLM-Enhanced Inference

Kaijie Xu, Fandi Meng, Clark Verbrugge, Simon Lucas

TL;DR

CSP4SDG addresses hidden-role inference in social deduction games by formulating it as a training-free probabilistic constraint-satisfaction problem. A lightweight LLM converts raw game logs into four constraint types (Evidence, Phenomena, Assertions, Hypotheses). Hard constraints prune infeasible role assignments, and information-gain-weighted soft constraints score the remainder to produce calibrated posteriors Pr(r|C_t) and a MAP assignment. The approach unifies classical CSPs with information theory, yielding interpretable, updatable posteriors and a plug-and-play module that can boost or stand in for LLM-based reasoning. Empirical results on Avalon, Mafia, and AvalonLogs show that CSP4SDG consistently outperforms pure LLM baselines and improves LLM reasoning when the two are combined, highlighting principled probabilistic reasoning as a scalable complement to neural models in SDGs. The work demonstrates that structured constraint-based reasoning can achieve high accuracy with interpretability and real-time updating, offering practical value for AI agents and human analysts in deception-rich interactive domains.
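The prune-then-score pipeline described above can be illustrated with a small sketch. It is not the authors' implementation: the five-player role set, the player names, the example constraints, the multiplicative scoring rule, and the weight of 2.0 are all assumptions chosen only to make the idea concrete.

```python
from itertools import permutations

# Hypothetical 5-player game: three good roles, two evil roles.
PLAYERS = ["p0", "p1", "p2", "p3", "p4"]
ROLES = ["good", "good", "good", "evil", "evil"]


def enumerate_assignments(players, roles):
    """Return every distinct player-to-role mapping for the role multiset."""
    return [dict(zip(players, perm)) for perm in sorted(set(permutations(roles)))]


def posterior(assignments, hard, soft):
    """Prune with hard predicates, then score with weighted soft predicates.

    hard: list of predicates that every feasible assignment must satisfy.
    soft: list of (predicate, weight) pairs; a satisfied predicate multiplies
          the assignment's score by its weight (assumed scoring rule).
    Returns a list of (assignment, probability) pairs that sums to 1.
    """
    feasible = [a for a in assignments if all(h(a) for h in hard)]
    scores = []
    for a in feasible:
        score = 1.0
        for pred, weight in soft:
            if pred(a):
                score *= weight
        scores.append(score)
    total = sum(scores)
    return [(a, s / total) for a, s in zip(feasible, scores)]


def marginal(post, player, role):
    """Probability that `player` holds `role` under the posterior."""
    return sum(p for a, p in post if a[player] == role)


# Hard constraint (evidence): a quest with team {p0, p1} failed,
# so at least one of them must be evil.
hard = [lambda a: "evil" in (a["p0"], a["p1"])]
# Soft constraint (assertion): some player claims that p0 is evil.
soft = [(lambda a: a["p0"] == "evil", 2.0)]

post = posterior(enumerate_assignments(PLAYERS, ROLES), hard, soft)
map_assignment = max(post, key=lambda pair: pair[1])[0]
p0_evil = marginal(post, "p0", "evil")
```

In this toy setting the hard constraint leaves 7 of the 10 possible assignments, and the assertion shifts the marginal probability that p0 is evil to 8/11 without ruling anything out.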

Abstract

In Social Deduction Games (SDGs) such as Avalon, Mafia, and Werewolf, players conceal their identities and deliberately mislead others, making hidden-role inference a central and demanding task. Accurate role identification, which forms the basis of an agent's belief state, is therefore the keystone for both human and AI performance. We introduce CSP4SDG, a probabilistic constraint-satisfaction framework that analyses gameplay objectively. Game events and dialogue are mapped to four linguistically agnostic constraint classes: evidence, phenomena, assertions, and hypotheses. Hard constraints prune impossible role assignments, while weighted soft constraints score the remainder; information-gain weighting links each hypothesis to its expected value under entropy reduction, and a simple closed-form scoring rule guarantees that truthful assertions converge to classical hard logic with minimum error. The resulting posterior over roles is fully interpretable and updates in real time. Experiments on three public datasets show that CSP4SDG (i) outperforms LLM-based baselines in every inference scenario, and (ii) boosts LLMs when supplied as an auxiliary "reasoning tool." Our study validates that principled probabilistic reasoning with information theory is a scalable alternative, or complement, to heavy-weight neural models for SDGs.
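The link between a hypothesis and its expected entropy reduction can be sketched with the textbook definition of information gain. How CSP4SDG turns this quantity into a soft-constraint weight is not specified in this summary, so the sketch stops at computing the gain itself. The two-player posterior and the function names are hypothetical.

```python
import math


def entropy(probs):
    """Shannon entropy in bits of a discrete distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def information_gain(posterior, hypothesis):
    """Expected entropy reduction from learning whether `hypothesis` holds.

    posterior:  list of (assignment, probability) pairs summing to 1.
    hypothesis: predicate on an assignment, returning True or False.
    """
    before = entropy([p for _, p in posterior])
    expected_after = 0.0
    for outcome in (True, False):
        branch = [p for a, p in posterior if bool(hypothesis(a)) == outcome]
        mass = sum(branch)
        if mass > 0:
            # Renormalise within the branch, then weight by the branch mass.
            expected_after += mass * entropy([p / mass for p in branch])
    return before - expected_after


# Hypothetical uniform posterior over four role assignments for two players.
toy_posterior = [
    ({"p0": "good", "p1": "good"}, 0.25),
    ({"p0": "good", "p1": "evil"}, 0.25),
    ({"p0": "evil", "p1": "good"}, 0.25),
    ({"p0": "evil", "p1": "evil"}, 0.25),
]

ig_split = information_gain(toy_posterior, lambda a: a["p0"] == "evil")
ig_rare = information_gain(
    toy_posterior, lambda a: a["p0"] == "evil" and a["p1"] == "evil"
)
ig_none = information_gain(toy_posterior, lambda a: True)
```

A hypothesis that splits the posterior mass evenly (`ig_split`) yields one full bit, a hypothesis that holds in only one of four assignments (`ig_rare`) yields less, and a hypothesis that is always true (`ig_none`) yields nothing. This is the sense in which some hypotheses are worth more than others.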

Paper Structure

This paper contains 72 sections, 1 theorem, 14 equations, 3 figures, 9 tables.

Key Result

Theorem 1

Suppose all assertion constraints in $A$ are truthful, and let be the soft‐constraint score, with $w_A>1$, and the score when assertions in $A$ are enforced as hard constraints. Define posterior distributions where $A_{all}\subseteq A_t$ is the subset of assignments satisfying all assertions. Then for every assignment $a$,

Figures (3)

  • Figure 1: Schematic overview of the proposed inference architecture for social-deduction games. Game History (left) comprises objective event traces (e.g., eliminations, assassinations, quest outcomes) and subjective conversation turns. Information Representation (centre) leverages an auxiliary LLM [mann2020language] to transform raw logs into a structured constraint set: (i) hard constraints (evidence and phenomena) that are logically inviolable, and (ii) soft constraints (player assertions and hypotheses) that contribute graded probabilistic weight. Reasoning (right) contrasts three inference engines. A plain LLM lacks the combinatorial apparatus required for reliable role deduction; a hybrid LLM + CSP benefits from externally supplied posteriors but is still bottlenecked by heuristic language reasoning; our CSP solver enforces all hard constraints and optimally scores soft ones, delivering calibrated posterior distributions and MAP role assignments that achieve the highest empirical accuracy.
  • Figure 2: Quest-by-quest accuracy trends on Avalon-NLU. Quest 6 = assassination round following three successful quests.
  • Figure 3: Accuracy distribution across different perspectives (Truthful condition). Good-role perspectives (blue) achieve higher accuracy, evil-role perspectives (red) exhibit broader variance, and objective views (gray) record lower accuracy levels. Abbreviations: C = CSP, LC = LLM+CSP, L = LLM; A = +Assert, IG = +HypIG, HM = +HypM, T = TurnIG, S = Strict.

Theorems & Definitions (2)

  • Theorem 1
  • proof