Judgment-of-Thought Prompting: A Courtroom-Inspired Framework for Binary Logical Reasoning with Large Language Models
Sungjune Park, Heehwan Kim, Haehyun Cho, Daeseon Choi
TL;DR
JoT tackles binary logical reasoning in LLMs with a courtroom-inspired, three-role prompting framework (lawyer, prosecutor, judge) that enables adversarial yet structured debate and iterative refinement. A high-level model acting as the judge evaluates the arguments produced by lower-level models acting as lawyer and prosecutor, which improves accuracy, consistency, and interpretability across diverse tasks. Empirical results on BigBenchHard and Winogrande show high accuracy (e.g., 98% on Boolean Expressions, 90% on Web of Lies, 89% on Winogrande), and ablations confirm the necessity of each role and of the iterative feedback loop. The work suggests JoT's potential for reliable decision-making in real-world domains, with future directions including domain-specific retrieval augmentation and efficiency improvements.
Abstract
This paper proposes a novel prompting approach, Judgment of Thought (JoT), specifically tailored for binary logical reasoning tasks. Despite advances in prompt engineering, existing approaches still face limitations in handling complex logical reasoning tasks. To address these issues, JoT introduces a multi-agent approach with three specialized roles (lawyer, prosecutor, and judge), in which a high-level model acts as the judge and lower-level models serve as lawyer and prosecutor to systematically debate and evaluate arguments. Experimental evaluations on benchmarks such as BigBenchHard and Winogrande demonstrate JoT's superior performance compared to existing prompting approaches, with notable improvements including 98% accuracy on Boolean expressions. Our ablation studies also validate the critical contribution of each role, of the iterative refinement loops, and of the feedback mechanisms. Consequently, JoT significantly enhances accuracy, reliability, and consistency in binary reasoning tasks and shows potential for practical applications.
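To make the three-role structure concrete, the following is a minimal sketch of how such a debate loop might be wired together. It is not the authors' implementation: the abstract does not specify the prompts, the reply format, the stopping rule, or which side each role argues, so the names `Model`, `Verdict`, `parse_verdict`, and `judgment_of_thought`, the `TRUE` / `FALSE` / `UNDECIDED: <feedback>` reply convention, the assignment of the lawyer to TRUE and the prosecutor to FALSE, and the `max_rounds` cap are all assumptions made for illustration. Models are passed in as plain callables so that the sketch runs without any particular LLM API.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple

# Hypothetical type: any function that maps a prompt string to a reply string.
Model = Callable[[str], str]


@dataclass
class Verdict:
    answer: Optional[bool]  # True/False once the judge decides, None if undecided
    feedback: str           # judge's feedback for the next round (assumed format)


def parse_verdict(reply: str) -> Verdict:
    """Parse an assumed reply format: 'TRUE', 'FALSE', or 'UNDECIDED: <feedback>'."""
    text = reply.strip()
    upper = text.upper()
    if upper.startswith("TRUE"):
        return Verdict(True, "")
    if upper.startswith("FALSE"):
        return Verdict(False, "")
    # Anything else is treated as undecided; text after the first colon is feedback.
    return Verdict(None, text.partition(":")[2].strip())


def judgment_of_thought(
    question: str,
    lawyer: Model,      # assumed to argue that the answer is TRUE
    prosecutor: Model,  # assumed to argue that the answer is FALSE
    judge: Model,       # higher-level model that weighs both arguments
    max_rounds: int = 3,
) -> Tuple[Optional[bool], int]:
    """Run debate rounds until the judge decides or max_rounds is reached.

    Returns (answer, rounds_used); answer is None if the judge never decided.
    """
    feedback = ""
    for round_no in range(1, max_rounds + 1):
        context = f"Question: {question}\nJudge feedback so far: {feedback or 'none'}\n"
        pro = lawyer(context + "Argue that the answer is TRUE.")
        con = prosecutor(context + "Argue that the answer is FALSE.")
        reply = judge(
            f"Question: {question}\n"
            f"Lawyer's argument: {pro}\n"
            f"Prosecutor's argument: {con}\n"
            "Reply TRUE, FALSE, or 'UNDECIDED: <feedback>'."
        )
        verdict = parse_verdict(reply)
        if verdict.answer is not None:
            return verdict.answer, round_no
        feedback = verdict.feedback  # feed the judge's critique into the next round
    return None, max_rounds


if __name__ == "__main__":
    # Stub models standing in for real LLM calls.
    judge_calls = []

    def stub_judge(prompt: str) -> str:
        judge_calls.append(prompt)
        return "UNDECIDED: show the evaluation order" if len(judge_calls) == 1 else "TRUE"

    result = judgment_of_thought(
        "not False and True",
        lawyer=lambda p: "not False is True, and True and True is True.",
        prosecutor=lambda p: "The expression could evaluate to False.",
        judge=stub_judge,
    )
    print(result)  # prints (True, 2)
```

Passing the three roles as callables keeps the control flow (argue, judge, feed back, repeat) separate from any model-specific details, which is also what makes role-level ablations easy to express: a role can be replaced with a stub that returns an empty argument.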
