GRATR: Zero-Shot Evidence Graph Retrieval-Augmented Trustworthiness Reasoning
Ying Zhu, Shengchang Li, Ziqian Kong, Qiang Yang, Peilan Xu
TL;DR
GRATR addresses trustworthiness reasoning in incomplete-information settings by combining a dynamically updated trustworthiness graph with multi-hop retrieval to augment LLM reasoning in a zero-shot manner. The framework initializes and maintains a graph where nodes represent players and edges encode observed interactions and trust levels, updating these relations as new observations arrive. Through forward retrieval, backward updates, and reasoning, GRATR aggregates evidence chains from trusted sources to refine trust assessments toward a target and informs LLM-driven decisions with transparent, time-stamped rationale. Empirical results in the Werewolf game show GRATR outperforms baselines in win rate and reduces hallucinations, while a Twitter intent analysis benchmark demonstrates superior accuracy and macro F1, indicating robust applicability to real-world, language-rich domains.
Abstract
Trustworthiness reasoning aims to enable agents in multiplayer games with incomplete information to identify potential allies and adversaries, thereby enhancing decision-making. In this paper, we introduce the graph retrieval-augmented trustworthiness reasoning (GRATR) framework, which retrieves observable evidence from the game environment to inform decision-making by large language models (LLMs) without requiring additional training, making it a zero-shot approach. Within the GRATR framework, agents first observe the actions of other players and evaluate the resulting shifts in inter-player trust, constructing a corresponding trustworthiness graph. During decision-making, the agent performs multi-hop retrieval to evaluate trustworthiness toward a specific target, where evidence chains are retrieved from multiple trusted sources to form a comprehensive assessment. Experiments in the multiplayer game Werewolf demonstrate that GRATR outperforms the alternatives, improving reasoning accuracy by 50.5% and reducing hallucination by 30.6% compared to the baseline method. Additionally, when tested on a dataset of Twitter tweets during the U.S. election period, GRATR surpasses the baseline method by 10.4% in accuracy, highlighting its potential in real-world applications such as intent analysis.
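
To make the two stages in the abstract concrete (building a trustworthiness graph from observed actions, then running multi-hop retrieval toward a target), the following is a minimal Python sketch. It is not the authors' implementation. The names `Evidence`, `TrustGraph`, `observe`, `chains`, and `assess` are hypothetical, and so are the numeric choices: trust weights clipped to [-1, 1], an additive update, chains that pass only through positively trusted intermediaries, and a chain score equal to the product of edge weights, averaged over all chains. The paper's actual update and aggregation rules are not stated in this section and may differ.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Evidence:
    """One observed interaction (hypothetical record layout)."""
    source: str      # player who acted
    target: str      # player the action was directed at
    shift: float     # signed trust change; the [-1, 1] scale is an assumption
    timestamp: int   # round or turn in which the action was observed
    note: str        # free-text description that could be passed to an LLM


class TrustGraph:
    """Directed graph: nodes are players, edge weights are trust levels."""

    def __init__(self):
        self.weight = defaultdict(dict)    # weight[u][v] in [-1, 1]
        self.evidence = defaultdict(list)  # (u, v) -> list of Evidence

    def observe(self, ev: Evidence) -> None:
        """Update the edge source -> target with a new observation.

        The additive, clipped update is an illustrative assumption.
        """
        old = self.weight[ev.source].get(ev.target, 0.0)
        self.weight[ev.source][ev.target] = max(-1.0, min(1.0, old + ev.shift))
        self.evidence[(ev.source, ev.target)].append(ev)

    def chains(self, start: str, target: str, max_hops: int = 3):
        """Return simple paths start -> target of at most max_hops edges.

        Intermediate hops are followed only along positive-trust edges,
        so evidence is gathered from sources the observer trusts.
        """
        found = []

        def dfs(node, path):
            if len(path) - 1 >= max_hops:
                return
            for nxt, w in self.weight.get(node, {}).items():
                if nxt in path:
                    continue  # keep paths simple (no revisits)
                if nxt == target:
                    found.append(path + [nxt])
                elif w > 0:
                    dfs(nxt, path + [nxt])

        dfs(start, [start])
        return found

    def assess(self, start: str, target: str, max_hops: int = 3) -> float:
        """Aggregate chain scores into one trust estimate (assumed rule)."""
        paths = self.chains(start, target, max_hops)
        if not paths:
            return 0.0  # no evidence: stay neutral
        scores = []
        for path in paths:
            score = 1.0
            for u, v in zip(path, path[1:]):
                score *= self.weight[u][v]
            scores.append(score)
        return sum(scores) / len(scores)


if __name__ == "__main__":
    g = TrustGraph()
    g.observe(Evidence("A", "B", 0.8, 1, "B defended A"))
    g.observe(Evidence("B", "C", -0.5, 2, "B accused C"))
    g.observe(Evidence("A", "C", 0.1, 3, "C voted with A"))
    print(g.chains("A", "C"))
    print(g.assess("A", "C"))
```

In a full pipeline, the retrieved chains and their time-stamped `note` fields would presumably be serialized into the LLM prompt as supporting evidence; that step is omitted here because it depends on the prompt format, which this section does not describe.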
