
SNEAK: Evaluating Strategic Communication and Information Leakage in Large Language Models

Adar Avsian, Larry Heck

Abstract

Large language models (LLMs) are increasingly deployed in multi-agent settings where communication must balance informativeness and secrecy. In such settings, an agent may need to signal information to collaborators while preventing an adversary from inferring sensitive details. However, existing LLM benchmarks primarily evaluate capabilities such as reasoning, factual knowledge, or instruction following, and do not directly measure strategic communication under asymmetric information. We introduce SNEAK (Secret-aware Natural language Evaluation for Adversarial Knowledge), a benchmark for evaluating selective information sharing in language models. In SNEAK, a model is given a semantic category, a candidate set of words, and a secret word, and must generate a message that indicates knowledge of the secret without revealing it too clearly. We evaluate generated messages using two simulated agents with different information states: an ally, who knows the secret and must identify the intended message, and a chameleon, who does not know the secret and attempts to infer it from the message. This yields two complementary metrics: utility, measuring how well the message communicates to collaborators, and leakage, measuring how much information it reveals to an adversary. Using this framework, we analyze the trade-off between informativeness and secrecy in modern language models and show that strategic communication under asymmetric information remains a challenging capability for current systems. Notably, human participants outperform all evaluated models by a large margin, achieving up to four times higher scores.
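To make the evaluation protocol concrete, the sketch below simulates a single SNEAK round. This is a minimal illustration, not the paper's released implementation: the callables generate_message, ally_pick, and chameleon_guess are hypothetical stand-ins for the prompted LLM agents, and utility and leakage are reduced here to 0/1 outcomes for one round, whereas the benchmark aggregates these scores over many instances.

```python
import random

def play_round(category, candidates, secret,
               generate_message, ally_pick, chameleon_guess,
               num_decoys=3):
    """Simulate one SNEAK round and return (utility, leakage).

    All three callables are hypothetical placeholders for LLM prompts:
      generate_message(category, candidates, secret) -> str
          The evaluated model; signals the secret without revealing it.
      ally_pick(category, candidates, secret, messages) -> str
          The ally, who knows the secret, picks the intended message.
      chameleon_guess(category, candidates, message) -> str
          The chameleon sees only the message and guesses the secret.
    """
    message = generate_message(category, candidates, secret)

    # Utility: the ally must identify the intended message in a lineup
    # that also contains decoy messages generated for other candidates.
    others = [w for w in candidates if w != secret]
    decoys = [generate_message(category, candidates, w)
              for w in random.sample(others, num_decoys)]
    lineup = decoys + [message]
    random.shuffle(lineup)
    utility = float(ally_pick(category, candidates, secret, lineup) == message)

    # Leakage: the chameleon tries to recover the secret from the message alone.
    leakage = float(chameleon_guess(category, candidates, message) == secret)
    return utility, leakage
```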

Figures (9)

  • Figure 1: Overview of the SNEAK benchmark. A message-generating language model (left) plays the role of an ally: it observes the category, candidate set, and secret word and produces a message. A second ally (center), who knows the secret, hears the message and must determine whether it corresponds to the secret. The chameleon (right), who does not know the secret, hears the message and must infer the secret word from the candidate set.
  • Figure 2: Utility–leakage trade-off for a single instance (secret: "hat"). The message "rabbit" (as in a rabbit in a hat) achieves high utility while maintaining low leakage.
  • Figure 3: Utility–leakage trade-off across message-generating models. Each point corresponds to a baseline, with marker size proportional to SoftScore. The dashed curve shows the empirical Pareto frontier. The upper-left region corresponds to desirable behavior: high utility for the ally and low leakage to the chameleon.
  • Figure 4: Sensitivity of SNEAK performance for Gemma-3-27B to candidate set size ($|W|$) and number of decoy messages ($|M|$).
  • Figure 5: Human annotation interface for message generation. Annotators are given the category, candidate set, and secret word, and are asked to produce a 1–5 word message that signals knowledge of the secret without revealing it directly.
  • ...and 4 more figures