Do LLMs Strategically Reveal, Conceal, and Infer Information? A Theoretical and Empirical Analysis in The Chameleon Game

Mustafa O. Karabag, Jan Sobotka, Ufuk Topcu

TL;DR

This paper analyzes how LLM-based agents manage information in strategic, non-cooperative settings using The Chameleon, a language-based hidden-identity game. It combines theory and experiments to show that non-chameleon LLMs tend to reveal information, which lets chameleons infer the secret, and that instructions alone are insufficient for robust concealment. The authors establish bounds for stationary revealing and concealing strategies and present an achievable history-dependent strategy under which the non-chameleons' winning probability is $\mathcal{O}(\log(N)/N)$; steering hidden states improves concealment beyond what instructions achieve. The work highlights latent strategic capabilities in LLMs, shows that internal representations encode the information-revealing level, and proposes representation-level interventions as a path to more reliable strategic behavior in multi-agent AI systems.

Abstract

Large language model-based (LLM-based) agents have become common in settings that include non-cooperative parties. In such settings, agents' decision-making needs to conceal information from their adversaries, reveal information to their cooperators, and infer information to identify the other agents' characteristics. To investigate whether LLMs have these information control and decision-making capabilities, we make LLM agents play the language-based hidden-identity game, The Chameleon. In this game, a group of non-chameleon agents who do not know each other aim to identify the chameleon agent without revealing a secret. The game requires the aforementioned information control capabilities both as a chameleon and a non-chameleon. We begin with a theoretical analysis for a spectrum of strategies, from concealing to revealing, and provide bounds on the non-chameleons' winning probability. The empirical results with GPT, Gemini 2.5 Pro, Llama 3.1, and Qwen3 models show that while non-chameleon LLM agents identify the chameleon, they fail to conceal the secret from the chameleon, and their winning probability is far from the levels of even trivial strategies. Based on these empirical results and our theoretical analysis, we deduce that LLM-based agents may reveal excessive information to agents of unknown identities. Interestingly, we find that, when instructed to adopt an information-revealing level, this level is linearly encoded in the LLM's internal representations. While the instructions alone are often ineffective at making non-chameleon LLMs conceal, we show that steering the internal representations in this linear direction directly can reliably induce concealing behavior.
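
The win condition implied by the abstract and the figure captions can be written as a short sketch: non-chameleons win only if they vote out the chameleon and the chameleon then fails to name the secret word in its second chance. The `RoundOutcome` type and its field names are hypothetical, introduced here for illustration only; they are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RoundOutcome:
    """Hypothetical record of one finished round (names are illustrative)."""
    chameleon_index: int   # seat of the true chameleon
    voted_index: int       # seat the group voted for
    secret_word: str       # word known to non-chameleons only
    chameleon_guess: str   # chameleon's second-chance guess


def non_chameleons_win(outcome: RoundOutcome) -> bool:
    # Misidentifying the chameleon loses the round outright.
    if outcome.voted_index != outcome.chameleon_index:
        return False
    # A caught chameleon still wins by guessing the secret word.
    return outcome.chameleon_guess != outcome.secret_word
```

Both failure modes appear in the rule: revealing too much lets the caught chameleon guess the word, while concealing too much makes the vote unreliable.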

Paper Structure

This paper contains 27 sections, 4 theorems, 13 equations, 9 figures, 5 tables.

Key Result

Proposition 1

For every $\alpha$-KL pairwise concealing non-chameleon strategy $\pi^{non}$, there exists a chameleon strategy $\pi^{\textrm{ch}}$ such that

Figures (9)

  • Figure 1: An example gameplay. In this example, the non-chameleons (blue players) correctly identify the chameleon (red player), but the chameleon wins in the second chance.
  • Figure 2: Bounds on the winning probability of non-chameleons. The non-chameleons lose the game with high probability if they use revealing or concealing strategies: the chameleon correctly identifies the secret word for revealing strategies, and the non-chameleons misidentify the chameleon for concealing strategies. The non-chameleons can win the game with a probability that is $\mathcal{O}(\log(N))$ times that of the trivial $0$-KL pairwise concealing strategy.
  • Figure 3: An example gameplay under $\pi^{amb}$ including the chameleon's response. The table shows the posterior probabilities of potential secret words given the responses under $\pi^{amb}$ without the knowledge of the secret word. The chameleon (red player) gives a response that eliminates the secret word $w_{5}$ and is not consistent with $\pi^{amb}$. Consequently, the non-chameleons (blue players) certainly identify the chameleon. The chameleon has a chance of winning the game with probability $1/2$ as it knows that the secret word is a word that it eliminated, $w_{5}$ or $w_{10}$.
  • Figure 4: Accuracy of the GPT-4.1 chameleon in guessing the secret word based on the response words of GPT-4.1 non-chameleons. With the permuted order, the chameleon was presented with response words that did not match the original sequential responses of non-chameleons (e.g., one response word from the third instead of the first player). Evaluation done with response words from 100 games.
  • Figure 5: Principal component analysis of the hidden states of the Llama 3.1 70B non-chameleon agent. Upper: Hidden states when the LLM agent is instructed to be at a specific information-revealing level. The steering vector is rescaled and shifted for illustration purposes. Lower: Hidden states of the LLM in standard gameplay with no instructions (steering strength 0) and in gameplay with no instructions but steered hidden states (steering strengths 3 and -3).
  • ...and 4 more figures
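
As a rough illustration of the hidden-state steering mentioned in the Figure 5 caption, the sketch below adds a scaled direction vector to a hidden state. It assumes the direction is estimated as a normalized difference of mean hidden states between two instruction levels, which is a common choice but is not confirmed by this summary. The function names, the toy two-dimensional vectors, and the sign convention (positive strength moves toward the "conceal" mean) are likewise assumptions.

```python
import math
from typing import List, Sequence

Vector = List[float]


def mean_vector(rows: Sequence[Sequence[float]]) -> Vector:
    """Component-wise mean of a non-empty batch of equal-length vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def steering_direction(conceal_states: Sequence[Sequence[float]],
                       reveal_states: Sequence[Sequence[float]]) -> Vector:
    """Unit vector pointing from the 'reveal' mean to the 'conceal' mean.

    Assumption: difference of means; the paper's estimator may differ.
    """
    mu_c = mean_vector(conceal_states)
    mu_r = mean_vector(reveal_states)
    diff = [c - r for c, r in zip(mu_c, mu_r)]
    norm = math.sqrt(sum(x * x for x in diff))
    if norm == 0.0:
        raise ValueError("the two groups have identical means")
    return [x / norm for x in diff]


def steer(hidden: Sequence[float], direction: Sequence[float],
          strength: float) -> Vector:
    """Shift a hidden state along the steering direction."""
    return [h + strength * d for h, d in zip(hidden, direction)]


def projection(hidden: Sequence[float], direction: Sequence[float]) -> float:
    """Scalar coordinate of a hidden state along the direction."""
    return sum(h * d for h, d in zip(hidden, direction))


# Toy hidden states (made up for illustration, not model activations).
conceal = [[2.0, 0.0], [4.0, 0.0]]
reveal = [[0.0, 0.0], [0.0, 0.0]]
direction = steering_direction(conceal, reveal)
steered = steer([0.5, 1.0], direction, strength=3.0)
```

Because the direction has unit length, steering with strength `s` shifts the projection onto that direction by exactly `s` and leaves the orthogonal components unchanged. In a real model the shift would be applied to a chosen layer's activations during generation.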

Theorems & Definitions (9)

  • Remark 1
  • Remark 2
  • Remark 3
  • Definition 1
  • Proposition 1
  • Proposition 2
  • Definition 2
  • Proposition 3
  • Proposition 4