Do LLMs Strategically Reveal, Conceal, and Infer Information? A Theoretical and Empirical Analysis in The Chameleon Game

Mustafa O. Karabag; Jan Sobotka; Ufuk Topcu

TL;DR

This paper analyzes how LLM-based agents manage information in strategic, non-cooperative settings using The Chameleon, a language-based hidden-identity game. It combines theory and experiments to show that non-chameleon LLMs tend to reveal too much information, which lets the chameleon infer the secret, and that instructions alone are insufficient for robust concealment. The authors establish bounds for stationary revealing and concealing strategies and demonstrate an achievable history-dependent strategy yielding $\mathcal{O}(\log(N)/N)$ non-chameleon wins. They also show that steering hidden states improves concealment beyond what instructions achieve. The work highlights latent strategic capabilities in LLMs, demonstrates that internal representations encode the information-revealing level, and proposes representation-level interventions as a path to more reliable strategic behavior in multi-agent AI systems.

Abstract

Large language model-based (LLM-based) agents have become common in settings that include non-cooperative parties. In such settings, agents' decision-making needs to conceal information from their adversaries, reveal information to their cooperators, and infer information to identify the other agents' characteristics. To investigate whether LLMs have these information control and decision-making capabilities, we make LLM agents play the language-based hidden-identity game, The Chameleon. In this game, a group of non-chameleon agents who do not know each other aim to identify the chameleon agent without revealing a secret. The game requires the aforementioned information control capabilities both as a chameleon and a non-chameleon. We begin with a theoretical analysis for a spectrum of strategies, from concealing to revealing, and provide bounds on the non-chameleons' winning probability. The empirical results with GPT, Gemini 2.5 Pro, Llama 3.1, and Qwen3 models show that while non-chameleon LLM agents identify the chameleon, they fail to conceal the secret from the chameleon, and their winning probability is far from the levels of even trivial strategies. Based on these empirical results and our theoretical analysis, we deduce that LLM-based agents may reveal excessive information to agents of unknown identities. Interestingly, we find that, when instructed to adopt an information-revealing level, this level is linearly encoded in the LLM's internal representations. While the instructions alone are often ineffective at making non-chameleon LLMs conceal, we show that steering the internal representations in this linear direction directly can reliably induce concealing behavior.
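The idea of steering along a linear direction can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes synthetic activations in place of real LLM hidden states, estimates the direction as a difference of class means (one common choice, assumed here), and uses hypothetical names throughout (`steering_direction`, `steer`, `alpha`).

```python
import numpy as np

def steering_direction(h_conceal, h_reveal):
    """Unit vector pointing from the mean 'reveal' activation to the mean
    'conceal' activation (difference-of-means estimate, an assumption)."""
    diff = h_conceal.mean(axis=0) - h_reveal.mean(axis=0)
    return diff / np.linalg.norm(diff)

def steer(hidden, direction, alpha):
    """Shift every hidden state by alpha along the given unit direction."""
    return hidden + alpha * direction

# Synthetic stand-ins for hidden states; real activations would come from a model.
rng = np.random.default_rng(0)
dim = 16
true_dir = np.zeros(dim)
true_dir[0] = 1.0  # hypothetical axis that encodes the revealing level
h_reveal = rng.normal(size=(200, dim)) - 2.0 * true_dir
h_conceal = rng.normal(size=(200, dim)) + 2.0 * true_dir

v = steering_direction(h_conceal, h_reveal)
steered = steer(h_reveal, v, alpha=4.0)

# Mean projection onto v before and after steering; the gap equals alpha.
before = float((h_reveal @ v).mean())
after = float((steered @ v).mean())
print(f"mean projection before: {before:.2f}, after: {after:.2f}")
```

Because `v` has unit norm, adding `alpha * v` raises each state's projection onto `v` by exactly `alpha` and leaves the orthogonal components unchanged. In a real model the shift would be applied to a chosen layer's activations during generation.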
