The 20 questions game to distinguish large language models
Gurvan Richardeau, Erwan Le Merrer, Camilla Penzo, Gilles Tredan
TL;DR
This work formalizes the problem of distinguishing large language models (LLMs) in a black-box setting using a small set of benign binary prompts. It establishes optimal bounds for this problem and introduces two practical questioning heuristics, the Separability and Recursive Similarity methods, that reduce the number of prompts required while maintaining high accuracy. A baseline that selects questions at random from known benchmark datasets reaches nearly 100% accuracy within 20 questions; on 22 models, the heuristics accomplish the same task with roughly half as many questions. Fewer queries make the procedure stealthier, which is of interest for auditing, intellectual-property protection, and model-leak investigations. The approach provides a principled framework for differentiating models through binary questions, with implications for security, forensics, and policy compliance in AI deployments.
Abstract
Drawing a parallel with the 20 questions game, we present a method to determine whether two large language models (LLMs), placed in a black-box context, are the same or not. The goal is to use a small set of (benign) binary questions, typically fewer than 20. We formalize the problem and first establish a baseline using a random selection of questions from known benchmark datasets, achieving an accuracy of nearly 100% within 20 questions. After showing optimal bounds for this problem, we introduce two effective questioning heuristics able to discriminate 22 LLMs using half as many questions for the same task. These methods offer significant advantages in terms of stealth and are thus of interest to auditors or copyright owners facing suspicions of model leaks.
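To make the random-question baseline concrete, the following is a minimal sketch rather than the authors' implementation. It assumes each black-box model can be wrapped as a function from a prompt to a yes/no answer (the hypothetical `Model` type), that answers are deterministic, and that two models are declared the same when their answer vectors differ in at most `threshold` positions. The decision rule, the threshold, and all function names are illustrative assumptions.

```python
import random
from typing import Callable, List, Optional, Sequence

# Hypothetical black-box interface: a prompt goes in, a yes/no answer comes out.
Model = Callable[[str], bool]


def answer_vector(model: Model, questions: Sequence[str]) -> List[bool]:
    """Query the model once per binary question and collect its answers."""
    return [model(q) for q in questions]


def hamming_distance(a: Sequence[bool], b: Sequence[bool]) -> int:
    """Count the positions at which two answer vectors disagree."""
    return sum(x != y for x, y in zip(a, b))


def same_model(
    model_a: Model,
    model_b: Model,
    question_pool: Sequence[str],
    k: int = 20,
    threshold: int = 0,
    seed: Optional[int] = None,
) -> bool:
    """Random baseline: draw k questions from the pool without replacement
    and compare answers. The threshold rule is an assumption of this sketch."""
    rng = random.Random(seed)
    questions = rng.sample(list(question_pool), k)
    distance = hamming_distance(
        answer_vector(model_a, questions),
        answer_vector(model_b, questions),
    )
    return distance <= threshold
```

In practice `question_pool` would hold binary questions taken from benchmark datasets, and a nonzero `threshold` could absorb some answer variability. The heuristics named in the abstract would replace the random draw with a more informative choice of questions.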
