
The 20 questions game to distinguish large language models

Gurvan Richardeau, Erwan Le Merrer, Camilla Penzo, Gilles Tredan

TL;DR

This work formalizes the problem of telling large language models (LLMs) apart in a black-box setting using a small set of benign binary questions, typically fewer than 20. A baseline that draws questions at random from known benchmark datasets reaches nearly 100% accuracy within 20 questions. The paper then shows optimal bounds for the problem and introduces two questioning heuristics, the Separability and Recursive Similarity methods, that discriminate 22 LLMs with about half as many questions as the random baseline. Because a few innocuous prompts are harder to notice, the approach is of interest to auditors and copyright owners investigating suspected model leaks.

Abstract

In a parallel with the 20 questions game, we present a method to determine whether two large language models (LLMs), placed in a black-box context, are the same or not. The goal is to use a small set of (benign) binary questions, typically under 20. We formalize the problem and first establish a baseline using a random selection of questions from known benchmark datasets, achieving an accuracy of nearly 100% within 20 questions. After showing optimal bounds for this problem, we introduce two effective questioning heuristics able to discriminate 22 LLMs by using half as many questions for the same task. These methods offer significant advantages in terms of stealth and are thus of interest to auditors or copyright owners facing suspicions of model leaks.
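The random baseline described in the abstract can be illustrated with a minimal sketch. It assumes each model answers every binary question deterministically and that "accuracy" means the fraction of model pairs separated by at least one chosen question. The paper's exact definition may differ, and the model names, answer table, and function names below are hypothetical.

```python
import random
from itertools import combinations


def pair_accuracy(answers, question_ids):
    """Fraction of model pairs whose answers differ on at least one chosen question."""
    question_ids = list(question_ids)
    pairs = list(combinations(answers, 2))
    separated = sum(
        any(answers[a][q] != answers[b][q] for q in question_ids)
        for a, b in pairs
    )
    return separated / len(pairs)


def random_baseline(answers, k, rng):
    """Draw k distinct question indices uniformly at random (assumed baseline)."""
    n_questions = len(next(iter(answers.values())))
    return rng.sample(range(n_questions), k)


# Hypothetical yes/no answers (1/0) of four models to six questions.
answers = {
    "model_a": [1, 0, 1, 1, 0, 0],
    "model_b": [1, 0, 0, 1, 0, 1],
    "model_c": [0, 0, 1, 1, 1, 0],
    "model_d": [1, 1, 1, 0, 0, 0],
}

rng = random.Random(0)
chosen = random_baseline(answers, k=3, rng=rng)
print(chosen, pair_accuracy(answers, chosen))
```

Averaging `pair_accuracy` over many random draws for each `k` would give a baseline curve to compare against any smarter question-selection strategy.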

Paper Structure

This paper contains 13 sections, 3 theorems, 9 equations, 4 figures, 1 table, 2 algorithms.

Key Result

Theorem B.1

Let $M$ be a finite set such that $|M| = L$. A question that maximally differentiates pairs in $M$ is one that splits the set into the most equal groups.
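The intuition behind Theorem B.1 can be checked numerically. The sketch below assumes a binary question, so the $L$ models fall into two answer groups of sizes $a$ and $L - a$; such a question separates $a(L - a)$ pairs, which is largest when the groups are as equal as possible. The function name is illustrative and not taken from the paper.

```python
def separated_pairs(total, group_size):
    """Pairs separated by a binary question that puts `group_size` of `total` models on one side."""
    return group_size * (total - group_size)


L = 22  # number of models, as in the paper's experiments
counts = {a: separated_pairs(L, a) for a in range(L + 1)}
best = max(counts, key=counts.get)

# An 11/11 split separates 121 of the 231 possible pairs.
print(best, counts[best])  # → 11 121
```

By contrast, a question that only one model answers differently separates just 21 pairs, so an unbalanced question contributes far less to distinguishing the set.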

Figures (4)

  • Figure 1: (plain/purple) Distribution of correct answers for questions in $K$, compared to (dashed/green) a random binomial model. Vertical line: average of both distributions.
  • Figure 2: Map $S: k \mapsto \max_{Q \in \mathcal{Q}, |Q| = k} \text{acc}(Q)$, for all question datasets combined and the 22 LLMs. Each heuristic has been run 2000 times, and we present the mean, standard deviation, best, and worst cases (based on AUC).
  • Figure 3: Scalability: number of questions $|Q|$ needed to distinguish 99% of the pairs of a set of $|M|$ models, for the Separability heuristic and the optimal case.
  • Figure 4: Proximity (t-SNE on response vectors) of the 22 LLMs. Models of the same family, i.e., from the same company and differing only by version, appear close overall, except for the DeciLM models.
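As a rough point of comparison for the scalability discussion in Figure 3, a simple counting argument gives the fewest binary questions that could separate every pair of models: $k$ yes/no answers yield at most $2^k$ distinct answer vectors, so at least $\lceil \log_2 |M| \rceil$ questions are needed. This is a sketch under the assumption of deterministic binary answers. The figure uses a 99% criterion, and the paper's "optimal case" may be defined differently.

```python
def min_binary_questions(n_models):
    """Smallest k with 2**k >= n_models, computed with integer arithmetic."""
    if n_models < 1:
        raise ValueError("n_models must be positive")
    return (n_models - 1).bit_length()


for n in (2, 22, 100, 1000):
    print(n, min_binary_questions(n))
```

For the 22 models studied, this bound is 5 questions. Reaching it would require every question to split the remaining candidates almost evenly.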

Theorems & Definitions (9)

  • Definition 2.1: Distinguishing LLMs
  • Definition 5.1: Subset Separability
  • Definition 5.2: Similarity of two partitions of same separability
  • Theorem B.1
  • proof
  • Lemma B.2
  • proof
  • Theorem B.3
  • proof