Two Agents, One Prompt, and Your Weight

Elchanan Mossel, Amnon Schreiber

TL;DR

This work extends the classical Two Doors puzzle to a non-binary, quantitative setting: recovering a true weight $w$ by asking a single question of one of two agents, a truth-teller and a liar. It analyzes both probabilistic and non-probabilistic liar models and shows that carefully crafted self-referential prompts can force correct answers under minimal assumptions, either by exploiting fixed liar behavior or by reducing the problem to a binary outcome. The key contributions are a solvable variant with a fixed lying function (e.g., $w\mapsto w+10$), a probabilistic setting with liar distributions, and an approach that requires no probabilistic assumptions and relies on binary reduction, along with connections to prompt engineering and an evaluation of ChatGPT as a reasoning aid. The findings bear on information extraction when an agent's truthfulness is uncertain, and they show how prompt design can use self-reference and beliefs about other agents to recover exact quantitative facts.

Abstract

We investigate a quantitative variant of the classic Two Doors logic puzzle in which the answer space is no longer binary, for example when the goal is to recover a numerical fact (such as one's true weight) rather than to choose between two doors. The puzzle retains the original structure: one agent always tells the truth, the other always lies. Our central contribution is to identify a class of self-referential prompts that successfully extract the correct quantitative answer under minimal assumptions. We also explore how well \texttt{ChatGPT} reasons about this problem, which is only slightly out of distribution.
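
To make the fixed-lying-function variant concrete, here is a minimal Python sketch. It rests on assumptions that this summary does not state: the liar is modeled as adding a fixed offset of 10 to whatever the honest reply would be (mirroring the example $w\mapsto w+10$), and the prompt used is the classic Two Doors nested question, "What would the other agent say my weight is?". The paper's own prompts may differ, and all function names below are hypothetical.

```python
# Assumed liar model: a fixed, known distortion w -> w + 10.
LIE_OFFSET = 10


def liar_distort(value: int) -> int:
    """The liar reports the honest reply shifted by the fixed offset."""
    return value + LIE_OFFSET


def direct_answer(is_liar: bool, true_weight: int) -> int:
    """Reply to the direct question: 'What is my weight?'"""
    return liar_distort(true_weight) if is_liar else true_weight


def nested_answer(is_liar: bool, true_weight: int) -> int:
    """Reply to the nested question:
    'What would the other agent say my weight is?'"""
    # What the *other* agent would say to the direct question.
    other_would_say = direct_answer(not is_liar, true_weight)
    # A truth-teller relays it faithfully; a liar distorts it.
    return liar_distort(other_would_say) if is_liar else other_would_say


def recover_weight(reply: int) -> int:
    """Undo the single distortion that every nested reply contains."""
    return reply - LIE_OFFSET


if __name__ == "__main__":
    w = 72
    for is_liar in (False, True):
        reply = nested_answer(is_liar, w)
        print(is_liar, reply, recover_weight(reply))
```

Under this model, the nested question passes through the lying function exactly once no matter which agent is asked, so both agents give the same reply and the asker can invert the known offset. The direct question does not have this property: its reply depends on which agent was asked.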

Paper Structure

This paper contains 22 sections, 2 equations.