Undermining Mental Proof: How AI Can Make Cooperation Harder by Making Thinking Easier

Zachary Wojtowicz, Simon DeDeo

TL;DR

The paper argues that cheap, AI-enabled thinking can undermine social cooperation in low-trust settings by eroding mental proofs: observable actions that certify unobservable mental states such as knowledge and intentions. It formalizes mental proof through two mechanisms, signaling theory and proofs of knowledge, and uses worked examples such as sincere apologies and social proof to illustrate how AI disrupts trust-building processes. It shows that AI can flatten the cost differentials that signals rely on or enable deceptive replicas, weakening coordination and collective action, especially for those without strong institutions. The authors propose policy and design responses, including distinguishing AI-assisted from human-authored content and developing trust-enhancing protocols, to preserve the social value of mental proofs while retaining AI’s benefits.

Abstract

Large language models and other highly capable AI systems ease the burdens of deciding what to say or do, but this very ease can undermine the effectiveness of our actions in social contexts. We explain this apparent tension by introducing the integrative theoretical concept of "mental proof," which occurs when observable actions are used to certify unobservable mental facts. From hiring to dating, mental proofs enable people to credibly communicate values, intentions, states of knowledge, and other private features of their minds to one another in low-trust environments where honesty cannot be easily enforced. Drawing on results from economics, theoretical biology, and computer science, we describe the core theoretical mechanisms that enable people to effect mental proofs. An analysis of these mechanisms clarifies when and how artificial intelligence can make low-trust cooperation harder despite making thinking easier.

Paper Structure

This paper contains 10 sections, 1 figure, and 1 table.

Figures (1)

  • Figure 1: Two focal categories of signaling equilibria. In each, one "active" type of agent engages in a signaling behavior while the other "passive" type does not. To sustain a separating equilibrium, it must be that acting generates a net gain for the active type but a net loss for the passive type. In the first category of equilibrium (sub-figure a), both agent types receive the same benefit from engaging in a behavior. The fact that one type acts but the other does not implies that they incur different costs. In the second (sub-figure b), both types incur the same cost. Here, the fact that only one type acts implies that they expect different benefits.
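The separating condition in the Figure 1 caption reduces to two inequalities: benefit minus cost must be positive for the active type and negative for the passive type. The sketch below encodes that check for a hypothetical two-type game. The class `AgentType`, the function `is_separating`, the type names, and every number are illustrative assumptions, not the authors' model. The last case assumes a hypothetical AI tool that drives both types' costs to the same small value, one way to read the TL;DR's claim that AI can flatten cost differentials.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class AgentType:
    """Hypothetical agent type in a two-type signaling game.

    benefit: payoff the agent expects from engaging in the signaling behavior
    cost:    cost the agent incurs to engage in that behavior
    """
    name: str
    benefit: float
    cost: float

    def net_gain(self) -> float:
        # Net payoff from acting rather than not acting.
        return self.benefit - self.cost


def is_separating(active: AgentType, passive: AgentType) -> bool:
    """Figure 1 condition: acting is a net gain for the active type and a
    net loss for the passive type (strict inequalities assumed here)."""
    return active.net_gain() > 0 and passive.net_gain() < 0


# Sub-figure (a): equal benefits, different costs (illustrative numbers).
sincere = AgentType("sincere", benefit=10.0, cost=4.0)
insincere = AgentType("insincere", benefit=10.0, cost=12.0)

# Sub-figure (b): equal costs, different benefits (illustrative numbers).
committed = AgentType("committed", benefit=10.0, cost=5.0)
uncommitted = AgentType("uncommitted", benefit=3.0, cost=5.0)

# Hypothetical AI assistance: both types now pay the same small cost,
# which removes the cost differential that sub-figure (a) relies on.
AI_COST = 1.0
sincere_ai = replace(sincere, cost=AI_COST)
insincere_ai = replace(insincere, cost=AI_COST)

print(is_separating(sincere, insincere))        # True
print(is_separating(committed, uncommitted))    # True
print(is_separating(sincere_ai, insincere_ai))  # False: both types gain from acting
```

Once both types pay the same cost and expect the same benefit, the passive type also profits from acting, so the behavior no longer distinguishes the two types.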