Early Evidence of Vibe-Proving with Consumer LLMs: A Case Study on Spectral Region Characterization with ChatGPT-5.2 (Thinking)

Brecht Verbeken; Brando Vagenende; Marie-Anne Guerry; Andres Algaba; Vincent Ginis

Early Evidence of Vibe-Proving with Consumer LLMs: A Case Study on Spectral Region Characterization with ChatGPT-5.2 (Thinking)

Brecht Verbeken, Brando Vagenende, Marie-Anne Guerry, Andres Algaba, Vincent Ginis

TL;DR

The paper investigates whether consumer-access LLMs can meaningfully contribute to auditable, research-level mathematics. It centers a case study on the nonreal spectral region of a $4\times4$ row-stochastic matrix family with cyclic zero pattern, using a Dmitriev–Dynkin–style trigonometric reduction and a rigorous, human-in-the-loop verification workflow to prove Ran and Teng's Conjecture 20. The main contribution is a complete characterization of nonreal eigenvalues: necessary and sufficient conditions $0\le a\le 1$, $a+|b|\le 1$, and $G(a,|b|)\ge 0$, plus explicit boundary attainments along the segment $C_R$ and the curve $G(a,b)=0$ realized by $A_L(\alpha)$. Beyond the result itself, the paper provides a detailed, auditable process for AI-assisted theorem proving, highlighting the strengths of LLMs in proposing global structure and the indispensable role of human verification in correctness-critical steps, and it discusses a hybrid path toward formal verification via proof assistants. The work demonstrates that, with structured prompts, versioned artefacts, and controlled patching, consumer LLMs can contribute meaningfully to mathematical discovery and documentation, while also pointing to bottlenecks in long algebraic expansions and the potential integration with formal tooling for stronger guarantees.

Abstract

Large Language Models (LLMs) are increasingly used as scientific copilots, but evidence on their role in research-level mathematics remains limited, especially for workflows accessible to individual researchers. We present early evidence for vibe-proving with a consumer subscription LLM through an auditable case study that resolves Conjecture 20 of Ran and Teng (2024) on the exact nonreal spectral region of a 4-cycle row-stochastic nonnegative matrix family. We analyze seven shareable ChatGPT-5.2 (Thinking) threads and four versioned proof drafts, documenting an iterative pipeline of generate, referee, and repair. The model is most useful for high-level proof search, while human experts remain essential for correctness-critical closure. The final theorem provides necessary and sufficient region conditions and explicit boundary attainment constructions. Beyond the mathematical result, we contribute a process-level characterization of where LLM assistance materially helps and where verification bottlenecks persist, with implications for evaluation of AI-assisted research workflows and for designing human-in-the-loop theorem proving systems.

Early Evidence of Vibe-Proving with Consumer LLMs: A Case Study on Spectral Region Characterization with ChatGPT-5.2 (Thinking)

TL;DR

The paper investigates whether consumer-access LLMs can meaningfully contribute to auditable, research-level mathematics. It centers a case study on the nonreal spectral region of a

row-stochastic matrix family with cyclic zero pattern, using a Dmitriev–Dynkin–style trigonometric reduction and a rigorous, human-in-the-loop verification workflow to prove Ran and Teng's Conjecture 20. The main contribution is a complete characterization of nonreal eigenvalues: necessary and sufficient conditions

, and

, plus explicit boundary attainments along the segment

and the curve

realized by

. Beyond the result itself, the paper provides a detailed, auditable process for AI-assisted theorem proving, highlighting the strengths of LLMs in proposing global structure and the indispensable role of human verification in correctness-critical steps, and it discusses a hybrid path toward formal verification via proof assistants. The work demonstrates that, with structured prompts, versioned artefacts, and controlled patching, consumer LLMs can contribute meaningfully to mathematical discovery and documentation, while also pointing to bottlenecks in long algebraic expansions and the potential integration with formal tooling for stronger guarantees.

Abstract

Paper Structure (20 sections, 20 theorems, 292 equations)

This paper contains 20 sections, 20 theorems, 292 equations.

Introduction
Mathematical Background
Spectral Regions of Stochastic Matrices
Dimitriev and Dynkin's Trigonometric Method
Conjecture 20 of Ran and Teng
Contribution
The Conversations with ChatGPT
Casual testing of GPT's capabilities
Setting Up the Theorem
The Proof Strategy
The Iterations
From chronology to workflow-level lessons
Discussion
What This Case Suggests
Workflow and Verification Strategy
...and 5 more sections

Key Result

Theorem 1

Let Let $\lambda=a+ib\in\sigma(A(\alpha,\beta,\gamma,\delta))$ with $b\neq 0$, and write $b_+=|b|$. Then: Conversely, if $b>0$ and then there exist $\alpha,\beta,\gamma,\delta\in[0,1]$ such that $\lambda\in\sigma(A(\alpha,\beta,\gamma,\delta))$. (The lower half-plane follows by conjugation.) Moreover:

Theorems & Definitions (42)

Conjecture 20: Ran and Teng, 2024 ran2024nonnegative, Conjecture 20
Theorem : Spectral region for a 4-cycle row-stochastic matrix
Theorem 21: Precise spectral region for a 4-cycle row-stochastic matrix
proof
Lemma 1: Log-modulus/argument reformulation
proof
Lemma 2: Unbounded supremum
proof
Lemma 3: Finite maximum in the tight regime
proof
...and 32 more

Early Evidence of Vibe-Proving with Consumer LLMs: A Case Study on Spectral Region Characterization with ChatGPT-5.2 (Thinking)

TL;DR

Abstract

Early Evidence of Vibe-Proving with Consumer LLMs: A Case Study on Spectral Region Characterization with ChatGPT-5.2 (Thinking)

Authors

TL;DR

Abstract

Table of Contents

Key Result

Theorems & Definitions (42)