Table of Contents
Fetching ...

Early Evidence of Vibe-Proving with Consumer LLMs: A Case Study on Spectral Region Characterization with ChatGPT-5.2 (Thinking)

Brecht Verbeken, Brando Vagenende, Marie-Anne Guerry, Andres Algaba, Vincent Ginis

TL;DR

The paper investigates whether consumer-access LLMs can meaningfully contribute to auditable, research-level mathematics. It centers a case study on the nonreal spectral region of a $4\times4$ row-stochastic matrix family with cyclic zero pattern, using a Dmitriev–Dynkin–style trigonometric reduction and a rigorous, human-in-the-loop verification workflow to prove Ran and Teng's Conjecture 20. The main contribution is a complete characterization of nonreal eigenvalues: necessary and sufficient conditions $0\le a\le 1$, $a+|b|\le 1$, and $G(a,|b|)\ge 0$, plus explicit boundary attainments along the segment $C_R$ and the curve $G(a,b)=0$ realized by $A_L(\alpha)$. Beyond the result itself, the paper provides a detailed, auditable process for AI-assisted theorem proving, highlighting the strengths of LLMs in proposing global structure and the indispensable role of human verification in correctness-critical steps, and it discusses a hybrid path toward formal verification via proof assistants. The work demonstrates that, with structured prompts, versioned artefacts, and controlled patching, consumer LLMs can contribute meaningfully to mathematical discovery and documentation, while also pointing to bottlenecks in long algebraic expansions and the potential integration with formal tooling for stronger guarantees.

Abstract

Large Language Models (LLMs) are increasingly used as scientific copilots, but evidence on their role in research-level mathematics remains limited, especially for workflows accessible to individual researchers. We present early evidence for vibe-proving with a consumer subscription LLM through an auditable case study that resolves Conjecture 20 of Ran and Teng (2024) on the exact nonreal spectral region of a 4-cycle row-stochastic nonnegative matrix family. We analyze seven shareable ChatGPT-5.2 (Thinking) threads and four versioned proof drafts, documenting an iterative pipeline of generate, referee, and repair. The model is most useful for high-level proof search, while human experts remain essential for correctness-critical closure. The final theorem provides necessary and sufficient region conditions and explicit boundary attainment constructions. Beyond the mathematical result, we contribute a process-level characterization of where LLM assistance materially helps and where verification bottlenecks persist, with implications for evaluation of AI-assisted research workflows and for designing human-in-the-loop theorem proving systems.

Early Evidence of Vibe-Proving with Consumer LLMs: A Case Study on Spectral Region Characterization with ChatGPT-5.2 (Thinking)

TL;DR

The paper investigates whether consumer-access LLMs can meaningfully contribute to auditable, research-level mathematics. It centers a case study on the nonreal spectral region of a row-stochastic matrix family with cyclic zero pattern, using a Dmitriev–Dynkin–style trigonometric reduction and a rigorous, human-in-the-loop verification workflow to prove Ran and Teng's Conjecture 20. The main contribution is a complete characterization of nonreal eigenvalues: necessary and sufficient conditions , , and , plus explicit boundary attainments along the segment and the curve realized by . Beyond the result itself, the paper provides a detailed, auditable process for AI-assisted theorem proving, highlighting the strengths of LLMs in proposing global structure and the indispensable role of human verification in correctness-critical steps, and it discusses a hybrid path toward formal verification via proof assistants. The work demonstrates that, with structured prompts, versioned artefacts, and controlled patching, consumer LLMs can contribute meaningfully to mathematical discovery and documentation, while also pointing to bottlenecks in long algebraic expansions and the potential integration with formal tooling for stronger guarantees.

Abstract

Large Language Models (LLMs) are increasingly used as scientific copilots, but evidence on their role in research-level mathematics remains limited, especially for workflows accessible to individual researchers. We present early evidence for vibe-proving with a consumer subscription LLM through an auditable case study that resolves Conjecture 20 of Ran and Teng (2024) on the exact nonreal spectral region of a 4-cycle row-stochastic nonnegative matrix family. We analyze seven shareable ChatGPT-5.2 (Thinking) threads and four versioned proof drafts, documenting an iterative pipeline of generate, referee, and repair. The model is most useful for high-level proof search, while human experts remain essential for correctness-critical closure. The final theorem provides necessary and sufficient region conditions and explicit boundary attainment constructions. Beyond the mathematical result, we contribute a process-level characterization of where LLM assistance materially helps and where verification bottlenecks persist, with implications for evaluation of AI-assisted research workflows and for designing human-in-the-loop theorem proving systems.
Paper Structure (20 sections, 20 theorems, 292 equations)

This paper contains 20 sections, 20 theorems, 292 equations.

Key Result

Theorem 1

Let Let $\lambda=a+ib\in\sigma(A(\alpha,\beta,\gamma,\delta))$ with $b\neq 0$, and write $b_+=|b|$. Then: Conversely, if $b>0$ and then there exist $\alpha,\beta,\gamma,\delta\in[0,1]$ such that $\lambda\in\sigma(A(\alpha,\beta,\gamma,\delta))$. (The lower half-plane follows by conjugation.) Moreover:

Theorems & Definitions (42)

  • Conjecture 20: Ran and Teng, 2024 ran2024nonnegative, Conjecture 20
  • Theorem : Spectral region for a 4-cycle row-stochastic matrix
  • Theorem 21: Precise spectral region for a 4-cycle row-stochastic matrix
  • proof
  • Lemma 1: Log-modulus/argument reformulation
  • proof
  • Lemma 2: Unbounded supremum
  • proof
  • Lemma 3: Finite maximum in the tight regime
  • proof
  • ...and 32 more