Certainty-Validity: A Diagnostic Framework for Discrete Commitment Systems

Datorien L. Anderson

TL;DR

This framework reveals a critical failure mode hidden by standard accuracy: Confident-Incorrect (CI) behavior, where models hallucinate structure in ambiguous data, and proposes that "good training" for reasoning systems must be defined not by accuracy, but by maximizing the Certainty-Validity Score (CVS) -- ensuring the model knows where to stop.

Abstract

Standard evaluation metrics for machine learning -- accuracy, precision, recall, and AUROC -- assume that all errors are equivalent: a confident incorrect prediction is penalized identically to an uncertain one. For discrete commitment systems (architectures that select committed states {-W, 0, +W}), this assumption is epistemologically flawed. We introduce the Certainty-Validity (CVS) Framework, a diagnostic method that decomposes model performance into a 2x2 matrix distinguishing high/low certainty from valid/invalid predictions. This framework reveals a critical failure mode hidden by standard accuracy: Confident-Incorrect (CI) behavior, where models hallucinate structure in ambiguous data. Through ablation experiments on Fashion-MNIST, EMNIST, and IMDB, we analyze the "83% Ambiguity Ceiling" -- a stopping point where this specific discrete architecture consistently plateaus on noisy benchmarks. Unlike continuous models that can surpass this ceiling by memorizing texture or statistical noise, the discrete model refuses to commit to ambiguous samples. We show that this refusal is not a failure but a feature: the model stops where structural evidence ends. However, standard training on ambiguous data eventually forces Benign Overfitting, causing a pathological migration from Uncertain-Incorrect (appropriate doubt) to Confident-Incorrect (hallucination). We propose that "good training" for reasoning systems must be defined not by accuracy, but by maximizing the Certainty-Validity Score (CVS) -- ensuring the model knows where to stop.
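The 2x2 decomposition described in the abstract can be sketched as follows. This is a minimal illustration, not the paper's implementation: the certainty threshold of 0.5, the function name `cv_matrix`, and the cell labels CC (confident-correct) and UC (uncertain-correct) are assumptions introduced here. Only CI and UI are named in the abstract. The formula for the Certainty-Validity Score itself is not given in this section, so it is not computed.

```python
def cv_matrix(certainty, predicted, actual, threshold=0.5):
    """Count samples in each cell of a 2x2 certainty-validity matrix.

    `threshold` is a hypothetical cut-off separating high from low
    certainty; the paper may define certainty differently.
    """
    counts = {"CC": 0, "CI": 0, "UC": 0, "UI": 0}
    for c, p, a in zip(certainty, predicted, actual):
        row = "C" if c >= threshold else "U"   # confident vs. uncertain
        col = "C" if p == a else "I"           # correct vs. incorrect
        counts[row + col] += 1
    return counts


# Illustrative inputs only (not data from the paper).
certainty = [0.9, 0.8, 0.2, 0.1, 0.95]
predicted = [1, 0, 1, 0, 1]
actual    = [1, 1, 1, 1, 0]

cells = cv_matrix(certainty, predicted, actual)
accuracy = (cells["CC"] + cells["UC"]) / len(actual)
print(cells, accuracy)
```

In this toy input, accuracy sees three errors of equal weight, while the matrix separates the two confident errors (CI) from the one uncertain error (UI), which is the distinction the framework is built around.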

Paper Structure (45 sections, 2 equations, 1 figure, 12 tables)


Figures (1)

  • Figure 1: Excitability Phase Diagram (MNIST, 30 epochs, 561K params). Each point represents one training epoch, plotted by train--test divergence ($x$-axis: Train Acc $-$ Test Acc) against Certainty-Validity Score ($y$-axis). Marker colour encodes training loss via the Oranges colourmap (darker $=$ higher loss, lighter $=$ lower loss). Directional arrows trace the epoch-to-epoch trajectory. The dashed magenta line marks the median-CVS excitability threshold. Three labelled regions: Structural Discovery (green; E1, divergence $= -10.54$, the Platonic Spike where test accuracy exceeds training by $>$10 points, CVS $= 0.511$), Optimal State (blue; E2--E4, divergence near zero, peak CVS $= 0.571$), and Benign Overfitting (orange; E10--E30, divergence compressed near zero, CVS cascading from $0.29$ to the floor at E28: CVS $= 0.177$). Full training logs and checkpoints are available in the supplementary repository anderson2026invariant.
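The coordinates used in Figure 1 can be derived from per-epoch logs roughly as sketched below. The function name `phase_points` and the sample values are hypothetical and are not taken from the paper's training logs. The sketch assumes the threshold is simply the median of the per-epoch CVS values, as the caption's "median-CVS" wording suggests.

```python
from statistics import median


def phase_points(train_acc, test_acc, cvs):
    """Return (divergence, cvs, at_or_above_median) per epoch, plus the threshold.

    Divergence is Train Acc - Test Acc, so a negative value means the
    test accuracy exceeded the training accuracy in that epoch.
    """
    threshold = median(cvs)  # assumed reading of the "median-CVS" threshold
    points = [
        (tr - te, s, s >= threshold)
        for tr, te, s in zip(train_acc, test_acc, cvs)
    ]
    return points, threshold


# Made-up values for three epochs, in accuracy percentage points.
train_acc = [80.0, 95.0, 99.0]
test_acc = [90.0, 95.0, 98.0]
cvs = [0.5, 0.6, 0.2]

points, threshold = phase_points(train_acc, test_acc, cvs)
for epoch, (div, score, above) in enumerate(points, start=1):
    print(f"E{epoch}: divergence={div:+.2f}, CVS={score:.3f}, above_threshold={above}")
```

Plotting these tuples in epoch order, with arrows between consecutive points, would give a trajectory of the same kind as the one the caption describes.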