Table of Contents
Fetching ...
Paper

Stable but Miscalibrated: A Kantian View on Overconfidence from Filters to Large Language Models

Abstract

We reinterpret Kant's Critique of Pure Reason as a theory of feedback stability, viewing reason as a regulator that keeps inference within the bounds of possible experience. We formalize this intuition in linear-Gaussian state-space models via H-Risk, a composite instability index integrating spectral margin, conditioning, temporal sensitivity, and innovation amplification. In simulations, higher H-Risk predicts overconfident errors and degraded closed-loop behavior even when the dynamics remain formally stable, exposing a gap between nominal and epistemic stability. Extending this stability lens to large language models (LLMs), we introduce a domain-wise proxy based on confidence fluctuations and overconfident errors. In a binary-question study, a Kantian-inspired policy that permits ''cannot judge'' responses yields targeted reductions in policy-aware squared loss in high-stakes domains relative to an overconfident baseline. To probe internal dynamics, we analyse layer-wise sensitivity of hidden states to small input perturbations. Contrary to a naive instability hypothesis, confidently wrong answers show no instability gap; instead, they are at least as locally stable as confidently correct answers, revealing stable miscalibration in which hallucinations behave like robust but misaligned attractors. For Qwen-2.5, spectral and activation profiles suggest a high signal-to-noise, low effective signal temperature regime in which representations become inertial and resistant to contextual shifts. These results bridge Kantian self-limitation and feedback control, and suggest that stable high-confidence hallucinations may not be readily corrected by output-only heuristics (e.g., temperature scaling or re-sampling), motivating process-level interventions that explicitly perturb and re-evaluate the inference trajectory.