SGM: A Statistical Godel Machine for Risk-Controlled Recursive Self-Modification
Xuening Wu, Shenqin Yin, Yanlan Kang, Xinhang Zhang, Qianya Xu, Zeping Chen, Wenqiang Zhang
TL;DR
The paper proposes the Statistical Gödel Machine (SGM), a practical safety layer for recursive self-modification in ML that replaces formal proofs with statistical certificates, admitting an edit only when its improvement is certified at a chosen confidence level. It introduces Confirm-Triggered Harmonic Spending (CTHS), which concentrates the global error budget on promising confirmations while preserving familywise validity across an open-ended sequence of edits. The framework supports multiple testing regimes (Hoeffding, empirical Bernstein, and e-values) to provide per-edit and cumulative guarantees. It is validated across supervised learning, RL, and black-box optimization: it certifies gains on CIFAR-100, rejects spurious improvements on ImageNet-100, and behaves robustly on RL and optimization tasks. This approach bridges Gödel-machine safety with scalable, data-driven validation, offering a domain-agnostic foundation for safe continual self-improvement in high-stakes systems. The results highlight SGM’s potential as reusable safety infrastructure for AutoML, NAS, and self-modifying ML pipelines, with directions for scaling and extending the guarantees left to future work.
Abstract
Recursive self-modification is increasingly central in AutoML, neural architecture search, and adaptive optimization, yet no existing framework ensures that such changes are made safely. Gödel machines offer a principled safeguard by requiring formal proofs of improvement before rewriting code; however, such proofs are unattainable in stochastic, high-dimensional settings. We introduce the Statistical Gödel Machine (SGM), the first statistical safety layer for recursive edits. SGM replaces proof-based requirements with statistical confidence tests (e-values, Hoeffding bounds), admitting a modification only when superiority is certified at a chosen confidence level, while allocating a global error budget to bound cumulative risk across rounds. We also propose Confirm-Triggered Harmonic Spending (CTHS), which indexes spending by confirmation events rather than rounds, concentrating the error budget on promising edits while preserving familywise validity. Experiments across supervised learning, reinforcement learning, and black-box optimization validate this role: SGM certifies genuine gains on CIFAR-100, rejects a spurious improvement on ImageNet-100, and demonstrates robustness on RL and optimization benchmarks. Together, these results position SGM as foundational infrastructure for continual, risk-aware self-modification in learning systems. Code is available at: https://github.com/gravitywavelet/sgm-anon.
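To make the admission rule concrete, the following is a minimal illustrative sketch of a Hoeffding-based gate whose per-test error level is indexed by confirmation events. It is not the authors' implementation. The abstract does not give the exact CTHS schedule, so the sketch assumes a hypothetical cap `max_confirms` on the number of confirmations and a per-confirmation level of alpha_k = alpha / (k * H_K), where H_K is the K-th harmonic number; these levels sum to alpha, so a union bound applies. It also assumes that paired score differences (candidate minus incumbent) lie in [-1, 1] and that each confirmation uses fresh held-out data. All function and class names are illustrative.

```python
import math


def hoeffding_lower_bound(diffs, alpha, value_range=2.0):
    """One-sided Hoeffding lower confidence bound on the mean of `diffs`.

    Assumes each difference lies in an interval of width `value_range`
    (2.0 for values in [-1, 1]). Holds with probability >= 1 - alpha.
    """
    n = len(diffs)
    mean = sum(diffs) / n
    radius = value_range * math.sqrt(math.log(1.0 / alpha) / (2.0 * n))
    return mean - radius


def harmonic_budget(k, alpha_total, max_confirms):
    """Assumed schedule: the k-th confirmation gets alpha_total / (k * H_K)."""
    h_k = sum(1.0 / j for j in range(1, max_confirms + 1))
    return alpha_total / (k * h_k)


class ConfirmGate:
    """Admits an edit only if its improvement is certified at level alpha_k.

    The budget index advances per confirmation test, not per round, so
    rounds that never reach confirmation spend no error budget.
    """

    def __init__(self, alpha_total=0.05, max_confirms=10):
        self.alpha_total = alpha_total
        self.max_confirms = max_confirms
        self.confirms_used = 0

    def confirm(self, diffs):
        # Refuse further edits once the (assumed) confirmation cap is hit.
        if self.confirms_used >= self.max_confirms:
            return False
        self.confirms_used += 1
        alpha_k = harmonic_budget(
            self.confirms_used, self.alpha_total, self.max_confirms
        )
        # Accept only if the lower confidence bound on the gain is positive.
        return hoeffding_lower_bound(diffs, alpha_k) > 0.0


if __name__ == "__main__":
    gate = ConfirmGate(alpha_total=0.05, max_confirms=10)
    print(gate.confirm([0.5] * 200))   # large, consistent gain
    print(gate.confirm([0.01] * 200))  # gain too small to certify at n=200
```

Under these assumptions, the caller would run a cheap screening step each round and invoke `confirm` only for edits that pass it; an empirical Bernstein bound or an e-value test could be substituted for `hoeffding_lower_bound` without changing the gate's structure.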
