Metagoals Endowing Self-Modifying AGI Systems with Goal Stability or Moderated Goal Evolution: Toward a Formally Sound and Practical Approach
Ben Goertzel
TL;DR
The paper addresses how a self-modifying AGI system can retain critical invariants in its goal system by embedding metagoals that steer its development along contractive or constructive fixed-point dynamics. It frames the problem through fixed-point theorems (Banach, Brouwer, Schauder) and extends them to probabilistic and constructive Schauder variants, including ML-accelerated constructive algorithms and invariant-measure considerations. It proposes two foundational metagoal families, goal-stability (via contraction) and moderated-goal-evolution (via constructive Schauder-type approaches), and argues for hybrid strategies that combine stability with measured evolution to preserve tractability and self-understanding. Absolute guarantees remain elusive, so the framework instead aims to bias system dynamics toward stable, intelligible long-run behavior through principled mathematical constructs and practical search methods. The work lays groundwork for theory-informed experimentation in open-ended AGI systems and suggests concrete mathematical and algorithmic directions for improving safety and controllability in self-modifying AI.
Abstract
We articulate here a series of specific metagoals designed to address the challenge of creating AGI systems that can flexibly self-modify yet also tend to maintain key invariant properties of their goal systems: (1) a series of goal-stability metagoals, aimed at guiding a system to a condition in which goal-stability is compatible with reasonably flexible self-modification; and (2) a series of moderated-goal-evolution metagoals, aimed at guiding a system to a condition in which control of the pace of goal evolution is compatible with reasonably flexible self-modification. The formulation of the metagoals is founded on fixed-point theorems from functional analysis, e.g. the Contraction Mapping Theorem and constructive approximations to Schauder's Theorem, applied to probabilistic models of system behavior. We present an argument that balancing self-modification with maintenance of goal invariants will often have other interesting cognitive side-effects, such as a high degree of self-understanding. Finally, we argue for the practical value of a hybrid metagoal that combines moderated-goal-evolution with pursuit of goal-stability (along with potentially other metagoals relating to goal-satisfaction, survival and ongoing development) in a flexible fashion depending on the situation.
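As a rough illustration of the Contraction Mapping Theorem mentioned above, the following minimal sketch iterates a toy contraction from two different starting points and checks that both runs reach the same fixed point. Everything in it is an assumption made for illustration and is not taken from the paper: the scalar "goal weight" state, the map `toy_self_modification` with contraction factor 0.5, and the helper `iterate_to_fixed_point` are all hypothetical.

```python
def iterate_to_fixed_point(update, start, tol=1e-10, max_steps=1000):
    """Apply `update` repeatedly until successive states differ by less than `tol`.

    Returns the final state and the number of steps taken.
    """
    state = start
    for step in range(1, max_steps + 1):
        new_state = update(state)
        if abs(new_state - state) < tol:
            return new_state, step
        state = new_state
    raise RuntimeError("no convergence within max_steps")


def toy_self_modification(goal_weight):
    # Hypothetical self-modification step (an assumption, not the paper's model).
    # It is a contraction with factor 0.5, so the Contraction Mapping Theorem
    # guarantees a unique fixed point, here the solution of g = 0.5 * g + 0.3.
    return 0.5 * goal_weight + 0.3


# Two very different initial goal states are driven to the same fixed point.
fixed_a, steps_a = iterate_to_fixed_point(toy_self_modification, start=0.0)
fixed_b, steps_b = iterate_to_fixed_point(toy_self_modification, start=10.0)
print(fixed_a, steps_a)
print(fixed_b, steps_b)
```

In this toy setting the fixed point plays the role of a stable goal state: repeated self-modification moves the system toward it regardless of where it starts, which is the kind of behavior a goal-stability metagoal would aim to encourage.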
