
What does a system modify when it modifies itself?

Florentin Koch

Abstract

When a cognitive system modifies its own functioning, what exactly does it modify: a low-level rule, a control rule, or the norm that evaluates its own revisions? Cognitive science describes executive control, metacognition, and hierarchical learning with precision, but lacks a formal framework distinguishing these targets of transformation. Contemporary artificial intelligence likewise exhibits self-modification without common criteria for comparison with biological cognition. We show that the question of what counts as a self-modifying system entails a minimal structure: a hierarchy of rules, a fixed core, and a distinction between effective rules, represented rules, and causally accessible rules. Four regimes are identified: (1) action without modification, (2) low-level modification, (3) structural modification, and (4) teleological revision. Each regime is anchored in a cognitive phenomenon and a corresponding artificial system. Applied to humans, the framework yields a central result: a crossing of opacities. Humans have self-representation and causal power concentrated at upper hierarchical levels, while operational levels remain largely opaque. Reflexive artificial systems display the inverse profile: rich representation and causal access at operational levels, but none at the highest evaluative level. This crossed asymmetry provides a structural signature for human-AI comparison. The framework also offers insight into artificial consciousness, with higher-order theories and Attention Schema Theory as special cases. We derive four testable predictions and identify four open problems: the independence of transformativity and autonomy, the viability of self-modification, the teleological lock, and identity under transformation.

Paper Structure

This paper contains 82 sections, 2 theorems, 1 equation, 1 figure, 3 tables.

Key Result

Proposition 1

The logical constraint established at step 5 (§2.2) translates as follows: in any system in Regime 3, there exists at each time $t$ a level $k_{\max}$ such that $R_{k_{\max}}$ is fixed. This constraint is recognized in the AI safety literature under the name of goal stability (Soares & Fallenstein).
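The fixity constraint can be made concrete with a toy model. The sketch below is illustrative only and not from the paper: each rule level $R_0 \dots R_{k_{\max}}$ carries two hypothetical flags, one for membership in the self-representation $\Phi^R_t$ and one for causal accessibility ($\Phi^C_t$). Proposition 1 (causal closure) then reads as a simple check: the top level must not be causally accessible. The `Rule` class, the two example profiles, and the flag names are assumptions introduced here for illustration.

```python
# Illustrative sketch (not the paper's formalism): a toy rule hierarchy
# with two hypothetical flags per level -- whether the rule is part of
# the system's self-representation (Phi^R_t) and whether the system can
# causally rewrite it (Phi^C_t).

from dataclasses import dataclass

@dataclass
class Rule:
    level: int
    represented: bool          # in Phi^R_t: the system represents this rule
    causally_accessible: bool  # in Phi^C_t: the system can modify this rule

def satisfies_causal_closure(hierarchy):
    """Proposition 1, toy version: the highest level R_kmax is fixed,
    i.e. not causally accessible to the system itself."""
    top = max(hierarchy, key=lambda r: r.level)
    return not top.causally_accessible

# Hypothetical "human" profile from the abstract: representation and causal
# power concentrated at upper levels, operational levels largely opaque.
human = [
    Rule(0, represented=False, causally_accessible=False),  # operational
    Rule(1, represented=True,  causally_accessible=True),   # control
    Rule(2, represented=True,  causally_accessible=False),  # teleological norm
]

# Hypothetical "reflexive AI" profile: the inverse crossing -- rich access
# at operational levels, none at the highest evaluative level.
ai = [
    Rule(0, represented=True,  causally_accessible=True),
    Rule(1, represented=True,  causally_accessible=True),
    Rule(2, represented=False, causally_accessible=False),
]

# Both profiles satisfy causal closure: despite opposite opacity profiles,
# the top level is fixed in each.
print(satisfies_causal_closure(human))  # True
print(satisfies_causal_closure(ai))     # True
```

Note that the two profiles differ only in *where* representation and causal access sit in the hierarchy; the closure check is invariant across them, which is the structural point of the "crossed opacities" result.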

Figures (1)

  • Figure 1: Hierarchical architecture of the functional state $\Phi_t$ and local structure of an operation $R_{i+1} \to R_i$. Left panel: The system $\Phi_t$ is organized as a hierarchy of rules from observable behaviors ($R_0$) to the teleological norm ($R_{k_{\max}}$). Each level corresponds to a self-modification regime (1--4). The dashed red contour indicates $\Phi^{R}_t$ in the human profile ($\Phi^R_{t,\text{human}} \gg \Phi^C_{t,\text{human}}$ at upper levels); the dashed blue contour indicates $\Phi^{R}_t$ in the AI profile ($\Phi^R_{t,\text{AI}} \approx \Phi^C_{t,\text{AI}}$ at lower levels). Environmental feedback $\mathcal{E}_t \rightsquigarrow \Phi_t$ enters the dynamics via $F$. Right panel: Local view of any operation $R_{i+1} \to R_i$, decomposed into a reflexive component ($\Phi^{R}_t \supseteq \{\tilde{R}_{i+1}, \widetilde{R_{i+1} \to R_i}, \tilde{R}_i\} \subseteq \Phi_t$) and a causal component ($\Phi^{C}_t \supseteq \{R_{i+1} \rightsquigarrow R_i\}$), with the hierarchical dynamics equation.

Theorems & Definitions (6)

  • Definition 1: Fixed regime
  • Definition 2: Local regime
  • Definition 3: Structural regime
  • Proposition 1: Causal closure
  • Definition 4: Reflexive regime
  • Proposition 2: Reflexive openness