Selection as Power: Constrained Reinforcement for Bounded Decision Authority

Jose Manuel de la Chica Rodriguez; Juan Manuel Vera Díaz

TL;DR

This work formalizes selection as a constrained reinforcement process in which parameter updates are projected onto governance-defined feasible sets, preventing concentration beyond prescribed bounds, and demonstrates that learning dynamics can coexist with structural diversity when sovereignty constraints are enforced at every update step.

Abstract

Selection as Power argued that upstream selection authority, rather than internal objective misalignment, constitutes a primary source of risk in high-stakes agentic systems. However, the original framework was static: governance constraints bounded selection power but did not adapt over time. In this work, we extend the framework to dynamic settings by introducing incentivized selection governance, where reinforcement updates are applied to scoring and reducer parameters under externally enforced sovereignty constraints. We formalize selection as a constrained reinforcement process in which parameter updates are projected onto governance-defined feasible sets, preventing concentration beyond prescribed bounds. Across multiple regulated financial scenarios, unconstrained reinforcement consistently collapses into deterministic dominance under repeated feedback, especially at higher learning rates. In contrast, incentivized governance enables adaptive improvement while maintaining bounded selection concentration. Projection-based constraints transform reinforcement from irreversible lock-in into controlled adaptation, with governance debt quantifying the tension between optimization pressure and authority bounds. These results demonstrate that learning dynamics can coexist with structural diversity when sovereignty constraints are enforced at every update step, offering a principled approach to integrating reinforcement into high-stakes agentic systems without surrendering bounded selection authority.
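The contrast the abstract draws between unconstrained and projected updates can be illustrated with a small toy sketch. It is not the paper's implementation. It assumes that selection is represented directly as a probability vector over agents, that the reinforcement step is a simple multiplicative reward-weighted update, and that the governance-defined feasible set is a per-agent cap on selection probability. The function names, the reward values, and the cap of 0.5 are all hypothetical choices made for illustration. The projection used is the standard Euclidean projection onto a capped simplex, computed by bisection.

```python
def project_capped_simplex(v, cap, iters=100):
    """Euclidean projection of v onto {p : sum(p) = 1, 0 <= p_i <= cap}.

    Finds a shift tau such that clipping (v_i - tau) to [0, cap] sums to 1.
    """
    n = len(v)
    if cap * n < 1.0:
        raise ValueError("infeasible cap: cap * len(v) must be at least 1")
    lo = min(v) - cap  # every entry clips to cap here, so the sum is n * cap >= 1
    hi = max(v)        # every entry clips to 0 here, so the sum is 0 <= 1
    for _ in range(iters):
        tau = 0.5 * (lo + hi)
        total = sum(min(max(x - tau, 0.0), cap) for x in v)
        if total > 1.0:
            lo = tau
        else:
            hi = tau
    tau = 0.5 * (lo + hi)
    return [min(max(x - tau, 0.0), cap) for x in v]


def selection_concentration(p):
    """SC_t = max_i P_t(A_i), as defined in the Figure 1 caption."""
    return max(p)


def step(p, rewards, lr, cap=None):
    """One toy reinforcement step (assumed update rule, not the paper's)."""
    v = [pi * (1.0 + lr * r) for pi, r in zip(p, rewards)]
    if cap is None:
        # Unconstrained: renormalize only, so concentration is unbounded.
        total = sum(v)
        return [x / total for x in v]
    # Governed: project back onto the capped feasible set at every step.
    return project_capped_simplex(v, cap)


def run(rewards, lr, steps, cap=None):
    """Iterate the toy update from a uniform starting distribution."""
    p = [1.0 / len(rewards)] * len(rewards)
    for _ in range(steps):
        p = step(p, rewards, lr, cap)
    return p


# Hypothetical rewards for three agents; agent 0 performs best.
rewards = [1.0, 0.5, 0.2]
free = run(rewards, lr=0.05, steps=2000)
governed = run(rewards, lr=0.05, steps=2000, cap=0.5)
print("unconstrained SC:", selection_concentration(free))
print("governed SC:", selection_concentration(governed))
```

In this toy setting the unconstrained run drifts toward a single dominant agent, while the projected run keeps the selection concentration at or below the cap. Because the projection is applied at every step, the bound holds throughout the trajectory rather than only at the end.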

Paper Structure (76 sections, 14 equations, 3 figures, 1 table)

Introduction
From Static Selection Governance to Adaptive Incentives
The Central Question
Incentivized Selection Governance
Contributions
Roadmap
Related Work
Reinforcement Learning and Constrained Optimization
Reward Modeling and Alignment
Multi-Agent Reinforcement Learning
Mechanism Design and Incentive Compatibility
AI Governance and Accountability
Positioning of This Work
Problem Setting: Incentivized Selection Governance
System Overview
...and 61 more sections

Figures (3)

Figure 1: Temporal evolution of selection concentration ($\mathrm{SC}_t = \max_i P_t(A_i)$) across governance architectures and learning rates. Unconstrained reinforcement exhibits rapid concentration growth and, in some cases, deterministic lock-in. Incentivized governance allows adaptive reinforcement while preserving bounded selection authority through projection-based sovereignty constraints. Higher learning rates accelerate reinforcement dynamics but remain constrained under governance.
Figure 2: Comparison of learning rates ($0.01$ vs $0.05$) on the temporal evolution of selection concentration ($\mathrm{SC}_t$). Higher learning rates accelerate reinforcement dynamics, producing faster growth in concentration. Under unconstrained reinforcement this leads to rapid lock-in, whereas incentivized governance bounds concentration growth through projection-based sovereignty constraints.
Figure 3: Realized cumulative top-agent selection share across scenarios and governance architectures. Each panel reports the fraction of selections assigned to the most frequently chosen agent as a function of successful steps. Unconstrained reinforcement learning exhibits monotonic convergence toward deterministic dominance, with higher learning rates accelerating lock-in. Scalar top-$k$ produces structurally high dominance from early iterations due to deterministic aggregation. In contrast, incentivized governance increases selection preference for high-performing agents while stabilizing below deterministic collapse through projection-based sovereignty constraints. Static mode remains approximately constant, reflecting the absence of learning.
