Personality Requires Struggle: Three Regimes of the Baldwin Effect in Neuroevolved Chess Agents

Diego Armando Resendez Prado

Abstract

Can lifetime learning expand behavioral diversity over evolutionary time, rather than collapsing it? Prior theory predicts that plasticity reduces variance by buffering organisms against environmental noise. We test this in a competitive domain: chess agents with eight NEAT-evolved neural modules, Hebbian within-game plasticity, and a desirability-domain signal chain with imagination. Across 10~seeds per Hebbian condition, a variance crossover emerges: Hebbian ON starts with lower cross-seed variance than OFF, then surpasses it at generation~34. The crossover trend is monotonic ($\rho = 0.91$, $p < 10^{-6}$): plasticity's effect on behavioral variance reverses over evolutionary time, initially compressing diversity (consistent with prior predictions), then expanding it as evolved Perception differences are amplified through imagination -- a feedback loop that mutation alone cannot sustain. The result is structured behavioral divergence: evolved agents select different moves in the same positions (62\% disagreement) and develop distinct opening repertoires, piece preferences, and game lengths. These are not different sampling policies -- they are reproducible behavioral signatures (ICC > 0.8) with interpretable signal chain configurations. Three regimes appear depending on opponent type: exploration (Hebbian ON, heterogeneous opponent), lottery (Hebbian OFF, elitism lock-in), and transparent (same-model opponent, brain self-erasure). The transparent regime generates a falsifiable prediction: self-play systems may systematically suppress behavioral diversity by eliminating the heterogeneity that personality requires. \textbf{Keywords:} Baldwin Effect, neuroevolution, NEAT, Hebbian learning, chess, cognitive architecture, personality emergence, imagination
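The monotonicity claim above (Spearman $\rho = 0.91$ between generation index and the ON-minus-OFF variance difference) can be checked with a rank correlation. A minimal sketch, using a synthetic trajectory that mimics the reported crossover near generation 34 -- the data below are illustrative placeholders, not the paper's measurements:

```python
import numpy as np

def spearman_rho(x, y):
    """Spearman correlation = Pearson correlation of the ranks (no-tie case)."""
    rank = lambda v: np.argsort(np.argsort(v)).astype(float)
    rx, ry = rank(x), rank(y)
    return np.corrcoef(rx, ry)[0, 1]

generations = np.arange(1, 51)
rng = np.random.default_rng(0)
# Synthetic ON-minus-OFF variance difference: starts negative (ON below OFF)
# and crosses zero around generation 34, with small observation noise.
variance_diff = 0.01 * (generations - 34) + rng.normal(0, 0.02, 50)

rho = spearman_rho(generations, variance_diff)
print(f"Spearman rho = {rho:.2f}")
```

A rank correlation is the right test here because the claim is about a monotonic trend, not a linear one: any consistent rise in the variance difference drives $\rho$ toward 1 regardless of the trajectory's shape.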


Paper Structure

This paper contains 46 sections, 3 equations, 5 figures, 5 tables.

Figures (5)

  • Figure 1: Pool architecture. Phase 1: five modules read independently from a shared signal pool (board sensors, game context, cartridge WDL, distribution shape). Phase 2: Personality reads all Phase 1 outputs. Phase 3: Integration produces signal chain parameters. The signal chain reshapes the cartridge's move distribution; imagination evaluates top candidates through 1-ply lookahead using Perception (dashed). Each module is a NEAT network.
  • Figure 2: Neuroscience inspiration for the AILED-Brain architecture. Left: human brain regions. Right: corresponding AILED-Brain modules. Dashed gray arrows show the functional mapping; color coding indicates analogous roles. Both systems feature parallel sensory processing, preference integration, and action selection. The key parallel: imagination reuses Perception (blue dashed) the same way mental imagery reuses visual cortex. Bottom: both operate on two timescales---within-lifetime plasticity and cross-generation evolution.
  • Figure 3: Hebbian ON vs OFF trajectories (mean across 10 seeds each) with AILED Engine v26.3.0 against Maia2 at 1100 Elo. Left: mean best fitness. OFF rises faster but plateaus by gen 20; ON climbs steadily through gen 50. Right: mean agreement. Both approach $\sim$40--42%, but ON's late-stage slope is $7.6\times$ larger than OFF's. The difference is in dynamics, not endpoint.
  • Figure 4: Agreement trajectories across the three regimes. (a) Expressive cartridge (AILED Engine v26.3.0 vs Maia2-1100, mean of 10 seeds per condition): both conditions approach $\sim$40% agreement, but ON is still climbing while OFF has flatlined. The difference is in dynamics, not endpoint. (b) Dominant cartridge (Maia2 1400 vs 1100, first 25 of 100 generations shown): agreement flat at 50% throughout---the brain is passive. (c) Mirror matches: agreement converges to 100% (self-erasure). Maia2-1600 reaches transparency at gen 7, faster than 1400 at gen 13.
  • Figure 5: Parameter evolution of the diplomat (seed 2) and fighter (seed 9) across generations 10--50. Top: saturation ceiling diverges from similar starting values to opposite extremes (0.91 vs 0.22). Bottom: temperature modifier diverges from near-zero to opposite signs ($+0.26$ vs $-0.44$). Divergence is learned progressively, not present at initialization.
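The within-game Hebbian plasticity referenced in the abstract and Figures 3--4 can be sketched with a classical correlational rule. This is an assumption-laden toy, not the engine's actual update: the rule form (`eta * post ⊗ pre` with weight decay), the function name, and the constants are all illustrative.

```python
import numpy as np

def hebbian_step(w, pre, post, eta=0.01, decay=0.001):
    """One within-game update: strengthen co-active connections, decay the rest.

    w    : (n_out, n_in) weight matrix of a module
    pre  : (n_in,)  presynaptic activations
    post : (n_out,) postsynaptic activations
    """
    return w + eta * np.outer(post, pre) - decay * w

rng = np.random.default_rng(0)
w = rng.normal(0, 0.1, size=(4, 8))  # toy module: 4 outputs, 8 inputs
pre = rng.random(8)
post = rng.random(4)

w_new = hebbian_step(w, pre, post)
print(w_new.shape)
```

The point of the sketch is the timescale separation the paper relies on: updates like this accumulate within a single game, while NEAT mutation and selection act only across generations.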