
Learning to generalize in evolution through annealed population heterogeneity

Federica Ferretti, Mehran Kardar, Arvind Murugan

TL;DR

It is demonstrated that annealed population heterogeneity, wherein distinct individuals in the population experience different instances of a complex environment over time, can act as a form of implicit regularization and facilitate evolutionary generalization.

Abstract

Evolutionary systems must learn to generalize, often extrapolating from a limited set of selective conditions to anticipate future environmental changes. The mechanisms enabling such generalization remain poorly understood, despite their importance for predicting ecological robustness and drug resistance and for designing future-proof vaccination strategies. Here, we demonstrate that annealed population heterogeneity, wherein distinct individuals in the population experience different instances of a complex environment over time, can act as a form of implicit regularization and facilitate evolutionary generalization. Mathematically, annealed heterogeneity introduces a variance-weighted demographic noise term that penalizes across-environment fitness variance and effectively rescales the population size, thereby biasing evolution toward generalist solutions. This process is analogous to a variant of the mini-batching strategy employed in stochastic gradient descent, where an effective multiplicative noise produces an inductive bias by triggering noise-induced transitions. Through numerical simulations and theoretical analysis, we discuss the conditions under which variation in how individuals experience environmental selection can naturally promote evolutionary strategies that generalize across environments and anticipate novel challenges.
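A toy calculation makes the variance penalty concrete. The sketch below is ours, not the authors' model, and all names and fitness values in it are hypothetical. Two genotypes have the same mean fitness across two equally likely landscapes: a "generalist" with flat fitness and a "specialist" whose fitness varies between landscapes. Each individual independently samples one landscape (annealed heterogeneity), and offspring are assigned in proportion to realized fitness, as in Wright–Fisher selection. Enumerating every possible draw with exact fractions gives the specialist's expected share after one round of selection, which can be compared with the reference process in which everyone experiences the average landscape.

```python
from fractions import Fraction
from itertools import product


def expected_share(fitness, genotypes, focal):
    """Exact expected post-selection frequency of the `focal` genotype.

    fitness[g][m] is the fitness of genotype g in landscape m. Every
    individual independently samples one landscape uniformly at random,
    and offspring are drawn in proportion to realized fitness, so the
    expected share is E[ sum_{i in focal} w_i / sum_j w_j ].
    """
    n_landscapes = len(fitness[genotypes[0]])
    draws = list(product(range(n_landscapes), repeat=len(genotypes)))
    total = Fraction(0)
    for draw in draws:
        w = [fitness[g][m] for g, m in zip(genotypes, draw)]
        w_focal = sum(wi for wi, g in zip(w, genotypes) if g == focal)
        total += w_focal / sum(w)
    return total / len(draws)


def average_landscape(fitness):
    """Reference process: each genotype keeps only its mean fitness."""
    return {g: [sum(ws) / len(ws)] for g, ws in fitness.items()}


# Hypothetical fitness values: equal means, different across-landscape variance.
fitness = {
    "generalist": [Fraction(1), Fraction(1)],        # mean 1, variance 0
    "specialist": [Fraction(1, 2), Fraction(3, 2)],  # mean 1, variance 1/4
}
for n_each in (1, 2):
    pop = ["generalist"] * n_each + ["specialist"] * n_each
    print(len(pop),
          expected_share(fitness, pop, "specialist"),
          expected_share(average_landscape(fitness), pop, "specialist"))
# prints:
# 2 7/15 1/2
# 4 29/60 1/2
```

In the average landscape the specialist keeps its share of 1/2, whereas with individual landscape sampling it loses 1/30 at N = 2 and 1/60 at N = 4. The penalty is driven by across-environment variance alone (the means are equal) and halves when N doubles, consistent with an effect of order 1/N.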

Paper Structure

This paper contains 3 sections, 39 equations, and 16 figures.

Figures (16)

  • Figure 1: Schematic of the evolutionary mini-batch analogy and modeling framework. A Biological inspiration. (left) Individuals in the population (e.g., antibodies, shown as Ys) encounter distinct microenvironments (e.g., different concentrations of distinct antigens), so that the fitness of every individual is continually reshuffled across generations through exposure to a variety of environmental challenges. In contrast, the fitness landscape is static (right) if the population evolves in a single homogeneous setting. B Fitness landscape model. This source of heterogeneity is represented as an ensemble of fitness landscapes; in each generation every individual samples one landscape at random, analogous to a mini-batch of training data in stochastic gradient descent. C Evolutionary dynamics. In our model, a single generation consists of four steps: (1) microenvironment sampling by individuals, (2) Wright–Fisher reproduction and selection on the sampled fitnesses, (3) resetting of phenotypes so that microenvironmental states do not persist, and (4) mutation. Together these steps capture how annealed population heterogeneity introduces an additional source of demographic noise that biases evolution toward genotypes with robust, across-environment performance.
  • Figure 2: Population response to mini-batching in a toy model. A Landscapes. We generate an ensemble of $M=100$ fitness landscapes with the described probabilistic model, where $q=0.2$, $H=4$, $h^0=1$. At each generation, every individual in the population picks one of these 100 fitness landscapes at random. Genotypic sequences have length $L=8$; selection fields are kept fixed on $K=4$ sites. The 100 landscapes comprise 15 distinct ones, which appear in the training dataset with different frequencies. B Statistics of the training set. Mean and variance of the selection fields across the training dataset. C Sequence space representation. The dynamics of the system are governed by the mean ($f_a$) and the variance ($V_a$) of the fitness values of each genotype across the training dataset. The numerical model we considered exhibits a constitutive inverse trade-off between these two quantities, as shown by the front on the right end of the plot. In the absence of this trade-off, the fittest sequences would already be the best generalizers, making single-cell variability unnecessary to improve the generalization performance of the population. In the figure, edges connect genotypes that differ by a single mutation. D Comparison of population dynamics models. We numerically estimate the joint probability density of two collective observables, i.e. the population mean fitness $\bar{f}(\boldsymbol z) = \sum_a f_a z_a$ and the population fitness variance $\bar{V}(\boldsymbol z) = \sum_a V_a z_a$, both for the process with individual mini-batching and for the reference process where the population evolves in the fixed, average landscape. The density is unimodal in the reference case, with a single peak in the top right corner of the simplex, while it is bimodal in the case with individual mini-batching, with a second peak lower down on the trade-off front. Population size: $N=128$. E Distributions at mesoscopic population sizes. The number and location of the modes of the distribution of the collective variable $\bar{g}(\boldsymbol z)$ change as a function of $N$. At mesoscopic $N$, the difference between the processes with and without mini-batching is most evident. F Generalization. We attribute to each sequence a generalization score $g_a$, bounded between 0 and 1, as defined in the main text. The expected population composition by generalization score differs significantly between the reference case (right, dominated by specialists) and the case with mini-batching (left, enriched in generalists). G Mutational robustness. In the presence of mini-batching, the (average) population composition is enriched in "flatter" species, characterized by lower values of the average mutational effect $\gamma_a = \sum_{b} |\Lambda_{ab}(f_b - f_a)|$. H Magnitude of the mini-batching effect. We plot the relative changes $2\lvert\langle A\rangle - \langle A\rangle_0\rvert/(\langle A\rangle + \langle A\rangle_0)$ for $A=\bar{f}(\boldsymbol z)$ and $A=\bar{g}(\boldsymbol z)$. Here $\langle\cdot\rangle$ and $\langle\cdot\rangle_0$ denote the averages over the steady-state distributions of the process with mini-batching and of the reference process, respectively. The dashed black line corresponds to $1/N$ (the expected order of magnitude of the mini-batching effect). A large deviation from this reference is observed at intermediate population sizes.
  • Figure 4: Price equation reveals a non-equilibrium trap shaped by demographic noise. A Nullclines and noise amplitude of the stochastic Price equation. We compute the drift and diffusion terms of the generalized Price equations for $\bar{f}$ and $\bar{V}$ (main text) across the population states visited at stationarity. The color schemes show the deterministic forces (left) or the temperature of the fluctuations (right) along the $\bar{f}$ (top) and $\bar{V}$ (bottom) axes. The system exhibits multiple points with near-zero drift and a strongly inhomogeneous temperature profile across them. The temperature gradient in the system with mini-batching is an order of magnitude larger than in the absence of mini-batching (insets). B Quasi-equilibrium states. There are two major regions (in PCA representation) where the magnitude of the drift term is near zero. The population may spend a long time in these regions, even if they are not stable fixed points of the dynamics, especially in the presence of noise. C Dynamics along PC1. The projection of the dynamics onto the first PC of the ensemble of non-equilibrium steady states at $N=128$ is described by the Price equation in Eq. 23 of the SI, with $\boldsymbol u$ the eigenvector associated with PC1, such that $x_1=\boldsymbol u \cdot \boldsymbol z=\bar{u}(\boldsymbol z)$. The system behaves as a quasi-1D dynamical system, as evidenced by the low scatter of the phase portrait in the left panel, both in its deterministic ($y$ coordinate) and stochastic (color) components. On the right, an effective 1D phase portrait is built by binning the PC1 coordinate and averaging over population states in the same bin. The color-changing line illustrates the effective pseudo-potential obtained by numerically integrating $\langle\partial_t x_1\rangle^{CG}$. The dashed lines indicate the drift and diffusion of the reference process (where mini-batching is off) along PC1. We observe two quasi-equilibrium points, where $\langle\partial_t x_1\rangle\approx0$. However, the stable fixed point lies at a much higher temperature than the marginally stable one: this temperature gradient is strong enough to trap the system at the marginally stable fixed point.
  • ...and 11 more figures
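The four-step generation described in the Figure 1 caption can be sketched as a short Wright–Fisher update. This is a minimal illustration under our own assumptions, not the authors' code: genotypes are tuples of 0/1, each landscape is a Python callable returning a positive fitness, landscapes are sampled uniformly, and mutation flips each site independently with a hypothetical per-site probability `mu`. The function name `one_generation` and the `mini_batch` switch (which runs the reference process in the average landscape) are also hypothetical.

```python
import random
from statistics import fmean


def one_generation(population, landscapes, mu, rng, mini_batch=True):
    """Advance a population of binary genotypes by one generation (sketch).

    population: list of genotypes (tuples of 0/1).
    landscapes: list of callables mapping a genotype to a positive fitness.
    mu: per-site mutation probability (hypothetical parameter).
    mini_batch: if False, run the reference process in the average landscape.
    """
    n = len(population)
    if mini_batch:
        # (1) Microenvironment sampling: each individual draws one landscape.
        weights = [rng.choice(landscapes)(g) for g in population]
    else:
        # Reference process: every individual sees the average landscape.
        weights = [fmean(f(g) for f in landscapes) for g in population]
    # (2) Wright-Fisher reproduction: N offspring drawn with probability
    #     proportional to the realized (sampled) fitness of their parent.
    offspring = rng.choices(population, weights=weights, k=n)
    # (3) Reset: the sampled landscapes (weights) are not inherited;
    #     offspring carry only their genotype into the next generation.
    # (4) Mutation: flip each site independently with probability mu.
    return [tuple(1 - s if rng.random() < mu else s for s in g) for g in offspring]


# Usage with hypothetical landscapes: landscape k rewards only site k.
rng = random.Random(0)
L, N = 4, 32
landscapes = [lambda g, k=k: 1.0 + g[k] for k in range(L)]
pop = [(0,) * L] * N
for _ in range(200):
    pop = one_generation(pop, landscapes, mu=0.01, rng=rng)
```

Step (3) needs no explicit code here because the sampled landscape is only used to compute this generation's weights; nothing about it is passed to the offspring, which is what makes the heterogeneity annealed rather than quenched.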
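The collective observables in the Figure 2 caption, $\bar{f}(\boldsymbol z) = \sum_a f_a z_a$ and $\bar{V}(\boldsymbol z) = \sum_a V_a z_a$, together with the relative change plotted in panel H, can be computed as follows. This is a sketch under stated assumptions: $f_a$ and $V_a$ are taken as the mean and population variance of genotype $a$'s fitness over the list of training landscapes (repeated landscapes counted with their multiplicity), $z_a$ is the empirical genotype frequency, and all function names and test values are hypothetical. `relative_change` only implements the formula; the steady-state averages $\langle A\rangle$ and $\langle A\rangle_0$ would have to come from long simulations of the two processes.

```python
from collections import Counter
from statistics import fmean, pvariance


def genotype_stats(genotype, landscapes):
    """Mean f_a and variance V_a of one genotype's fitness over the training landscapes."""
    values = [f(genotype) for f in landscapes]
    return fmean(values), pvariance(values)


def collective_observables(population, landscapes):
    """Return (f_bar, V_bar) = (sum_a f_a z_a, sum_a V_a z_a) for a population."""
    n = len(population)
    f_bar = v_bar = 0.0
    for genotype, count in Counter(population).items():
        z_a = count / n  # empirical frequency of genotype a
        f_a, v_a = genotype_stats(genotype, landscapes)
        f_bar += f_a * z_a
        v_bar += v_a * z_a
    return f_bar, v_bar


def relative_change(a, a0):
    """Symmetric relative change 2|<A> - <A>_0| / (<A> + <A>_0)."""
    return 2 * abs(a - a0) / (a + a0)


landscapes = [lambda g: 1.0 + g[0], lambda g: 1.0 + g[1]]  # hypothetical
pop = [(0, 0), (1, 1), (1, 0), (1, 0)]
f_bar, v_bar = collective_observables(pop, landscapes)
print(f_bar, v_bar)  # prints 1.5 0.125
```

Only the specialist genotype (1, 0), whose fitness differs between the two landscapes, contributes to $\bar{V}$; the two genotypes with flat fitness contribute to $\bar{f}$ alone.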
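The coarse-graining described for Figure 4C (binning the PC1 coordinate, averaging the drift within each bin, and integrating to obtain a pseudo-potential) can be sketched as below. The assumptions are ours: equal-width bins, instantaneous drift samples `dx` supplied by the caller (for example, finite differences of simulated trajectories), the sign convention $U'(x) = -\langle\partial_t x_1\rangle^{CG}$, the trapezoidal rule, and $U = 0$ at the first occupied bin. The function names are hypothetical. The potential captures only the deterministic part; with a strongly inhomogeneous temperature, as reported in the caption, $U$ alone does not determine where the population spends its time.

```python
def coarse_grained_drift(x, dx, n_bins):
    """Bin samples of x (e.g. the PC1 coordinate) into equal-width bins and
    average the instantaneous drift dx within each occupied bin."""
    lo, hi = min(x), max(x)
    width = (hi - lo) / n_bins  # assumes hi > lo
    sums, counts = [0.0] * n_bins, [0] * n_bins
    for xi, di in zip(x, dx):
        k = min(int((xi - lo) / width), n_bins - 1)  # last bin is closed on the right
        sums[k] += di
        counts[k] += 1
    centers = [lo + (k + 0.5) * width for k in range(n_bins) if counts[k]]
    drift = [sums[k] / counts[k] for k in range(n_bins) if counts[k]]
    return centers, drift


def pseudo_potential(centers, drift):
    """Integrate U(x) = -int <dx/dt>^CG dx with the trapezoidal rule, U = 0 at the first bin."""
    U = [0.0]
    for k in range(1, len(centers)):
        h = centers[k] - centers[k - 1]
        U.append(U[-1] - 0.5 * h * (drift[k] + drift[k - 1]))
    return U


# Synthetic check: a linear restoring drift -x should give a well centered at 0.
x = [i / 10 for i in range(-10, 11)]
dx = [-xi for xi in x]
centers, drift = coarse_grained_drift(x, dx, n_bins=5)
U = pseudo_potential(centers, drift)
```

Zeros of the coarse-grained drift correspond to extrema of $U$, which is how the two quasi-equilibrium points in the caption would appear in this representation.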