Nested replicator dynamics, nested logit choice, and similarity-based learning
Panayotis Mertikopoulos, William H. Sandholm
TL;DR
The paper studies learning and evolutionary dynamics in population games whose action sets carry a partition-based similarity structure. It introduces the nested replicator dynamics (NRD), which arise when revising agents are biased toward imitating similar strategies, and shows that NRD, despite violating the standard monotonicity postulates for imitative dynamics, preserve core long-run rationality properties such as the extinction of dominated strategies and convergence to Nash equilibria in several classes of games. A key contribution links NRD to stimulus-response learning via the nested logit choice (NLC) rule and to regularized learning via follow-the-regularized-leader (FTRL), establishing a three-way equivalence between the revision, reinforcement, and regularized-learning perspectives. The analysis shows how the similarity structure changes convergence rates and extinction speeds, and gives a unified interpretation of NRD through nested KL divergences and nested entropy penalties. The results offer both theoretical insight into similarity-based learning and practical connections to online learning algorithms and regularized optimization in games.
Abstract
We consider a model of learning and evolution in games whose action sets are endowed with a partition-based similarity structure intended to capture exogenous similarities between strategies. In this model, revising agents have a higher probability of comparing their current strategy with other strategies that they deem similar, and they switch to the observed strategy with probability proportional to its payoff excess. Because of this implicit bias toward similar strategies, the resulting dynamics, which we call the nested replicator dynamics, do not satisfy any of the standard monotonicity postulates for imitative game dynamics; nonetheless, we show that they retain the main long-run rationality properties of the replicator dynamics, albeit at quantitatively different rates. We also show that the induced dynamics can be viewed as a stimulus-response model in the spirit of Erev & Roth (1998), with choice probabilities given by the nested logit choice rule of Ben-Akiva (1973) and McFadden (1978). This result generalizes an existing relation between the replicator dynamics and the exponential weights algorithm in online learning, and provides an additional layer of interpretation to our analysis and results.
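To make the nested logit choice rule mentioned above concrete, the following is a minimal sketch of the textbook two-level nested logit, not the paper's own notation or implementation. The function name `nested_logit` and the parameters `mu_top` and `mu_nest` (the scale parameters for choosing between similarity classes and within a class) are assumptions introduced for illustration. When the two scales coincide, the rule reduces to plain logit choice, which is the choice map behind the exponential weights algorithm.

```python
import math

def nested_logit(scores, nests, mu_top=1.0, mu_nest=0.5):
    """Two-level nested logit choice probabilities (illustrative sketch).

    scores  : dict mapping each action to its score (e.g. cumulative payoff)
    nests   : list of lists of actions, forming a partition of the action set
    mu_top  : scale for the choice between nests (hypothetical name)
    mu_nest : common scale for the choice within a nest (hypothetical name)
    """
    inclusive = []  # inclusive value of each nest: mu_nest * log-sum-exp
    within = {}     # conditional probability of each action given its nest
    for nest in nests:
        m = max(scores[a] for a in nest)  # shift by the max for stability
        weights = {a: math.exp((scores[a] - m) / mu_nest) for a in nest}
        total = sum(weights.values())
        inclusive.append(m + mu_nest * math.log(total))
        for a in nest:
            within[a] = weights[a] / total

    # Logit choice between nests, driven by their inclusive values.
    top = max(inclusive)
    nest_weights = [math.exp((v - top) / mu_top) for v in inclusive]
    z = sum(nest_weights)

    # Probability of an action = P(its nest) * P(action | nest).
    probs = {}
    for nest, w in zip(nests, nest_weights):
        for a in nest:
            probs[a] = (w / z) * within[a]
    return probs

# Three equally scored actions, two of which are deemed similar.
scores = {"car": 0.0, "red_bus": 0.0, "blue_bus": 0.0}
nests = [["car"], ["red_bus", "blue_bus"]]
nested = nested_logit(scores, nests, mu_top=1.0, mu_nest=0.5)
flat = nested_logit(scores, nests, mu_top=1.0, mu_nest=1.0)
```

In this example `flat` assigns probability 1/3 to each action, whereas `nested` gives `car` more than 1/3, because the two similar bus options partly share the weight of their class instead of each counting as a fully distinct alternative.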
