Nested replicator dynamics, nested logit choice, and similarity-based learning

Panayotis Mertikopoulos, William H. Sandholm

TL;DR

The paper studies learning and evolutionary dynamics in population games where action sets carry a partition-based similarity structure. It introduces the nested replicator dynamics (NRD), in which revising agents are biased toward imitating strategies similar to their own, and shows that NRD, despite violating the standard monotonicity postulates, preserves core long-run rationality properties such as extinction of dominated strategies and convergence to Nash equilibria in several game classes. A key contribution is linking NRD to stimulus–response learning via nested logit choice (NLC) and to regularized learning (FTRL), establishing a triple equivalence between the revision, reinforcement, and regularized learning perspectives. The analysis shows how similarity structures alter convergence rates and extinction speeds, and gives a unified interpretation of NRD through nested KL divergences and nested entropy penalties. The results offer both theoretical insight into similarity-based learning and practical connections to online learning algorithms and regularized optimization in games.

Abstract

We consider a model of learning and evolution in games whose action sets are endowed with a partition-based similarity structure intended to capture exogenous similarities between strategies. In this model, revising agents have a higher probability of comparing their current strategy with other strategies that they deem similar, and they switch to the observed strategy with probability proportional to its payoff excess. Because of this implicit bias toward similar strategies, the resulting dynamics - which we call the nested replicator dynamics - do not satisfy any of the standard monotonicity postulates for imitative game dynamics; nonetheless, we show that they retain the main long-run rationality properties of the replicator dynamics, albeit at quantitatively different rates. We also show that the induced dynamics can be viewed as a stimulus-response model in the spirit of Erev & Roth (1998), with choice probabilities given by the nested logit choice rule of Ben-Akiva (1973) and McFadden (1978). This result generalizes an existing relation between the replicator dynamics and the exponential weights algorithm in online learning, and provides an additional layer of interpretation to our analysis and results.
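
To make the nested logit choice rule mentioned above concrete, here is a minimal Python sketch of its textbook two-level form. It is an illustration under stated assumptions, not the paper's own notation: the function name `nested_logit`, the parameter names `mu_outer` and `mu_inner`, and the example scores are all hypothetical. Within each nest an action is chosen by a logit rule with scale `mu_inner`; nests are chosen by a logit rule with scale `mu_outer` applied to each nest's log-sum-exp aggregate score.

```python
import numpy as np

def nested_logit(scores, nests, mu_outer=1.0, mu_inner=0.5):
    """Two-level nested logit choice probabilities (textbook form).

    scores   : one propensity score per action
    nests    : list of index lists partitioning the actions (assumed layout)
    mu_outer : scale of the choice between nests (assumed name)
    mu_inner : scale of the choice within a nest (assumed name)
    """
    scores = np.asarray(scores, dtype=float)
    probs = np.zeros_like(scores)
    within, inclusive = [], []
    for nest in nests:
        z = scores[np.asarray(nest)] / mu_inner
        zmax = z.max()                      # subtract max for numerical stability
        w = np.exp(z - zmax)
        within.append(w / w.sum())          # P(action | nest)
        # aggregate score of the nest: mu_inner * log sum exp(score / mu_inner)
        inclusive.append(mu_inner * (zmax + np.log(w.sum())))
    u = np.asarray(inclusive) / mu_outer
    nest_w = np.exp(u - u.max())
    nest_p = nest_w / nest_w.sum()          # P(nest)
    for nest, p_nest, p_in in zip(nests, nest_p, within):
        probs[np.asarray(nest)] = p_nest * p_in
    return probs

# Hypothetical commuting example: action 0 is a car, actions 1 and 2 are two
# similar bus lines grouped in one nest. All three have equal scores.
equal_scores = [1.0, 1.0, 1.0]
grouped = nested_logit(equal_scores, [[0], [1, 2]], mu_outer=1.0, mu_inner=0.5)
flat = nested_logit(equal_scores, [[0], [1, 2]], mu_outer=1.0, mu_inner=1.0)
```

When `mu_inner` equals `mu_outer`, the rule collapses to the ordinary logit (softmax) rule, so `flat` is uniform. With `mu_inner < mu_outer`, the two similar bus lines share probability mass and the car is chosen with probability above one third.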

Paper Structure

This paper contains 30 sections, 11 theorems, 110 equations, and 5 figures.

Key Result

Theorem 1

Let $\mathcal{G}\equiv\mathcal{G}(\mathcal{A},v)$ be a population game, let $\mathcal{P}_{\!\bullet}$ be a similarity structure on $\mathcal{A}$, and let $x(t)$ be an interior solution of the nested replicator dynamics (NRD) for $\mathcal{G}$. Then:

Figures (5)

  • Figure 1: Grouping of alternative modes of transport by similarity.
  • Figure 2: Graphical representation of a $3$-tier similarity structure as a rooted tree.
  • Figure 3: Solution orbits of the RD (left) and the NRD (right) in a game of "good Rock-Paper-Scissors" (top) and the commuting game described in the text (bottom). NE are depicted in red, stationary points in blue; the contours represent the population mean payoff (higher values shifted to red). In the RPS game, "P" has been arbitrarily grouped with "S" to illustrate the distortion incurred by the implicit similarity bias of (NRD). In all cases, (NRD) has been run with intra-level sampling probabilities $\lambda_{0}=1/4$ and $\lambda_{1} = 3/4$. The evolution of the population shares of the highlighted orbit in the commuting game is shown in Figure 4.
  • Figure 4: The rate of extinction of dominated strategies over time under the RD (left) and the NRD (right). Population shares are computed for the highlighted orbits of Figure 3; axes are log-linear, indicating an exponential rate of convergence to equilibrium and an exponential rate of extinction of dominated strategies (with the slope of each line capturing the extinction exponent). Consistent with the paper's proposition on extinction rates, the first bus line becomes extinct at a significantly faster rate when similarities are not taken into account.
  • Figure 5: The triple equivalence between (NRD), (NEW), and (NRL).

Theorems & Definitions (34)

  • Example 1: Symmetric random matching
  • Definition 1
  • Example: PPI
  • Definition 2
  • Example
  • Remark 1
  • Remark 2
  • Remark 3
  • Remark 4
  • Remark 5
  • ...and 24 more