Can a Learner Regret Using a No-Regret Algorithm? A Control-Theoretic Study of Performance Dominance

Hassan Abdelraouf; Jeff S. Shamma

TL;DR

The paper shows that the minimal achievable cumulative reward gap between anticipatory RD and standard RD is zero, thereby establishing global dominance of anticipatory RD across all payoff environments and a "free lunch" among no-regret learning dynamics.

Abstract

No-regret learning dynamics ensure that a learner asymptotically achieves an average reward no worse than that of any fixed strategy. This no-regret guarantee does not determine the value of the asymptotic average reward. Indeed, different no-regret learning dynamics can exhibit different asymptotic average rewards when facing the same environment, even though both satisfy the no-regret guarantee. This paper asks whether a "free-lunch" phenomenon can arise among no-regret algorithms. Namely, is it possible for one no-regret learning rule to uniformly outperform another no-regret learning rule across all payoff environments? Stated differently, can a learner regret not using a particular no-regret algorithm? We consider generalized replicator dynamics (RD) as a cascade interconnection between a linear time-invariant (LTI) system and the softmax nonlinearity. Varying this LTI system leads to different realizations of replicator dynamics, including so-called anticipatory RD, exponential RD, and other forms of higher-order RD. Setting the LTI system to be an integrator realizes standard RD, which is known to satisfy the no-regret property. Within this framework, we analyze and compare various realizations of generalized RD obtained by varying the LTI system. We first formulate performance comparison as a passivity property of an associated comparison system and establish "local" dominance results, i.e., results comparing the asymptotic performance near an equilibrium payoff vector. We then cast performance comparison between a form of anticipatory RD and standard RD as an optimal-control problem. We show that the minimal achievable cumulative reward gap is zero, thereby establishing global dominance of anticipatory RD across all payoff environments and establishing a "free lunch" among no-regret learning dynamics.
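The cascade described in the abstract can be illustrated numerically for the one case the abstract specifies, namely standard RD, where the LTI system is an integrator feeding the softmax map. The sketch below is a minimal forward-Euler simulation, not the paper's method: the two-action payoff signal, the horizon, the step size, and all function names are assumptions chosen for illustration. It tracks the learner's time-averaged reward and its time-averaged regret against the best fixed action in hindsight.

```python
import math


def softmax(z):
    """Softmax map from a score vector to a mixed strategy (shifted for stability)."""
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def payoff(t):
    """Hypothetical two-action payoff environment (an assumption for illustration)."""
    return [math.sin(t), 0.5]


def simulate_standard_rd(horizon=200.0, dt=0.01):
    """Forward-Euler simulation of standard RD: an integrator followed by softmax."""
    n = 2
    z = [0.0] * n        # integrator state: cumulative payoff of each action
    earned = 0.0         # cumulative reward collected by the learner
    fixed = [0.0] * n    # cumulative reward of each fixed pure action
    steps = round(horizon / dt)
    for k in range(steps):
        p = payoff(k * dt)
        x = softmax(z)   # strategy depends only on past payoffs
        earned += dt * sum(xi * pi for xi, pi in zip(x, p))
        for i in range(n):
            fixed[i] += dt * p[i]
            z[i] += dt * p[i]   # integrator update
    total_time = steps * dt
    avg_reward = earned / total_time
    avg_regret = (max(fixed) - earned) / total_time
    return avg_reward, avg_regret


avg_reward, avg_regret = simulate_standard_rd()
print(f"average reward: {avg_reward:.4f}, average regret: {avg_regret:.4f}")
```

Replacing the integrator update with a different LTI system would give another realization of generalized RD; the specific forms of anticipatory and exponential RD are not stated in this summary, so they are not sketched here.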

Paper Structure (28 sections, 12 theorems, 148 equations, 8 figures)

Introduction
Preliminaries
Notations
Online Learning
Regret
Passivity
Input–output operators
State–space representation
Properties of the Softmax Mapping
Motivation
Payoff-Based Higher-Order Replicator Dynamics
Oracle Replicator Dynamics (Oracle RD)
Frequency–Domain Intuition (Bode View)
Dominance over Replicator Dynamics
Phase-Lag Dominance at Fixed Gain
...and 13 more sections

Key Result

Lemma 1

For any $v\in\mathbb{R}^n$, with equality if and only if $v=c\,\mathbf{1}_n$ for some $c\in\mathbb{R}$.

Figures (8)

Figure 1: Block–diagram representation of replicator dynamics.
Figure 2: Block–diagram representation of exponential replicator dynamics (Ex–RD).
Figure 3: Performance of different learning dynamics in the environment $p(t)=[\sin t \;\; 0.5]^\top$.
Figure 4: Performance of different learning dynamics in the environment $p(t)=[\sin t \;\; -\sin t]^\top$.
Figure 5: Block diagram representation for the anticipatory RD.
...and 3 more figures

Theorems & Definitions (30)

Remark 1
Lemma 1
proof
Lemma 2
proof
Example 1: Finite–regret dominance [abdelraouf2025passivity]
Example 2
Definition 1: Uniform Dominance
Definition 2: Asymptotic Dominance
Remark 2
...and 20 more
