Replicable Online Learning

Saba Ahmadi, Siddharth Bhandari, Avrim Blum

TL;DR

This work investigates the concept of algorithmic replicability and proposes a general framework for converting an online learner into an adversarially replicable one within the authors' setting, bounding the new regret in terms of the original algorithm's regret.

Abstract

We investigate the concept of algorithmic replicability introduced by Impagliazzo et al. 2022, Ghazi et al. 2021, and Ahn et al. 2024 in an online setting. In our model, the input sequence received by the online learner is generated from time-varying distributions chosen obliviously by an adversary. Our objective is to design low-regret online algorithms that, with high probability, produce the exact same sequence of actions when run on two independently sampled input sequences generated as described above. We refer to such algorithms as adversarially replicable. Previous works (such as Esfandiari et al. 2022) explored replicability in the online setting under inputs generated independently from a fixed distribution; we term this notion iid-replicability. Our model generalizes to capture both adversarial and iid input sequences, as well as their mixtures, which can be modeled by setting certain distributions to be point masses. We demonstrate adversarially replicable online learning algorithms for online linear optimization and the experts problem that achieve sub-linear regret. Additionally, we propose a general framework for converting an online learner into an adversarially replicable one within our setting, bounding the new regret in terms of the original algorithm's regret. We also present a nearly optimal (in terms of regret) iid-replicable online algorithm for the experts problem, highlighting the distinction between the iid and adversarial notions of replicability. Finally, we establish lower bounds on the regret (in terms of the replicability parameter and time) that any replicable online algorithm must incur.

Paper Structure

This paper contains 37 sections, 26 theorems, 102 equations, 2 figures, 7 algorithms.

Key Result

Theorem 1.1

Let $\rho>0$ be a parameter. For the online linear optimization problem, there exists an adversarially $\rho$-replicable algorithm with sub-linear regret.
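
For orientation, the replicability requirement described in the abstract can be written roughly as follows. This is a sketch in the style of Impagliazzo et al. 2022, under the assumption that both runs share the algorithm's internal randomness $R$; the symbols $\mathcal{D}_t$, $S$, $S'$, $R$ and $\mathcal{A}$ are notation introduced here for illustration, and the paper's formal definition may differ in its details.

$$
\Pr_{S,\,S' \sim \mathcal{D}_1 \times \cdots \times \mathcal{D}_T,\; R}\Big[\mathcal{A}(S; R) = \mathcal{A}(S'; R)\Big] \;\ge\; 1 - \rho
$$

Here $\mathcal{D}_1, \dots, \mathcal{D}_T$ are the time-varying distributions fixed in advance by the oblivious adversary, $S$ and $S'$ are two independently sampled input sequences, and $\mathcal{A}(S; R)$ denotes the full sequence of actions played on input $S$. Taking every $\mathcal{D}_t$ equal recovers the iid case, and taking every $\mathcal{D}_t$ to be a point mass recovers a fully adversarial sequence.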

Figures (2)

  • Figure 1: An illustration of the $\mathsf{FLLB}(\varepsilon,B)$ algorithm. The perturbed point $c_{1:t-1}+p$ is uniformly random over a cube of side $1/\varepsilon$ with vertex at $c_{1:t-1}$ (similarly for $c'_{1:t-1}+p$). In Theorem 3.1, we prove using McDiarmid's concentration bound that two different trajectories $c_{1:t-1}$ and $c'_{1:t-1}$ are within a distance $\Omega(\sqrt{nT})$ with high probability, and that they get mapped to the same grid point $g_{t-1}$ with high probability.
  • Figure 2: An illustration of the $\mathsf{FTPLB}$ algorithm. The blue and red boxes correspond to $\mathsf{FTPLB}$ and $\mathsf{BTPL}$, respectively.
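
The rounding step that the Figure 1 caption describes can be sketched in a few lines. This is a minimal illustration only, assuming a grid of spacing $1/\varepsilon$ shifted by a random offset that both runs share. The names `grid_point` and `agreement_rate` are hypothetical, and the block structure (parameter $B$) and the leader-selection step of $\mathsf{FLLB}$ are not modeled here.

```python
import numpy as np


def grid_point(cum_cost, offset, eps):
    """Round a cumulative cost vector onto a randomly shifted grid.

    Returns the unique point of the grid offset + (1/eps) * Z^n that lies
    in the cube cum_cost + [0, 1/eps)^n. When offset is uniform over
    [0, 1/eps)^n, this point is distributed like cum_cost + p with p
    uniform over a cube of side 1/eps (the picture in the caption).
    """
    side = 1.0 / eps
    return offset + side * np.ceil((cum_cost - offset) / side)


def agreement_rate(c, c_prime, eps, trials=2000, seed=0):
    """Estimate how often two trajectories round to the same grid point.

    The offset plays the role of shared internal randomness: both runs
    use the same offset, while their cumulative costs c and c_prime differ.
    """
    rng = np.random.default_rng(seed)
    side = 1.0 / eps
    hits = 0
    for _ in range(trials):
        offset = rng.uniform(0.0, side, size=c.shape)
        same = np.allclose(grid_point(c, offset, eps),
                           grid_point(c_prime, offset, eps))
        hits += int(same)
    return hits / trials


if __name__ == "__main__":
    # Illustrative numbers only; they are not taken from the paper.
    c = np.array([10.0, 20.0, 30.0])
    c_prime = c + np.array([1.0, -1.0, 0.5])
    print(agreement_rate(c, c_prime, eps=0.01))
```

In this sketch the two runs disagree in a coordinate only when a grid line falls between the two cumulative costs, which happens with probability proportional to their gap times $\varepsilon$. A coarser grid (smaller $\varepsilon$) therefore makes agreement more likely, at the price of a larger perturbation.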

Theorems & Definitions (47)

  • Remark
  • Theorem 1.1: Informal; see \ref{thm:FLLB-regret-replicability}
  • Theorem 1.2: Informal; see \ref{thm:FTPLS-regret-replicability}
  • Theorem 1.3: Informal; see \ref{thm:general-framework-guarantees}
  • Theorem 1.4: Informal; see \ref{thm:vanilla_replicability_experts}
  • Theorem 1.5: Informal; see \ref{thm:lower_bound_n=2_vanilla_replicability_experts} and \ref{thm:lower_bound_vanilla_replicability_experts}
  • Example 3.1
  • Theorem 3.1: Regret and Replicability Guarantees of $\mathsf{FLLB}$
  • Remark 3.2
  • Lemma 3.3
  • ...and 37 more