Policy-Controlled Generalized Share: A General Framework with a Transformer Instantiation for Strictly Online Switching-Oracle Tracking

Hongkai Hu

Abstract

Static regret to a single expert is often the wrong target for strictly online prediction under non-stationarity, where the best expert may switch repeatedly over time. We study Policy-Controlled Generalized Share (PCGS), a general strictly online framework in which the generalized-share recursion is fixed while the post-loss update controls are allowed to vary adaptively. Its principal instantiation in this paper is PCGS-TF, which uses a causal Transformer as an update controller: after round t finishes and the loss vector is observed, the Transformer outputs the controls that map w_t to w_{t+1} without altering the already committed decision w_t. Under admissible post-loss update controls, we obtain a pathwise weighted regret guarantee for general time-varying learning rates, and a standard dynamic-regret guarantee against any expert path with at most S switches under the constant-learning-rate specialization. Empirically, on a controlled synthetic suite with exact dynamic-programming switching-oracle evaluation, PCGS-TF attains the lowest mean dynamic regret in all seven non-stationary families, with its advantage increasing for larger expert pools. On a reproduced household-electricity benchmark, PCGS-TF also achieves the lowest normalized dynamic regret for S = 5, 10, and 20.
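
The update that PCGS controls can be sketched as follows. This is a minimal sketch, assuming the standard generalized-share form $w_{t+1} = (1-\rho_t)\,v_{t+1} + \rho_t q_t$ with the exponential-weights posterior $v_{t+1,k} \propto w_{t,k}\exp(-\eta_t \ell_{t,k})$; the paper's exact recursion is not reproduced here. The names `share_update`, `run_pcgs`, and `fixed_share_controller` are hypothetical, and `controller` stands in for whatever produces $(\eta_t,\rho_t,q_t)$, such as the causal Transformer in PCGS-TF. The loop records $w_t$ before the round-$t$ loss vector enters the history, which is the strictly online ordering described above.

```python
import math
from typing import Callable, List, Sequence, Tuple

Controls = Tuple[float, float, List[float]]  # assumed (eta_t, rho_t, q_t)


def share_update(w: Sequence[float], losses: Sequence[float],
                 eta: float, rho: float, q: Sequence[float]) -> List[float]:
    """Assumed generalized-share step mapping w_t to w_{t+1}."""
    if eta <= 0 or not 0.0 <= rho <= 1.0:
        raise ValueError("need eta > 0 and rho in [0, 1]")
    m = min(losses)  # shift losses by their minimum for numerical stability
    v = [wk * math.exp(-eta * (lk - m)) for wk, lk in zip(w, losses)]
    z = sum(v)
    # Mix the normalized Hedge posterior with the restart distribution q_t.
    return [(1.0 - rho) * vk / z + rho * qk for vk, qk in zip(v, q)]


def run_pcgs(loss_seq: Sequence[Sequence[float]],
             controller: Callable[[List[List[float]]], Controls],
             k: int) -> List[List[float]]:
    """Return the played weights w_1, ..., w_T (hypothetical driver loop)."""
    w = [1.0 / k] * k            # w_1 is fixed before any loss is observed
    history: List[List[float]] = []
    played: List[List[float]] = []
    for losses in loss_seq:
        played.append(list(w))           # commit w_t first
        history.append(list(losses))     # then the round-t losses are revealed
        eta, rho, q = controller(history)  # post-loss controls for t -> t+1
        w = share_update(w, losses, eta, rho, q)
    return played


def fixed_share_controller(eta: float, alpha: float, k: int):
    """Hypothetical constant controller: uniform q_t, constant eta and rho."""
    uniform = [1.0 / k] * k
    return lambda history: (eta, alpha, uniform)


losses = [[0.0, 1.0, 1.0], [0.0, 1.0, 1.0], [1.0, 0.0, 1.0]]
played = run_pcgs(losses, fixed_share_controller(1.0, 0.05, 3), 3)
```

With a uniform $q_t$ and constant $(\eta, \rho)$ the controller behaves like Fixed Share, consistent with the title of Corollary 2; with $\rho_t = 0$ the step reduces to plain Hedge.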

Paper Structure

This paper contains 103 sections, 17 theorems, 294 equations, 6 figures, 5 tables, and 1 algorithm.

Key Result

Theorem 1

Assume $\tilde{\ell}_{t,k}\in[0,1]$ for all $t\in[T]$ and $k\in[K]$. Let $w_1\in\Delta_K$ be $\mathcal{F}_0$-measurable, and run PCGS with admissible controls $\{(\eta_t,\rho_t,q_t)\}_{t=1}^{T-1}$ as defined in the section on admissible controls. Define Then the played weights $w_t$ are $\mathcal{F}_{t-1}$-measurable for every $t$, and for every realized loss sequence and every comparator path $\pi_

Figures (6)

  • Figure 1: Switching complexity becomes learnable. Under PCGS, switching to expert $\pi_{t+1}$ incurs $-\log(\rho_t q_t(\pi_{t+1}))$, replacing the classical uniform-restart cost $\log K$ by a data-dependent code length $-\log q_t(\pi_{t+1})$.
  • Figure 2: Main suite summary (mean$\pm$std dynamic regret). PCGS improves over FixedShare and GenShare(heur) across all families.
  • Figure 3: Mechanism evidence connecting theory to behavior. In regimes with abrupt changes, $\rho_t$ increases, enabling rapid reallocation; in predictive regimes, $q_t$ concentrates mass, reducing the effective switching complexity.
  • Figure 4: Heavy-tail robustness grid: mean improvement (GenShare DynRegret $-$ Ours DynRegret). Positive values indicate PCGS is better.
  • Figure 5: Scaling with $K\in\{64,256,1024\}$. The advantage of policy-controlled restarts increases with larger expert libraries.
  • ...and 1 more figure
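
The switching cost in the Figure 1 caption can be made concrete with a short calculation. The sketch below, with hypothetical helper names, sums $-\log(\rho_t q_t(\pi_{t+1}))$ over the rounds at which a comparator path changes expert; it deliberately omits the cost of staying on the same expert, which the caption does not state. For one switch with $\rho_t = 0.01$ and $K = 1024$, a uniform $q_t$ costs $\log 100 + \log 1024 \approx 4.61 + 6.93 = 11.54$ nats, while a $q_t$ that places mass $1/2$ on the correct next expert costs $\log 100 + \log 2 \approx 5.30$ nats. The $-\log q_t(\pi_{t+1})$ part is the log-loss (cross-entropy) of $q_t$ at the expert switched to, which is presumably the sense of the title of Lemma 1.

```python
import math
from typing import Sequence


def switch_cost(rho: float, q_next: float) -> float:
    """Code length -log(rho_t * q_t(pi_{t+1})) of one switch, in nats."""
    return -math.log(rho * q_next)


def path_switch_complexity(path: Sequence[int], rhos: Sequence[float],
                           qs: Sequence[Sequence[float]]) -> float:
    """Total switch cost of a comparator path (stay costs not included).

    path[t] is the expert at round t; rhos[t] and qs[t] are the controls
    that map w_t to w_{t+1}, so they price a switch between t and t+1.
    """
    total = 0.0
    for t in range(len(path) - 1):
        if path[t + 1] != path[t]:
            total += switch_cost(rhos[t], qs[t][path[t + 1]])
    return total


K = 1024
uniform_cost = switch_cost(0.01, 1.0 / K)  # log(1/rho) + log K
focused_cost = switch_cost(0.01, 0.5)      # log(1/rho) + log 2
print(f"uniform: {uniform_cost:.2f} nats, focused: {focused_cost:.2f} nats")
```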

Theorems & Definitions (35)

  • Theorem 1: Weighted pathwise regret for PCGS under admissible strictly causal update controls
  • Corollary 1: Dynamic regret against the exact $S$-switch oracle
  • Proof
  • Corollary 2: Fixed Share as a special case
  • Lemma 1: Cross-entropy training upper bounds the empirical switching-complexity term
  • Theorem 2: Pathwise regret bound controlled by policy cross-entropies
  • Lemma 2: PCGS is strictly online by construction
  • Proof
  • Lemma 3: Correctness of the DP recurrence
  • Proof
  • ...and 25 more
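
The exact switching-oracle evaluation mentioned in the abstract, Corollary 1, and Lemma 3 can be illustrated with the textbook $O(TSK)$ dynamic program for the best expert path with at most $S$ switches. This is a sketch of that standard recurrence, not necessarily the paper's; the function names are hypothetical, and `dynamic_regret` assumes dynamic regret is the learner's cumulative expected loss $\sum_t \langle w_t, \ell_t \rangle$ minus the oracle's loss, without the normalization used for the household-electricity results.

```python
import math
from typing import List, Sequence


def s_switch_oracle(losses: Sequence[Sequence[float]], s_max: int) -> float:
    """Minimum cumulative loss over expert paths with at most s_max switches.

    best[s][k] is the smallest loss so far of a path that ends at expert k
    and uses at most s switches.
    """
    k_count = len(losses[0])
    best: List[List[float]] = [list(losses[0]) for _ in range(s_max + 1)]
    for row in losses[1:]:
        new = []
        for s in range(s_max + 1):
            # Switching into k consumes one switch, so it draws on budget s - 1.
            switch_in = min(best[s - 1]) if s > 0 else math.inf
            new.append([row[k] + min(best[s][k], switch_in)
                        for k in range(k_count)])
        best = new
    return min(best[s_max])


def dynamic_regret(played: Sequence[Sequence[float]],
                   losses: Sequence[Sequence[float]], s_max: int) -> float:
    """Learner's cumulative expected loss minus the S-switch oracle's loss."""
    learner = sum(sum(wk * lk for wk, lk in zip(w, l))
                  for w, l in zip(played, losses))
    return learner - s_switch_oracle(losses, s_max)


L = [[0.2, 0.9, 0.5], [0.8, 0.1, 0.4], [0.3, 0.7, 0.0], [0.9, 0.2, 0.6]]
oracle_costs = [s_switch_oracle(L, s) for s in range(4)]
```

Letting the switch-in minimum range over all experts, including $k$ itself, is harmless because the table tracks at most $s$ switches: a path that stays on $k$ with at most $s-1$ switches also has at most $s$.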