Online Convex Optimization with Memory and Limited Predictions

Zhengmiao Wang, Zhi-Wei Liu, Ming Chi, Xiaoling Wang, Housheng Su, Lintao Ye

TL;DR

This work tackles online convex optimization where costs depend on past decisions (memory) and future cost predictions are available only within a limited window (bandit-like access). It introduces a predictive, zeroth-order algorithm built from two complementary subroutines: a memory-enabled bandit OCO component with a dynamic regret of $\sqrt{TV_T}$ for the initialization phase and a zeroth-order optimizer with linear convergence for smooth, strongly convex problems, achieved via a novel truncated Gaussian smoothing technique. The main theoretical results show that the overall dynamic regret decays exponentially with the prediction window length $W$ (via $K=\lfloor W/(h-1)\rfloor$), while remaining robust to prediction errors and measurement noise. Empirical results on a quadratic, memory-bearing setting corroborate the theory, demonstrating sublinear initialization regret and rapid exponential improvement as more future information is incorporated, with the truncated Gaussian approach offering superior convergence stability compared to standard Gaussian smoothing.
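To make the dependence on the window length concrete, the following minimal sketch computes $K=\lfloor W/(h-1)\rfloor$ for a few values of $W$. It assumes that $h$ is the memory length and that $h\ge 2$. The contraction factor `rho` is purely hypothetical and serves only to show what "exponential decay in $K$" looks like; it is not a constant taken from the paper.

```python
def window_to_K(W: int, h: int) -> int:
    """Return K = floor(W / (h - 1)) for window length W and (assumed) memory length h."""
    if h < 2:
        raise ValueError("h must be at least 2 so that h - 1 is positive")
    if W < 0:
        raise ValueError("W must be nonnegative")
    return W // (h - 1)


def illustrative_decay(W: int, h: int, rho: float = 0.5) -> float:
    """Hypothetical factor rho**K; rho is an illustrative placeholder, not from the paper."""
    if not 0.0 < rho < 1.0:
        raise ValueError("rho must lie strictly between 0 and 1")
    return rho ** window_to_K(W, h)


if __name__ == "__main__":
    h = 3  # assumed memory length for the illustration
    for W in (2, 4, 6, 8):
        print(W, window_to_K(W, h), illustrative_decay(W, h))
```

Under these assumptions, enlarging the window by $h-1$ steps raises $K$ by one and multiplies the illustrative factor by `rho`, which corresponds to a straight line on a log scale.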

Abstract

This paper addresses an online convex optimization problem where the cost function at each step depends on a history of past decisions (i.e., memory), and the decision maker has access to limited predictions of future cost values within a finite window. The goal is to design an algorithm that minimizes the dynamic regret against the optimal sequence of decisions in hindsight. To this end, we propose a novel predictive algorithm and establish strong theoretical guarantees for its performance. We show that the algorithm's dynamic regret decays exponentially with the length of the prediction window. Our algorithm comprises two general subroutines of independent interest. The first subroutine solves online convex optimization with memory and bandit feedback, achieving a $\sqrt{TV_T}$-dynamic regret, where $V_T$ measures the variation of the optimal decision sequence. The second is a zeroth-order method that attains a linear convergence rate for general convex optimization, matching the best achievable rate of first-order methods. The key to our algorithm is a novel truncated Gaussian smoothing technique when querying the decision points to obtain the predictions. We validate our theoretical results with numerical experiments.
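As a rough illustration of the zeroth-order idea, the sketch below estimates a gradient from function values only, using Gaussian directions truncated to a ball, and runs plain gradient descent on a small strongly convex quadratic. Everything here is an assumption made for illustration: the rejection-sampling truncation, the two-point difference estimator, the truncation radius, the step size, and the test function are not taken from the paper, whose exact construction may differ.

```python
import numpy as np


def truncated_gaussian(dim, radius, rng):
    """Draw u ~ N(0, I) conditioned on ||u|| <= radius (simple rejection sampling)."""
    while True:
        u = rng.standard_normal(dim)
        if np.linalg.norm(u) <= radius:
            return u


def zo_gradient(f, x, delta, radius, num_samples, rng):
    """Two-point zeroth-order gradient estimate with truncated Gaussian directions."""
    g = np.zeros_like(x)
    for _ in range(num_samples):
        u = truncated_gaussian(x.size, radius, rng)
        # Central difference along u uses only function values, never gradients.
        g += (f(x + delta * u) - f(x - delta * u)) / (2.0 * delta) * u
    return g / num_samples


def demo(seed=0):
    """Run zeroth-order descent on f(x) = 0.5 x^T A x - b^T x; return (initial, final) error."""
    rng = np.random.default_rng(seed)
    A = np.diag([1.0, 2.0, 3.0, 4.0, 5.0])  # strong convexity 1, smoothness 5
    b = np.ones(5)
    x_star = np.linalg.solve(A, b)

    def f(x):
        return 0.5 * x @ A @ x - b @ x

    x = np.zeros(5)
    initial_err = np.linalg.norm(x - x_star)
    radius = 3.0 * np.sqrt(x.size)  # assumed truncation radius
    step = 0.1                      # assumed step size, 1 / (2 * smoothness)
    for _ in range(100):
        g = zo_gradient(f, x, delta=1e-3, radius=radius, num_samples=200, rng=rng)
        x = x - step * g
    return initial_err, np.linalg.norm(x - x_star)


if __name__ == "__main__":
    print(demo())
```

Truncation keeps every query point within `delta * radius` of the current iterate, so the function is never evaluated arbitrarily far away, which an untruncated Gaussian cannot guarantee.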

Paper Structure

This paper contains 23 sections, 5 theorems, 99 equations, 3 figures.

Key Result

Lemma 1

Suppose Assumption \ref{ass:objective functions} holds for $f_t(\cdot)$ for all $t\in[T]$. Then, (a) $\hat{f}_t^c(\cdot)$ is $\beta$-smooth and $\mu$-strongly convex; (b) $C_T(\cdot)$ is $\beta h$-smooth, $\mu$-strongly convex and $G\sqrt{Th}$-Lipschitz; and (c) $\hat{C}_T(\cdot)$ is $\beta h$-smooth and

Figures (3)

  • Figure 1:
  • Figure 2: Log of regret for the full algorithm versus the prediction window size $W$, with $T=400$. The linear downward trend for all smoothing methods empirically confirms the theoretical exponential decay of regret, a key result derived from Theorem \ref{thm:overall convergence}.
  • Figure 3: Convergence Rate Comparison (T=200)

Theorems & Definitions (12)

  • Definition 1
  • Definition 2
  • Definition 3
  • Definition 4
  • Remark 1
  • Remark 2
  • Definition 5
  • Lemma 1
  • Lemma 2
  • Theorem 1
  • ...and 2 more