Risk-averse Learning with Non-Stationary Distributions

Siyi Wang, Zifan Wang, Xinlei Yi, Michael M. Zavlanos, Karl H. Johansson, Sandra Hirche

TL;DR

This work addresses risk-averse online optimization in non-stationary environments by optimizing the CVaR of the cost under time-varying distributions quantified with a Wasserstein-based variation budget $V_D$. It introduces a zeroth-order method that estimates CVaR gradients via multi-sample queries and uses a restarting scheme with batch size $\Delta_T$ to adapt to changes. Theoretical results show sub-linear dynamic regret for both convex and strongly convex costs, with tighter bounds in the strongly convex case and a trade-off controlled by the sampling parameter $a$. A parking-lot dynamic pricing example demonstrates that increased sampling improves performance, supporting the practical viability of the approach.
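The restarting idea mentioned above can be illustrated with a short sketch. This is not the paper's algorithm: the function name `run_with_restarts`, the box constraint, the step-size callback `step`, and the gradient callback `grad_estimate` are all hypothetical names introduced here, and the rule "reset to the initial point at the start of every batch of `batch_size` episodes" is an assumed, generic form of restarting.

```python
import numpy as np

def run_with_restarts(T, batch_size, step, x0, lower, upper, grad_estimate):
    """Projected gradient descent that restarts every `batch_size` episodes.

    All argument names are illustrative assumptions, not taken from the paper.
    """
    x = np.array(x0, dtype=float)
    history = []
    for t in range(T):
        tau = t % batch_size              # episode index inside the current batch
        if tau == 0:
            x = np.array(x0, dtype=float)  # restart: discard earlier iterates
        history.append(x.copy())
        g = grad_estimate(x, t)            # any (possibly noisy) gradient estimate
        x = np.clip(x - step(tau) * g, lower, upper)  # project onto the box
    return np.array(history)

# Usage: a toy quadratic whose minimizer sits at 0.9.
hist = run_with_restarts(
    T=10, batch_size=4, step=lambda tau: 0.1, x0=[0.5],
    lower=0.0, upper=1.0, grad_estimate=lambda x, t: 2 * (x - 0.9),
)
```

A smaller batch size forgets stale information faster but leaves fewer steps to converge within each batch, which is one plausible way to read the role of $\Delta_T$.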

Abstract

Considering non-stationary environments in online optimization enables a decision-maker to adapt effectively to changes and improve its performance over time. In such cases, it is favorable to adopt a strategy that minimizes the negative impact of change to avoid potentially risky situations. In this paper, we investigate risk-averse online optimization where the distribution of the random cost changes over time. We minimize a risk-averse objective function using the Conditional Value at Risk (CVaR) as the risk measure. Because the exact CVaR gradient is difficult to obtain, we employ a zeroth-order optimization approach that queries the cost function values multiple times at each iteration and estimates the CVaR gradient using the sampled values. To facilitate the regret analysis, we use a variation metric based on the Wasserstein distance to capture time-varying distributions. Given that the distribution variation is sub-linear in the total number of episodes, we show that our designed learning algorithm achieves sub-linear dynamic regret with high probability for both convex and strongly convex functions. Moreover, theoretical results suggest that increasing the number of samples reduces the dynamic regret bounds until the number of samples reaches a specific limit. Finally, we provide numerical experiments on dynamic pricing in a parking lot to illustrate the efficacy of the designed algorithm.
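The sampling-based gradient estimation described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's estimator: the quadratic `cost`, the helper names `empirical_cvar` and `zeroth_order_cvar_grad`, the smoothing radius `delta`, and the one-point sphere-sampling form are all choices made for this example. The sketch also assumes the convention that `alpha` is the tail fraction, so CVaR is the mean of the worst `alpha` share of sampled costs.

```python
import numpy as np

def empirical_cvar(samples, alpha):
    """Mean of the largest ceil(alpha * n) samples (assumed tail convention)."""
    samples = np.sort(np.asarray(samples, dtype=float))
    k = max(1, int(np.ceil(alpha * len(samples))))  # number of tail samples
    return samples[-k:].mean()

def cost(x, xi):
    # Hypothetical random cost used only for illustration.
    return float(np.sum((x - xi) ** 2))

def zeroth_order_cvar_grad(x, alpha, delta, n_samples, sample_xi, rng):
    """One-point zeroth-order estimate of the CVaR gradient at x."""
    d = x.size
    u = rng.normal(size=d)
    u /= np.linalg.norm(u)                 # random direction on the unit sphere
    x_pert = x + delta * u                 # single perturbed query point
    # Query the cost n_samples times at the perturbed point.
    costs = [cost(x_pert, sample_xi(rng)) for _ in range(n_samples)]
    return (d / delta) * empirical_cvar(costs, alpha) * u

# Usage with a uniform disturbance, also an assumption of this sketch.
rng = np.random.default_rng(0)
x = np.array([0.2, 0.4])
g = zeroth_order_cvar_grad(
    x, alpha=0.2, delta=0.1, n_samples=16,
    sample_xi=lambda r: r.uniform(0.0, 1.0, size=2), rng=rng,
)
```

More samples per query give a less noisy empirical CVaR, which matches the abstract's point that extra sampling helps up to a limit.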

Paper Structure

This paper contains 14 sections, 10 theorems, 56 equations, 4 figures, 1 table, and 1 algorithm.

Key Result

Lemma 1

(Edwards, 2011) For any fixed $K > 0$, the Wasserstein distance equals a scaled supremum over functions $f$ with $\|f\|_L \le K$, where $\|\cdot\|_L$ is the Lipschitz norm. This supremum is the Kantorovich--Rubinstein dual form of the Wasserstein distance metric.
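
For reference, the Kantorovich--Rubinstein duality is commonly written as below. The symbols $\mu$ and $\nu$ for the two distributions and $W(\mu, \nu)$ for the Wasserstein distance are notation assumed for this sketch and may differ from the paper's own.

$$
W(\mu, \nu) = \frac{1}{K} \sup_{\|f\|_L \le K} \left( \mathbb{E}_{x \sim \mu}[f(x)] - \mathbb{E}_{y \sim \nu}[f(y)] \right)
$$

Scaling every test function by $K$ scales the supremum by $K$, so the factor $1/K$ makes the identity hold for any fixed $K > 0$.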

Figures (4)

  • Figure 1: Distribution range of the uniform random variable $\xi_t$.
  • Figure 2: From top to bottom: the parking price $x_t$ under Algorithm 1 and the optimal parking price $x_t^{*}$; the resulting occupancy $r_t$ under Algorithm 1.
  • Figure 3: From top to bottom: dynamic regret; the CVaR values achieved by Algorithm 1 and the minimum CVaR values.
  • Figure 4: Accumulated loss achieved by Algorithm 1 under sampling strategies with a constant number of samples, $n_t = 8, 16, 24$, respectively.

Theorems & Definitions (14)

  • Lemma 1
  • Definition 1
  • Lemma 2
  • Example 1
  • Lemma 3
  • Lemma 4
  • Lemma 5
  • Lemma 6
  • Theorem 1
  • Remark 1
  • ...and 4 more