Regret Minimization in Scalar, Static, Non-linear Optimization Problems
Ying Wang, Mirko Pasquini, Kévin Colin, Håkan Hjalmarsson
TL;DR
The paper addresses regret minimization for static, non-linear optimization problems that depend on an unknown scalar parameter $\theta_0$. The system is modeled as $y_t = h(u_t, \theta_0) + e_t$. Certainty equivalence (CE) selects the exploitation input $u_t^* = U(\hat{\theta}_t)$ from the current estimate $\hat{\theta}_t$, and a stochastic exploration signal $\alpha_t$ with variance $x_t$ is added to it. A Taylor-based regret approximation links the estimation error to the accumulated Fisher information. This yields a tractable upper bound $R_{ub}$ that depends on the variance sequence $\{x_t\}$ through an incremental information function $i(\cdot)$. Under an idealized assumption on $i(\cdot)$, the theoretical results show that the optimal exploration pattern is either lazy (no exploration) or immediate (exploration only at the first time instant), and a sufficient condition guarantees that it is immediate; this greatly simplifies exploration design in non-linear settings. Numerical experiments on a quadratic toy example and 10 parameterized systems show three things. Immediate exploration, especially with a zero-mean binary excitation, often minimizes regret. Lazy exploration is never optimal in these experiments. Gaussian exploration with a decaying variance can mimic immediate exploration when the variance decays fast. The work provides a principled, analyzable guideline for exploration in real-time non-linear optimization and highlights practical trade-offs in choosing the exploration distribution.
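The following sketch makes the link between estimation error, exploration, and information concrete. It is a generic second-order approximation of the kind the TL;DR refers to, not the paper's exact $R_{ub}$. Several assumptions are illustrative. The goal is to maximize $h(u,\theta_0)$. The noise $e_t$ is Gaussian with variance $\sigma^2$. The exploration $\alpha_t$ is zero-mean with variance $x_t$ and independent of $\hat\theta_t$. The estimation variance behaves like the inverse of the accumulated information. The symbols $u^{o}$, $I_0$, and this particular form of $i(x)$ are also assumptions. Derivatives $h_u, h_{uu}, h_\theta$ are evaluated at $(u^{o},\theta_0)$, and the information of sample $k$ is evaluated around the optimum.

```latex
\begin{aligned}
r_t &= h(u^{o},\theta_0) - h(u_t,\theta_0)
     \approx -\tfrac{1}{2}\, h_{uu}\,(u_t - u^{o})^2,
     \qquad u^{o} = U(\theta_0),\; h_u(u^{o},\theta_0) = 0,\\
u_t - u^{o} &= U(\hat\theta_t) - U(\theta_0) + \alpha_t
     \approx U'(\theta_0)\,(\hat\theta_t - \theta_0) + \alpha_t,\\
\mathbb{E}[r_t] &\approx -\tfrac{1}{2}\, h_{uu}
     \Big( U'(\theta_0)^2\, \mathbb{E}\big[(\hat\theta_t - \theta_0)^2\big] + x_t \Big),\\
\mathbb{E}\big[(\hat\theta_t - \theta_0)^2\big] &\approx \frac{1}{I_t},
\qquad I_t = I_0 + \sum_{k=1}^{t-1} i(x_k),
\qquad i(x) = \frac{\mathbb{E}_{\alpha}\big[h_\theta(u^{o} + \alpha,\theta_0)^2\big]}{\sigma^2},\\
\mathbb{E}\Big[\sum_{t=1}^{T} r_t\Big] &\approx -\tfrac{1}{2}\, h_{uu}
     \sum_{t=1}^{T} \Big( \frac{U'(\theta_0)^2}{I_0 + \sum_{k=1}^{t-1} i(x_k)} + x_t \Big).
\end{aligned}
```

The cross term $\mathbb{E}[(\hat\theta_t-\theta_0)\alpha_t]$ vanishes because $\alpha_t$ is fresh at time $t$. In this sketch, each unit of variance $x_t$ costs regret immediately. The information it adds only reduces the estimation term at later instants, which is one way to see why spending exploration early can pay off.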
Abstract
We study the problem of determining an effective exploration strategy in static, non-linear optimization problems that depend on an unknown scalar parameter, which must be learned from noisy data collected online. An optimal trade-off between exploration and exploitation is crucial for effective optimization under uncertainty. To achieve it, we adopt a cumulative regret minimization approach over a finite horizon, where each time instant in the horizon is characterized by a stochastic exploration signal whose variance is to be designed. We aim to extend the well-established concepts of regret minimization from linear to non-linear systems, with a focus on the resulting conceptual differences and challenges. Under an idealized assumption on an appropriately defined information function associated with the excitation, we show that an optimal exploration strategy is either to use no exploration at all (called lazy exploration) or to add an exploration excitation only at the first time instant of the horizon (called immediate exploration). A quadratic numerical example is presented to demonstrate the effectiveness of the proposed strategy.
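A small Monte Carlo simulation can compare the lazy and immediate strategies empirically. The sketch below is not the paper's quadratic example. It assumes a hypothetical model $h(u,\theta) = \theta u - u^2/2$, so $U(\theta) = \theta$ and the instantaneous regret is $(u_t-\theta_0)^2/2$. It also assumes Gaussian noise, a zero-mean binary excitation $\alpha_t = \pm\sqrt{x_t}$, and a ridge-regularized least-squares estimator seeded with a prior guess. All function names and the values of `theta0`, `theta_prior`, `prior_weight`, `sigma`, `T`, and the "decaying" schedule (binary here, not Gaussian) are illustrative choices.

```python
import numpy as np

# Hypothetical model (an assumption, not the paper's example):
# h(u, theta) = theta*u - u**2/2 is maximized at U(theta) = theta.
def h(u, theta):
    return theta * u - 0.5 * u**2


def U(theta):
    # Certainty-equivalence map from a parameter value to its optimal input.
    return theta


def instant_regret(u, theta0):
    # Loss w.r.t. the true optimum; equals (u - theta0)**2 / 2 for this h.
    return h(U(theta0), theta0) - h(u, theta0)


def cumulative_regret(x, theta0=1.0, theta_prior=0.5, prior_weight=1.0,
                      sigma=0.5, rng=None):
    """One closed-loop run with exploration variances x[0], ..., x[T-1]."""
    rng = np.random.default_rng() if rng is None else rng
    num = prior_weight * theta_prior  # ridge-regularized LS numerator
    den = prior_weight                # ridge-regularized LS denominator
    total = 0.0
    for x_t in x:
        theta_hat = num / den
        # Zero-mean binary excitation: +sqrt(x_t) or -sqrt(x_t), each w.p. 1/2.
        sign = 1.0 if rng.random() < 0.5 else -1.0
        u = U(theta_hat) + sign * np.sqrt(x_t)
        y = h(u, theta0) + sigma * rng.standard_normal()
        # y + u**2/2 = theta0*u + e_t is linear in theta, so LS sums suffice.
        num += u * (y + 0.5 * u**2)
        den += u**2
        total += instant_regret(u, theta0)
    return total


def expected_regret(x, runs=500, seed=0):
    # Monte Carlo average of the cumulative regret over independent runs.
    rng = np.random.default_rng(seed)
    return float(np.mean([cumulative_regret(x, rng=rng) for _ in range(runs)]))


T = 50
lazy = np.zeros(T)                          # no exploration at all
immediate = np.zeros(T)
immediate[0] = 1.0                          # all exploration at the first instant
decaying = 1.0 / (1.0 + np.arange(T)) ** 2  # fast-decaying variance

for name, x in [("lazy", lazy), ("immediate", immediate), ("decaying", decaying)]:
    print(f"{name:>9}: {expected_regret(x):.3f}")
```

Which schedule wins depends on these illustrative values, for instance how far `theta_prior` is from `theta0` and how large `immediate[0]` is. The printed numbers are therefore a way to explore the trade-off, not a reproduction of the paper's results.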
