Learning the Maximum of a Hölder Function from Inexact Data
Simon Foucart
TL;DR
The paper addresses learning the maximum of a Hölder function from inexact point evaluations at prescribed datasites, within the Optimal Recovery framework. It develops locally optimal (Chebyshev-center) and globally optimal procedures for general monotone quantities of interest, then specializes to the maximum, deriving explicit formulas for both the local estimator $\Delta^{\rm loc}$ and the global estimator $\Delta^{\rm glo}$; the latter includes a simple correction term that depends on the data locations. For the (nonlinear) maximum and under equal observation errors, the globally optimal estimator is $\Delta^{\rm glo}(\mathbf{y}) = \max_m y_m + \tfrac{1}{2}\max_x U(x)$, where $U(x)=\min_m \operatorname{dist}(x,x^{(m)})^\alpha$. The paper contrasts this with the locally optimal rule and provides bounds and extensions to jittered data. Finally, it quantifies the minimal global worst-case error, proving bounds that scale with the observation error $\varepsilon$ and the grid resolution $M^{-\alpha/d}$, which highlights the curse of dimensionality, and it gives exact results in the cube-grid setting.
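To make the global formula concrete, here is a minimal Python sketch of $\Delta^{\rm glo}$. It is an illustration under several assumptions that are not taken from the paper: the domain is the unit cube $[0,1]^d$, the distance is Euclidean, the Hölder constant is normalized to 1, and $\max_x U(x)$ is approximated by brute-force search over a uniform grid rather than computed exactly. The names `fill_term` and `delta_glo` are hypothetical.

```python
import itertools
import math


def fill_term(sites, alpha, grid_pts=201):
    """Approximate max over [0,1]^d of U(x) = min_m dist(x, x^(m))**alpha.

    Assumptions (not from the paper): unit-cube domain, Euclidean distance,
    and a uniform search grid with `grid_pts` points per axis.
    """
    d = len(sites[0])
    axis = [i / (grid_pts - 1) for i in range(grid_pts)]
    best = 0.0
    for x in itertools.product(axis, repeat=d):
        # U(x): distance to the nearest datasite, raised to the power alpha
        u = min(math.dist(x, s) for s in sites) ** alpha
        best = max(best, u)
    return best


def delta_glo(y, sites, alpha, grid_pts=201):
    """Data maximum plus half of the (approximate) maximum of U."""
    return max(y) + 0.5 * fill_term(sites, alpha, grid_pts)


if __name__ == "__main__":
    sites = [(0.25,), (0.75,)]   # two datasites in [0, 1]
    y = [1.0, 2.0]               # inexact observations at those sites
    print(delta_glo(y, sites, alpha=1.0))  # prints 2.125
```

In this one-dimensional example, every point of $[0,1]$ lies within $0.25$ of a datasite, so $\max_x U(x) = 0.25$ for $\alpha = 1$ and the estimate is $2 + 0.125$. The grid search costs `grid_pts**d` evaluations, so it is only practical in low dimension.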
Abstract
Within the theoretical framework of Optimal Recovery, this article determines the {\em best} procedures to learn a quantity of interest depending on a Hölder function acquired via inexact point evaluations at fixed datasites. {\em Best} here refers to procedures minimizing worst-case errors. The elementary arguments hint at the possibility of tackling nonlinear quantities of interest, with a particular focus on the function maximum. In a local setting, i.e., for a fixed data vector, the optimal procedure (outputting the so-called Chebyshev center) is precisely described relative to a general model of inexact evaluations. Relative to a slightly more restricted model and in a global setting, i.e., uniformly over all data vectors, another optimal procedure is put forward, showing how to correct the natural underestimate that simply returns the data vector maximum. Jittered data are also briefly discussed as a by-product of evaluating the minimal worst-case error optimized over the datasites.
