Optimizing Input Data Collection for Ranking and Selection
Eunhye Song, Taeho Kim
TL;DR
The paper tackles ranking and selection (R&S) under input uncertainty by formulating a Bayesian framework across multiple data sources and adopting the Most Probable Best (MPB) as the estimator of the optimum. It develops OSAR, a sequential budget-allocation algorithm that allocates input-data and simulation sampling according to large-deviation-based rates, and extends it with kernel ridge regression (KRR), yielding OSAR$^+$, to improve finite-sample performance and to handle continuous input spaces with a strong consistency guarantee. The authors prove exponential convergence of the MPB’s posterior probability of optimality, characterize $ε$-optimal static sampling ratios, and report that OSAR outperforms a Bayesian optimization baseline across discrete and continuous input spaces and in a food-contamination application. They also propose continuous-parameter extensions (OSAR$^{++}$, OSAR$^{+FD}$, OSAR$^{+PS}$) that use Nyström-KRR to manage computational cost while preserving asymptotic guarantees, offering a practical, theoretically grounded approach to data-budgeted R&S under input uncertainty.
Abstract
We study a ranking and selection (R&S) problem when all solutions share common parametric Bayesian input models updated with the data collected from multiple independent data-generating sources. Our objective is to identify the best system by designing a sequential sampling algorithm that collects input and simulation data given a budget. We adopt the most probable best (MPB) as the estimator of the optimum and show that its posterior probability of optimality converges to one at an exponential rate as the sampling budget increases. Assuming that the input parameters belong to a finite set, we characterize the $ε$-optimal static sampling ratios for input and simulation data that maximize the convergence rate. Using these ratios as guidance, we propose the optimal sampling algorithm for R&S (OSAR) that achieves the $ε$-optimal ratios almost surely in the limit. We further extend OSAR by adopting kernel ridge regression to improve the prediction of the simulation output mean. This not only improves OSAR's finite-sample performance, but also lets us tackle the case where the input parameters lie in a continuous space, with a strong consistency guarantee for finding the optimum. We numerically demonstrate that OSAR outperforms a state-of-the-art competitor.
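To make the MPB concrete for the finite-parameter case, the sketch below shows one plausible reading of "posterior probability of optimality": for each solution, sum the posterior weight of the input parameters under which that solution has the best mean, then pick the solution with the largest total. This is an illustrative sketch, not the authors' implementation. The function name `most_probable_best`, the toy numbers, the use of known means in place of simulation estimates, and the convention that a smaller mean is better are all assumptions.

```python
import numpy as np

def most_probable_best(posterior, means, minimize=True):
    """Return (MPB index, per-solution posterior probability of being optimal).

    posterior: shape (B,), posterior weights over B candidate input parameters.
    means:     shape (k, B), mean performance of solution i under parameter b
               (assumed known here; in practice these would be estimated).
    minimize:  assumed convention that a smaller mean is better.
    """
    posterior = np.asarray(posterior, dtype=float)
    means = np.asarray(means, dtype=float)
    posterior = posterior / posterior.sum()  # normalize the weights

    # Index of the best solution under each candidate input parameter.
    best = means.argmin(axis=0) if minimize else means.argmax(axis=0)

    # Accumulate posterior weight on each solution being the best.
    prob_opt = np.bincount(best, weights=posterior, minlength=means.shape[0])
    return int(prob_opt.argmax()), prob_opt

# Toy example (hypothetical numbers): 3 solutions, 3 candidate parameters.
posterior = [0.40, 0.35, 0.25]
means = [[1.0, 3.0, 3.0],
         [2.0, 1.0, 1.5],
         [3.0, 2.0, 2.0]]
mpb, prob_opt = most_probable_best(posterior, means)
print(mpb, prob_opt)
```

In this toy example, solution 0 is best under the single most likely parameter, yet solution 1 is best under the other two, which together carry more posterior weight, so solution 1 is the MPB.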
