From Random Search to Bandit Learning in Metric Measure Spaces
Chuying Han, Yasong Feng, Tianyu Wang
TL;DR
This paper introduces the scattering dimension $d_s$ to provide a non-heuristic theory for Random Search in hyperparameter optimization, showing that the optimality gap decays as $\widetilde{O}((1/T)^{1/d_s})$ in noise-free settings and as $\widetilde{O}((1/T)^{1/(d_s+1)})$ under bounded i.i.d. noise. It connects $d_s$ to the zooming dimension $d_z$ and demonstrates that, in metric measure spaces endowed with a probability measure, the BLiN-MOS algorithm achieves regret $\widetilde{O}(T^{d_z/(d_z+1)})$ with only $O(\log\log T)$ communication rounds. The work also clarifies the relationship between the scattering and zooming dimensions, showing how landscape geometry governs both sampling efficiency and the structure of the near-optimal region, and emphasizes that scattering-dimension analysis requires a well-defined probability measure. Overall, the results furnish the first non-heuristic justification for Random Search performance, propose a Lipschitz-bandit algorithm tailored to metric measure spaces, and quantify fundamental trade-offs between discrimination and exploration in high-dimensional landscapes.
Abstract
Random Search is one of the most widely used methods for hyperparameter optimization, and is critical to the success of deep learning models. Despite its astonishing performance, little non-heuristic theory has been developed to describe its underlying working mechanism. This paper gives a theoretical account of Random Search. We introduce the concept of \emph{scattering dimension}, which describes the landscape of the underlying function and quantifies the performance of Random Search. We show that, when the environment is noise-free, the output of Random Search converges to the optimal value in probability at rate $ \widetilde{\mathcal{O}} \left( \left( \frac{1}{T} \right)^{ \frac{1}{d_s} } \right) $, where $ d_s \ge 0 $ is the scattering dimension of the underlying function. When the observed function values are corrupted by bounded i.i.d. noise, the output of Random Search converges to the optimal value in probability at rate $ \widetilde{\mathcal{O}} \left( \left( \frac{1}{T} \right)^{ \frac{1}{d_s + 1} } \right) $. In addition, based on the principles of Random Search, we introduce an algorithm, called BLiN-MOS, for Lipschitz bandits in doubling metric spaces that are also endowed with a probability measure, and show that, under mild conditions, BLiN-MOS achieves a regret rate of order $ \widetilde{\mathcal{O}} \left( T^{ \frac{d_z}{d_z + 1} } \right) $, where $d_z$ is the zooming dimension of the problem instance.
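To make the noise-free setting concrete, the following is a minimal sketch of plain Random Search, not the paper's own code. It assumes a maximization problem, and the names `random_search`, `toy_objective`, and `uniform_unit_interval` are hypothetical, as is the toy landscape on $[0, 1]$ with the uniform probability measure. The script prints the optimality gap for several budgets $T$; it does not compute the scattering dimension or verify the stated rates.

```python
import random


def random_search(f, sample, T, rng):
    """Draw T independent points from `sample` and return the best (x, f(x)).

    Assumption: larger values of f are better (maximization).
    """
    best_x, best_val = None, float("-inf")
    for _ in range(T):
        x = sample(rng)
        val = f(x)
        if val > best_val:
            best_x, best_val = x, val
    return best_x, best_val


def toy_objective(x):
    # Hypothetical noise-free landscape with optimal value 0 attained at x = 0.3.
    return -abs(x - 0.3)


def uniform_unit_interval(rng):
    # Hypothetical probability measure: the uniform distribution on [0, 1).
    return rng.random()


if __name__ == "__main__":
    for T in (10, 100, 1000, 10000):
        rng = random.Random(0)  # same seed, so larger budgets extend smaller ones
        _, best_val = random_search(toy_objective, uniform_unit_interval, T, rng)
        # The optimal value is 0, so the optimality gap is simply -best_val.
        print(f"T={T:>5d}  optimality gap={-best_val:.6f}")
```

Because every budget reuses the same seed, each larger run contains the samples of the smaller ones, so the printed gap can only shrink or stay the same as $T$ grows. How fast it shrinks depends on the landscape and the sampling measure, which is what the scattering dimension is introduced to quantify.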
