A Lipschitz Bandits Approach for Continuous Hyperparameter Optimization

Yasong Feng, Weijian Luo, Yimin Huang, Tianyu Wang

TL;DR

This work tackles continuous hyperparameter optimization by formulating it as a pure-exploration Lipschitz bandit problem with batched feedback. It introduces BLiE, a model-free algorithm that exploits Lipschitz continuity to adaptively search and allocate budget, achieving provable simple-regret guarantees and favorable communication efficiency via ACE sequences. Theoretical results show BLiE attains Δ ≤ c T^{-1/(d_z+β)} with relatively few batches, and it outperforms baselines like Hyperband in hard regimes, with lower bounds established for competing strategies. Empirically, BLiE demonstrates superior performance on neural-network tuning tasks and diffusion-model noise scheduling, yielding faster sampling and better final accuracies.

Abstract

One of the most critical problems in machine learning is HyperParameter Optimization (HPO), since the choice of hyperparameters has a significant impact on final model performance. Although there are many HPO algorithms, they either have no theoretical guarantees or require strong assumptions. To this end, we introduce BLiE -- a Lipschitz-bandit-based algorithm for HPO that only assumes Lipschitz continuity of the objective function. BLiE exploits the landscape of the objective function to adaptively search over the hyperparameter space. Theoretically, we show that $(i)$ BLiE finds an $ε$-optimal hyperparameter with a total budget of $\mathcal{O} \left( ε^{-(d_z + β)}\right)$, where $d_z$ and $β$ are problem intrinsic; $(ii)$ BLiE is highly parallelizable. Empirically, we demonstrate that BLiE outperforms the state-of-the-art HPO algorithms on benchmark tasks. We also apply BLiE to search for the noise schedule of diffusion models. Comparison with the default schedule shows that the BLiE schedule greatly improves the sampling speed.
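To give a feel for the budget scaling quoted in the abstract, here is a minimal Python sketch. It is an illustration only, not part of the paper: the function name `budget_scale` is hypothetical, and the constant hidden by the big-O is an unknown that the sketch arbitrarily sets to 1, so only the ratios between budgets are meaningful, not the absolute numbers.

```python
import math


def budget_scale(eps: float, d_z: float, beta: float, c: float = 1.0) -> float:
    """Return (c / eps) ** (d_z + beta), the order of the budget for an eps-optimal arm.

    `c` stands in for the unknown constant hidden by the big-O; the default of 1.0
    is an assumption made purely for illustration.
    """
    if eps <= 0 or c <= 0:
        raise ValueError("eps and c must be positive")
    if d_z + beta <= 0:
        raise ValueError("d_z + beta must be positive")
    return (c / eps) ** (d_z + beta)


# Halving eps multiplies the budget by 2 ** (d_z + beta).
for eps in (0.1, 0.05, 0.025):
    print(f"eps={eps}: budget scale ~ {budget_scale(eps, d_z=1.0, beta=1.0):.0f}")
```

The exponent $d_z + β$ controls how quickly the required budget grows as the target accuracy tightens, which is why smaller values of these problem-intrinsic quantities make the task easier.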

Paper Structure

This paper contains 25 sections, 7 theorems, 33 equations, 6 figures, 1 table, 4 algorithms.

Key Result

Theorem 1

If Assumptions ass:hoeffding and ass:lip are satisfied, then the output arm $\widetilde{x}^*$ of the BLiE algorithm with total budget $T$, edge-length sequence $r_m=2^{-m}$, $\alpha=2L+2$ and $\beta$ has simple regret $\Delta \le c\, T^{-\frac{1}{d_z+\beta}}$, where $d_z$ is the zooming dimension and $c$ is a constant. In addition, BLiE needs no more than $\frac{1}{d_z+\beta}\log T$ batches to achieve this simple regret.
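The simple-regret bound and the budget quoted in the abstract are two views of the same statement. A short derivation, assuming the bound takes the form given in the TL;DR, $\Delta \le c\, T^{-1/(d_z+\beta)}$, and writing $\varepsilon$ for the target accuracy:

```latex
\begin{align*}
c\, T^{-\frac{1}{d_z+\beta}} \le \varepsilon
  &\iff T^{\frac{1}{d_z+\beta}} \ge \frac{c}{\varepsilon} \\
  &\iff T \ge \left(\frac{c}{\varepsilon}\right)^{d_z+\beta}
   = \mathcal{O}\!\left(\varepsilon^{-(d_z+\beta)}\right).
\end{align*}
```

So any budget $T$ of order $\varepsilon^{-(d_z+\beta)}$ suffices for an $\varepsilon$-optimal output, which matches the total budget stated in the abstract.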

Figures (6)

  • Figure 1: Test error of a CNN-classifier as a function of learning rate.
  • Figure 2: Partition and elimination process of a BLiE run. The $i$-th subfigure shows the pattern before the $i$-th batch. Dark gray cubes are those eliminated in the most recent batch, while the light gray ones are those eliminated in earlier batches.
  • Figure 3: HPO processes for different tasks. The toy-example panel shows results with different limit losses. The MNIST and CIFAR-10 panels show results of tuning the optimizer for neural-network classifiers.
  • Figure 4: FWD score of DDPM with different diffusion steps $T$.
  • Figure 5: MNIST samples generated using different noise schedules and diffusion steps.
  • ...and 1 more figure

Theorems & Definitions (17)

  • Remark 1
  • Theorem 1
  • Lemma 1
  • Definition 1
  • Theorem 2
  • Proof
  • Theorem 3
  • Theorem 4
  • Proof
  • Lemma 2
  • ...and 7 more