Scalable Acceleration for Classification-Based Derivative-Free Optimization

Tianyi Han, Jingya Li, Zhipeng Guo, Yuan Jin

TL;DR

This paper tackles derivative-free optimization (DFO) of black-box objectives by modeling sequential classification-based methods as stochastic processes over a compact solution space $\Omega$ and its sublevel sets $\Omega_\epsilon$. It identifies limitations of the error-target dependence assumption and introduces the hypothesis-target $\eta$-shattering rate to bound the $(\epsilon,\delta)$-query complexity. Building on that bound, it proposes the RACE-CARS algorithm, which couples RACOS-style training with adaptive region shrinking governed by the hyperparameters $\gamma$ and $\rho$. Empirical results on synthetic benchmarks and Language-Model-as-a-Service tuning show that RACE-CARS accelerates convergence and reaches competitive minima, and ablation studies clarify the role of each hyperparameter. The work advances the understanding of sample efficiency in high-dimensional, nonconvex DFO by emphasizing the overlap between target regions and the active regions of hypotheses, and it offers a scalable, practical acceleration strategy.

Abstract

Derivative-free optimization algorithms play an important role in scientific and engineering design optimization problems, especially when derivative information is not accessible. In this paper, we study the framework of sequential classification-based derivative-free optimization algorithms. By introducing the learning-theoretic concept of hypothesis-target shattering rate, we revisit the computational complexity upper bound of SRACOS (Hu, Qian, and Yu 2017). Inspired by the revisited upper bound, we propose an algorithm named RACE-CARS, which adds a random region-shrinking step to SRACOS. We further establish theorems showing the acceleration gained from region shrinking. Experiments on synthetic functions as well as black-box tuning for language-model-as-a-service empirically demonstrate the efficiency of RACE-CARS. An ablation experiment on the introduced hyperparameters is also conducted, revealing the mechanism of RACE-CARS and providing empirical guidance for hyperparameter tuning.
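To make the idea of "a classification-based loop plus a random region-shrinking step" concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes a simplified RACOS-style learner that cuts an axis-aligned box around one good sample until all bad samples are excluded. It also assumes that $\gamma$ acts as a per-step shrink factor and $\rho$ as the probability of shrinking at each iteration; this reading is an assumption, since the text above only says that shrinking is random and governed by $\gamma$ and $\rho$. The names `sphere`, `inside`, `learn_box`, `race_cars_sketch`, and the parameters `n_init`, `n_pos`, and `explore` are hypothetical.

```python
import random


def sphere(x):
    """Toy black-box objective (hypothetical); minimum 0 at (0.2, ..., 0.2)."""
    return sum((v - 0.2) ** 2 for v in x)


def inside(x, lo, hi):
    """True if point x lies in the axis-aligned box [lo, hi]."""
    return all(l <= v <= h for l, v, h in zip(lo, x, hi))


def learn_box(positive, negatives, lo, hi, rng, max_cuts=1000):
    """Cut the box [lo, hi] around `positive` until it excludes all negatives."""
    lo, hi = list(lo), list(hi)
    for _ in range(max_cuts):
        remaining = [x for x in negatives if inside(x, lo, hi)]
        if not remaining:
            break
        neg = rng.choice(remaining)
        k = rng.randrange(len(lo))
        # Random cut between the two points along dimension k; the positive
        # point always stays on the kept side of the cut.
        cut = rng.uniform(min(neg[k], positive[k]), max(neg[k], positive[k]))
        if positive[k] >= neg[k]:
            lo[k] = max(lo[k], cut)
        else:
            hi[k] = min(hi[k], cut)
    return lo, hi


def race_cars_sketch(f, dim, lower, upper, budget=300, n_init=20, n_pos=2,
                     explore=0.05, gamma=0.95, rho=0.1, seed=0):
    """Simplified classification-based DFO with random region shrinking.

    Assumption: with probability `rho` per iteration, each side of the search
    region is scaled by `gamma` and re-centred on the best point found so far.
    """
    rng = random.Random(seed)
    lo, hi = [lower] * dim, [upper] * dim
    samples = []
    for _ in range(n_init):
        x = [rng.uniform(lower, upper) for _ in range(dim)]
        samples.append((f(x), x))
    samples.sort(key=lambda s: s[0])

    for _ in range(budget - n_init):
        best_x = samples[0][1]
        if rng.random() < rho:  # random region-shrinking step
            for k in range(dim):
                half = gamma * (hi[k] - lo[k]) / 2
                lo[k] = max(lo[k], best_x[k] - half)
                hi[k] = min(hi[k], best_x[k] + half)
        # Positives: the best samples that still lie in the current region.
        positives = [x for _, x in samples[:n_pos] if inside(x, lo, hi)]
        negatives = [x for _, x in samples[n_pos:]]
        if rng.random() < explore:
            box_lo, box_hi = lo, hi  # occasional uniform sample from the region
        else:
            box_lo, box_hi = learn_box(rng.choice(positives), negatives,
                                       lo, hi, rng)
        x = [rng.uniform(a, b) for a, b in zip(box_lo, box_hi)]
        samples.append((f(x), x))
        samples.sort(key=lambda s: s[0])
    return samples[0]  # (best value, best point)


if __name__ == "__main__":
    value, point = race_cars_sketch(sphere, dim=5, lower=-1.0, upper=1.0)
    print(value, point)
```

Setting `rho=0` disables shrinking and leaves only the classification-based loop, which gives a simple way to compare the two behaviours on a toy objective. Unlike SRACOS, this sketch keeps every evaluated sample and re-sorts them at each step; that choice is for brevity only.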

Paper Structure (24 sections, 6 theorems, 37 equations, 7 figures, 1 table, 4 algorithms)

Key Result

Theorem 1

(Hu, Qian, and Yu 2017) Given $0<\delta<1$ and $\epsilon>0$, if a sequential classification-based optimization algorithm has error-target $\theta$-dependence and $\gamma$-shrinking rate, then its $(\epsilon,\delta)$-query complexity is upper bounded by […], where $\Phi_t=\biggl(1-\theta-{\mathbb{P}}(\mathcal{R}_{D_t})-m(\Omega)\sqrt{\frac{1}{2}D_{\mathrm{KL}}(D_t\|{\mathcal{U}}_\Omega)}\biggr)\cdot|\Omega_{\ldots}$ …

Figures (7)

  • Figure 1: Comparison of synthetic functions with $n=50$.
  • Figure 2: Comparison of synthetic functions with $n=500$.
  • Figure 3: Comparisons on SST-2.
  • Figure 4: Synthetic functions with $n=2$.
  • Figure 5: Comparison on discontinuous objectives.
  • ...and 2 more figures

Theorems & Definitions (13)

  • Definition 1: $(\epsilon,\delta)$-Query Complexity
  • Definition 2: Error-Target $\theta$-Dependence
  • Definition 3: $\gamma$-Shrinking Rate
  • Theorem 1
  • Definition 4: Hypothesis-Target $\eta$-Shattering Rate
  • Theorem 2
  • Theorem 3
  • Definition 5: Dimensionally Local Hölder Continuity
  • Theorem 2
  • proof
  • ...and 3 more