Scalable Acceleration for Classification-Based Derivative-Free Optimization

Tianyi Han; Jingya Li; Zhipeng Guo; Yuan Jin

TL;DR

This paper tackles derivative-free optimization of black-box objectives by framing sequential classification-based methods as stochastic processes over a compact solution space $\Omega$ and its sublevel sets $\Omega_\epsilon$. It identifies limitations of error-target dependence, introduces the hypothesis-target $\eta$-shattering rate to bound the $(\epsilon,\delta)$-query complexity, and then proposes the RACE-CARS algorithm, which couples RACOS-style training with adaptive region shrinking governed by $\gamma$ and $\rho$. Empirical results on synthetic benchmarks and Language-Model-as-a-Service tuning show that RACE-CARS accelerates convergence and reaches competitive minima, and ablation studies clarify the roles of the hyperparameters. The work advances the understanding of sample efficiency in high-dimensional, nonconvex DFO by emphasizing the overlap between target regions and the active regions of hypotheses, and it offers a scalable, practical acceleration strategy.

Abstract

Derivative-free optimization algorithms play an important role in scientific and engineering design optimization problems, especially when derivative information is not accessible. In this paper, we study the framework of sequential classification-based derivative-free optimization algorithms. By introducing a learning-theoretic concept, the hypothesis-target shattering rate, we revisit the computational complexity upper bound of SRACOS (Hu, Qian, and Yu 2017). Inspired by the revisited upper bound, we propose an algorithm named RACE-CARS, which adds a random region-shrinking step to SRACOS. We further establish theorems showing the acceleration achieved by region shrinking. Experiments on synthetic functions as well as black-box tuning for language-model-as-a-service demonstrate the efficiency of RACE-CARS empirically. An ablation experiment on the introduced hyperparameters is also conducted, revealing the mechanism of RACE-CARS and yielding empirical guidance for hyperparameter tuning.
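
To make the idea of a sequential classification-based optimizer with a random region-shrinking step concrete, here is a minimal Python sketch. It is not the authors' RACE-CARS implementation. It assumes that the hypothesis is an axis-aligned box learned by random cuts (in the spirit of RACOS), that with probability `rho` per iteration the search region is re-centred on the best solution found so far, and that each of its sides is then scaled by `gamma`. The source does not spell out these details, and every name here (`sphere`, `learn_box`, `shrink`, `optimize`, `lam`, `pop`) is hypothetical.

```python
import random


def sphere(x):
    """Toy black-box objective with its minimum (0.0) at (0.2, ..., 0.2)."""
    return sum((xi - 0.2) ** 2 for xi in x)


def in_box(x, box):
    """True if x lies inside the axis-aligned box (closed intervals)."""
    return all(lo <= xi <= hi for xi, (lo, hi) in zip(x, box))


def learn_box(best, negatives, region, rng):
    """Randomly cut `region` down to a box that keeps `best` and drops all negatives."""
    box = [list(side) for side in region]
    remaining = [x for x in negatives if x != best and in_box(x, box)]
    while remaining:
        neg = rng.choice(remaining)
        k = rng.randrange(len(best))
        if neg[k] == best[k]:
            continue  # this coordinate cannot separate the two points
        cut = rng.uniform(best[k], neg[k])  # threshold between best and neg
        if neg[k] > best[k]:
            box[k][1] = min(box[k][1], cut)
        else:
            box[k][0] = max(box[k][0], cut)
        remaining = [x for x in remaining if in_box(x, box)]
    return box


def shrink(region, center, gamma):
    """Scale every side of `region` by `gamma`, re-centred on `center` and clipped."""
    out = []
    for (lo, hi), c in zip(region, center):
        half = gamma * (hi - lo) / 2.0
        out.append((max(lo, c - half), min(hi, c + half)))
    return out


def optimize(f, dim, budget, lo=-1.0, hi=1.0, pop=10, lam=0.9,
             gamma=0.95, rho=0.1, seed=0):
    """Assumed simplification: one positive sample (the best), all others negative."""
    rng = random.Random(seed)
    region = [(lo, hi)] * dim
    init = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(pop)]
    scored = sorted((f(x), x) for x in init)
    history = [scored[0][0]]
    for _ in range(budget - pop):
        best = scored[0][1]
        box = learn_box(best, [x for _, x in scored[1:]], region, rng)
        # Sample from the learned box with probability lam, else from the region.
        src = box if rng.random() < lam else region
        x = [rng.uniform(a, b) for a, b in src]
        scored = sorted(scored + [(f(x), x)])[:pop]
        # Random region shrinking: triggered with probability rho.
        if rng.random() < rho:
            region = shrink(region, scored[0][1], gamma)
        history.append(scored[0][0])
    return scored[0][0], scored[0][1], history


if __name__ == "__main__":
    value, solution, history = optimize(sphere, dim=5, budget=200, seed=1)
    print(f"best value after {len(history) - 1} iterations: {value:.6f}")
```

Setting `rho=0.0` disables shrinking, which gives a rough baseline for seeing what the shrinking step contributes on a given objective.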

Paper Structure (24 sections, 6 theorems, 37 equations, 7 figures, 1 table, 4 algorithms)

Introduction
Outline and Contributions
Background
Theoretical Study
Issues Introduced by Error-Target Dependence
Revisit of Query Complexity Upper Bound
The Region-Shrinking Acceleration
Experiments
On Synthetic Functions
On Black-Box Tuning for LMaaS
Discussion
Beyond continuity
On the Concept Hypothesis-Target Shattering
Ablation Experiments
Conclusion
...and 9 more sections

Key Result

Theorem 1

(Hu, Qian, and Yu 2017) Given $0<\delta<1$ and $\epsilon>0$, if a sequential classification-based optimization algorithm has error-target $\theta$-dependence and $\gamma$-shrinking rate, then its $(\epsilon,\delta)$-query complexity is upper bounded by … where $\Phi_t=\biggl(1-\theta-\mathbb{P}(\mathcal{R}_{D_t})-m(\Omega)\sqrt{\frac{1}{2}D_{\mathrm{KL}}(D_t\|\mathcal{U}_\Omega)}\biggr)\cdot|\Omega_{\dots}\,\dots$
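
For orientation, the sublevel set $\Omega_\epsilon$ and the $(\epsilon,\delta)$-query complexity used in the theorem can be written out as below. This follows the formulation common in the classification-based optimization literature and is an assumption here; the paper's own Definition 1 may differ in detail. The symbols $f$ (the objective), $x_t$ (the $t$-th queried solution) and $T$ (the number of queries) are illustrative and are not taken from the source.

```latex
% Sublevel set of near-optimal solutions (assumed form)
\Omega_\epsilon := \bigl\{\, x \in \Omega : f(x) - \min_{x' \in \Omega} f(x') \le \epsilon \,\bigr\}

% (epsilon, delta)-query complexity: the number of queries T needed so that
\mathbb{P}\bigl(\exists\, t \le T : x_t \in \Omega_\epsilon\bigr) \ge 1 - \delta
```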

Figures (7)

Figure 1: Comparison of synthetic functions with $n=50$.
Figure 2: Comparison of synthetic functions with $n=500$.
Figure 3: Comparisons on SST-2
Figure 4: Synthetic functions with $n=2$.
Figure 5: Comparison on discontinuous objectives
...and 2 more figures

Theorems & Definitions (13)

Definition 1: $(\epsilon,\delta)$-Query Complexity
Definition 2: Error-Target $\theta$-Dependence
Definition 3: $\gamma$-Shrinking Rate
Theorem 1
Definition 4: Hypothesis-Target $\eta$-Shattering Rate
Theorem 2
Theorem 3
Definition 5: Dimensionally local Hölder continuity
Theorem 2
proof
...and 3 more
