Sharpness-Aware Black-Box Optimization

Feiyang Ye, Yueming Lyu, Xuehao Wang, Masashi Sugiyama, Yu Zhang, Ivor Tsang

TL;DR

A novel Sharpness-Aware Black-box Optimization (SABO) algorithm that applies a sharpness-aware minimization strategy to improve model generalization in black-box optimization.

Abstract

Black-box optimization algorithms are widely used in machine learning problems, including reinforcement learning and prompt fine-tuning. However, directly optimizing the training loss value, as existing black-box optimization methods commonly do, can lead to suboptimal model quality and generalization performance. To address these problems, we propose a novel Sharpness-Aware Black-box Optimization (SABO) algorithm, which applies a sharpness-aware minimization strategy to improve model generalization. Specifically, SABO first reparameterizes the objective function by its expectation over a Gaussian distribution. It then iteratively updates the parameterized distribution using approximated stochastic gradients of the maximum objective value within a small neighborhood of the current solution in the Gaussian distribution space. Theoretically, we prove the convergence rate and a generalization bound for SABO. Empirically, extensive experiments on black-box prompt fine-tuning tasks demonstrate the effectiveness of SABO in improving model generalization performance.
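
The two steps described in the abstract (Gaussian reparameterization, then a gradient step taken at the worst point of a small neighborhood) can be illustrated with a deliberately simplified sketch. This is not the authors' SABO update: it assumes a fixed isotropic covariance $\sigma^2 I$, perturbs only the mean, and uses a Euclidean ball of radius `rho` rather than a neighborhood in the Gaussian distribution space. The names `es_gradient`, `sharpness_aware_step`, `run_demo`, and all hyperparameter values are hypothetical.

```python
import numpy as np


def es_gradient(f, mu, sigma, n_samples, rng):
    """Estimate grad_mu E_{x ~ N(mu, sigma^2 I)}[f(x)] from function queries only.

    Uses antithetic pairs mu +/- sigma * eps to reduce variance.
    """
    eps = rng.standard_normal((n_samples, mu.size))
    f_plus = np.array([f(mu + sigma * e) for e in eps])
    f_minus = np.array([f(mu - sigma * e) for e in eps])
    return ((f_plus - f_minus)[:, None] * eps).mean(axis=0) / (2.0 * sigma)


def sharpness_aware_step(f, mu, sigma, lr, rho, n_samples, rng):
    """One SAM-style step on the mean (simplifying assumption: sigma is fixed)."""
    # 1) Estimate the gradient of the smoothed objective at the current mean.
    g = es_gradient(f, mu, sigma, n_samples, rng)
    # 2) Move to the approximate worst point inside a radius-rho Euclidean ball.
    mu_adv = mu + rho * g / (np.linalg.norm(g) + 1e-12)
    # 3) Descend using the gradient estimated at the perturbed mean.
    g_adv = es_gradient(f, mu_adv, sigma, n_samples, rng)
    return mu - lr * g_adv


def run_demo(steps=200, dim=10, seed=0):
    """Minimize a toy quadratic; returns (initial value, final value)."""
    rng = np.random.default_rng(seed)
    f = lambda x: float(np.sum(x ** 2))  # stand-in for a black-box loss
    mu = np.full(dim, 3.0)
    start = f(mu)
    for _ in range(steps):
        mu = sharpness_aware_step(f, mu, sigma=0.1, lr=0.05, rho=0.05,
                                  n_samples=20, rng=rng)
    return start, f(mu)
```

Calling `run_demo()` returns the objective value before and after optimization. The sketch only queries `f` and never differentiates it, which is the sense in which the procedure is black-box.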

Paper Structure

This paper contains 51 sections, 14 theorems, 102 equations, 5 figures, 3 tables, 2 algorithms.

Key Result

Proposition 4.2

(lyu2021black) Suppose $p_{\boldsymbol{\theta}}(\boldsymbol{x})$ is a Gaussian distribution with $\boldsymbol{\theta}=\{\boldsymbol{\mu},\boldsymbol{\Sigma}\}$ and $F(\boldsymbol{x})$ is a convex function. Let $J(\boldsymbol{\theta}) = \mathbb{E}_{p_{\boldsymbol{\theta}}}[F(\boldsymbol{x})]$, and $J$ ... where $\boldsymbol{0}$ denotes a zero matrix of appropriate size.
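
The smoothed objective $J(\boldsymbol{\theta}) = \mathbb{E}_{p_{\boldsymbol{\theta}}}[F(\boldsymbol{x})]$ defined above can be checked numerically on a toy case. The sketch below assumes the convex function $F(\boldsymbol{x}) = \|\boldsymbol{x}\|^2$, for which the expectation has the closed form $\|\boldsymbol{\mu}\|^2 + \operatorname{tr}(\boldsymbol{\Sigma})$, and compares it with a Monte Carlo estimate. The function names and the particular $\boldsymbol{\mu}$ and $\boldsymbol{\Sigma}$ are illustrative choices, not taken from the paper, and the example does not reproduce the truncated part of the proposition.

```python
import numpy as np


def smoothed_objective(f_batch, mu, cov, n_samples=200_000, seed=0):
    """Monte Carlo estimate of J(theta) = E_{x ~ N(mu, cov)}[F(x)]."""
    rng = np.random.default_rng(seed)
    xs = rng.multivariate_normal(mu, cov, size=n_samples)
    return float(np.mean(f_batch(xs)))


def squared_norm(xs):
    """Toy convex F applied row-wise: F(x) = ||x||^2."""
    return np.sum(xs ** 2, axis=1)


mu = np.array([1.0, -2.0])
cov = np.diag([0.5, 2.0])

j_estimate = smoothed_objective(squared_norm, mu, cov)
j_closed_form = float(mu @ mu + np.trace(cov))  # ||mu||^2 + tr(Sigma)
f_at_mean = float(mu @ mu)                      # F(mu)
```

For convex $F$, Jensen's inequality gives $J(\boldsymbol{\theta}) \ge F(\boldsymbol{\mu})$, so `j_estimate` should sit above `f_at_mean` and close to `j_closed_form`, up to sampling noise.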

Figures (5)

  • Figure 1: Results on the four test functions with problem dimension $d=500$ and $N=50$.
  • Figure 2: Results on the four test functions with problem dimension $d=200$ and $N=50$.
  • Figure 3: Results on the four test functions with problem dimension $d=1000$ and $N=50$.
  • Figure 4: Results on the four test functions with problem dimension $d=200$ and $N=10$.
  • Figure 5: Results on the four test functions with problem dimension $d=200$ and $N=100$.

Theorems & Definitions (16)

  • Proposition 4.2
  • Theorem 4.3
  • Proposition 4.5
  • Theorem 4.6
  • Remark 4.7
  • Theorem 4.8
  • Theorem A.1
  • Lemma C.1
  • Lemma C.2
  • Lemma C.3
  • ...and 6 more