An Enhanced Zeroth-Order Stochastic Frank-Wolfe Framework for Constrained Finite-Sum Optimization

Haishan Ye, Yinghui Huang, Hao Di, Xiangyu Chang

TL;DR

This work tackles constrained finite-sum optimization in a black-box setting by introducing a zeroth-order stochastic Frank-Wolfe algorithm with double variance reduction (ZSFW-DVR). The method couples a refined gradient estimator, which reduces the variance of the zeroth-order approximation, with batch PAGE-style variance reduction for stochastic sampling, so it makes progress without full gradient computations. The authors prove non-asymptotic convergence for both convex and non-convex objectives, with a zeroth-order query complexity of $O(d\sqrt{n}/\varepsilon)$ in the convex case and $O(d^{3/2}\sqrt{n}/\varepsilon^2)$ in the non-convex case, alongside corresponding bounds on the number of linear minimization oracle (LMO) calls. Empirical results on black-box sparse logistic regression, robust classification, and adversarial attacks show faster convergence and better query efficiency than existing zeroth-order FW methods, indicating that the approach scales to high-dimensional, large-sample problems.
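The interplay of the two variance-reduction ideas can be illustrated with a small sketch. The code below is not the authors' ZSFW-DVR algorithm: it assumes Gaussian random directions for a central-difference gradient estimate, a PAGE-style rule that refreshes the estimate on all $n$ functions with probability `p` and otherwise corrects it on a mini-batch, an $\ell_1$-ball constraint, and the classical step size $2/(t+2)$. The names `zo_grad`, `lmo_l1`, and `zo_fw_page` and all parameter values are hypothetical.

```python
import numpy as np

def zo_grad(f, x, U, mu):
    """Central-difference estimate of grad f(x), averaged over the columns of U."""
    g = np.zeros_like(x)
    for j in range(U.shape[1]):
        u = U[:, j]
        g += (f(x + mu * u) - f(x - mu * u)) / (2.0 * mu) * u
    return g / U.shape[1]

def lmo_l1(g, radius):
    """Linear minimization oracle for the l1 ball: argmin of <g, s> over ||s||_1 <= radius."""
    k = int(np.argmax(np.abs(g)))
    s = np.zeros_like(g)
    s[k] = -radius * np.sign(g[k])
    return s

def zo_fw_page(fs, x0, radius, steps, batch, q, p, mu, seed=0):
    """Zeroth-order Frank-Wolfe with a PAGE-style estimator (illustrative assumptions only)."""
    rng = np.random.default_rng(seed)
    n, d = len(fs), x0.size
    x, x_prev, g = x0.copy(), x0.copy(), None
    for t in range(steps):
        if g is None or rng.random() < p:
            # Full refresh: estimate every component gradient (2 * n * q queries).
            g = np.mean([zo_grad(f, x, rng.standard_normal((d, q)), mu) for f in fs], axis=0)
        else:
            # Cheap correction on a mini-batch, reusing the same directions at
            # x and x_prev so most of the estimation noise cancels (4 * batch * q queries).
            corr = np.zeros(d)
            for i in rng.choice(n, size=batch, replace=False):
                U = rng.standard_normal((d, q))
                corr += zo_grad(fs[i], x, U, mu) - zo_grad(fs[i], x_prev, U, mu)
            g = g + corr / batch
        s = lmo_l1(g, radius)
        gamma = 2.0 / (t + 2)
        # Convex combination of feasible points, so the iterate stays in the ball.
        x_prev, x = x, x + gamma * (s - x)
    return x

# Toy problem: least squares whose minimizer is a vertex of the unit l1 ball.
data_rng = np.random.default_rng(1)
n_funcs, dim, radius = 50, 10, 1.0
A = data_rng.standard_normal((n_funcs, dim))
x_star = np.zeros(dim)
x_star[0] = radius
y = A @ x_star
fs = [lambda x, a=a, b=b: 0.5 * (a @ x - b) ** 2 for a, b in zip(A, y)]
objective = lambda x: 0.5 * np.mean((A @ x - y) ** 2)

x_hat = zo_fw_page(fs, np.zeros(dim), radius, steps=150, batch=10, q=10, p=0.2, mu=1e-3)
f_start, f_end = objective(np.zeros(dim)), objective(x_hat)
print(f"objective at start: {f_start:.4f}, after 150 steps: {f_end:.4f}")
```

The projection-free update only needs the LMO, which for the $\ell_1$ ball returns a single signed coordinate vector; that is why Frank-Wolfe iterates are sparse by construction in problems such as sparse logistic regression.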

Abstract

We propose an enhanced zeroth-order stochastic Frank-Wolfe framework to address constrained finite-sum optimization problems, a structure prevalent in large-scale machine-learning applications. Our method introduces a novel double variance reduction framework that effectively reduces the gradient approximation variance induced by zeroth-order oracles and the stochastic sampling variance from finite-sum objectives. By leveraging this framework, our algorithm achieves significant improvements in query efficiency, making it particularly well-suited for high-dimensional optimization tasks. Specifically, for convex objectives, the algorithm achieves a query complexity of $O(d\sqrt{n}/\varepsilon)$ to find an $\varepsilon$-suboptimal solution, where $d$ is the dimensionality and $n$ is the number of functions in the finite-sum objective. For non-convex objectives, it achieves a query complexity of $O(d^{3/2}\sqrt{n}/\varepsilon^2)$ without requiring the computation of $d$ partial derivatives at each iteration. These complexities are the best known among zeroth-order stochastic Frank-Wolfe algorithms that avoid explicit gradient calculations. Empirical experiments on convex and non-convex machine learning tasks, including sparse logistic regression, robust classification, and adversarial attacks on deep networks, validate the computational efficiency and scalability of our approach. Our algorithm demonstrates superior performance in both convergence rate and query complexity compared to existing methods.
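To get a feel for how the two stated bounds scale, the following sketch evaluates the leading terms $d\sqrt{n}/\varepsilon$ and $d^{3/2}\sqrt{n}/\varepsilon^2$ for sample values. Big-O notation hides constants, so the numbers are only meaningful as relative growth rates, not as predicted query counts; the function names and the chosen values of `d`, `n`, and `eps` are illustrative assumptions.

```python
import math

def convex_order(d, n, eps):
    """Leading term of the convex bound O(d * sqrt(n) / eps), constants dropped."""
    return d * math.sqrt(n) / eps

def nonconvex_order(d, n, eps):
    """Leading term of the non-convex bound O(d^{3/2} * sqrt(n) / eps^2), constants dropped."""
    return d ** 1.5 * math.sqrt(n) / eps ** 2

d, n, eps = 100, 10_000, 0.1
convex = convex_order(d, n, eps)
nonconvex = nonconvex_order(d, n, eps)

# The ratio of the two terms is sqrt(d) / eps, independent of n.
ratio = nonconvex / convex

# Both terms grow with sqrt(n): quadrupling n doubles each of them.
growth = convex_order(d, 4 * n, eps) / convex

print(f"convex term: {convex:.3g}, non-convex term: {nonconvex:.3g}")
print(f"ratio: {ratio:.3g}, growth when n is quadrupled: {growth:.3g}")
```

The $\sqrt{n}$ dependence is the part attributable to variance reduction over the finite sum: the leading term grows with the square root of the number of component functions rather than linearly in it.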

Paper Structure

This paper contains 30 sections, 20 theorems, 83 equations, 4 figures, 3 tables, and 1 algorithm.

Key Result

Lemma 1

Given a function $h(x)$ satisfying the smoothness assumption, that is, $h(x)$ is $L$-smooth, the approximate gradient $\hat{\nabla} h(x, U, \mu)$ defined in the paper has the following property with $\tau_h(x, U_{:, j}, \mu) = \frac{h(x + \mu U_{:, j}) - h(x-\mu U_{:, j}) -2\mu\left\langle \nabla h(x), U_{:, j} \right\rangle}{2\mu}$ and $|\tau_h(x, U_{:, j}, \mu)| \leq \frac{L\mu\,\cdots}{\cdots}$ …
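The lemma statement is cut off before its bound on $\tau_h$ is complete. As a sketch of what $L$-smoothness alone implies, and not as a reconstruction of the paper's exact statement, write $u = U_{:, j}$ as shorthand. The standard quadratic upper bound for an $L$-smooth function, applied at $x + \mu u$ and $x - \mu u$, gives

$$
\left| h(x \pm \mu u) - h(x) \mp \mu \left\langle \nabla h(x), u \right\rangle \right| \leq \frac{L\mu^2}{2} \left\| u \right\|^2 .
$$

Subtracting the two expansions and applying the triangle inequality yields

$$
\left| h(x + \mu u) - h(x - \mu u) - 2\mu \left\langle \nabla h(x), u \right\rangle \right| \leq L\mu^2 \left\| u \right\|^2 ,
$$

and dividing by $2\mu$ gives

$$
\left| \tau_h(x, u, \mu) \right| \leq \frac{L\mu \left\| u \right\|^2}{2} .
$$

Under this bound the central-difference error along each direction vanishes linearly in the smoothing parameter $\mu$.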

Figures (4)

  • Figure 1: Objective gap comparison with different algorithms for black-box sparse logistic regression. The $y$ axis represents the logarithm (base 10) of the objective gap, and the $x$ axis is the number of queries during the optimization process.
  • Figure 2: Objective gap comparison with different algorithms for black-box robust regression. The $x$ and $y$ axes are the same as in Figure 1.
  • Figure 3: Attack success rates against the number of queries on MNIST and CIFAR-10 datasets.
  • Figure 4: Visual comparisons of original images and their adversarial examples generated by each algorithm.

Theorems & Definitions (21)

  • Lemma 1
  • Theorem 1
  • Corollary 1
  • Corollary 2
  • Theorem 2
  • Corollary 3
  • Lemma 2
  • Lemma 3
  • Lemma 4
  • Lemma 5
  • ...and 11 more