The Multi-Query Paradox in Zeroth-Order Optimization

Wei Lin, Qingyu Song, Hong Xu

TL;DR

The paper tackles zeroth-order optimization under a fixed query budget, revealing a fundamental paradox between multi-query aggregation and budget allocation. It analyzes two estimators, ZO-Avg and ZO-Align, showing that ZO-Avg is most query-efficient with single-query steps, while ZO-Align benefits from larger query blocks and full-subspace updates. Across strongly convex, convex, non-convex, and stochastic settings, the authors derive convergence rates that make the dependence on the per-iteration query count explicit, resolving the allocation problem for these estimators. High-dimensional analysis shows that the advantage of ZO-Align diminishes when the dimension far exceeds the query batch size, yet experiments on classical problems and LLM fine-tuning validate the theoretical dichotomy. The work provides practical guidance: choose the aggregation rule first, as it dictates whether to pursue many cheap steps or fewer high-quality subspace updates.

Abstract

Zeroth-order (ZO) optimization provides a powerful framework for problems where explicit gradients are unavailable and must be approximated using only function-value queries. The prevalent single-query approach is simple but suffers from high estimation variance, motivating a multi-query paradigm to improve estimation accuracy. This, however, creates a critical trade-off: under a fixed budget of queries (i.e., cost), the number of queries per iteration and the total number of optimization iterations are inversely proportional to one another. How to best allocate this budget is a fundamental, under-explored question. This work systematically resolves this query allocation problem. We analyze two aggregation methods: the de facto simple averaging (ZO-Avg), and a new Projection Alignment method (ZO-Align) that we derive from local surrogate minimization. By deriving convergence rates for both methods that make the dependence on the number of queries explicit across strongly convex, convex, non-convex, and stochastic settings, we uncover a stark dichotomy. For ZO-Avg, we prove that using more than one query per iteration is always query-inefficient, rendering the single-query approach optimal. In contrast, ZO-Align generally performs better with more queries per iteration, making full-subspace estimation the optimal approach. Thus, our work clarifies that the multi-query problem boils down to a choice not of an intermediate query size, but between two classic algorithms, a choice dictated entirely by the aggregation method used. These theoretical findings are consistently validated by extensive experiments.
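To make the budget trade-off concrete, the following is a minimal sketch of gradient descent with an averaged multi-query estimator under a fixed query budget. It is an illustration under stated assumptions, not the authors' implementation: the function names (`zo_avg_gradient`, `run_zo_avg`), the forward-difference form, the smoothing radius `mu`, and the convention that one query means one directional finite difference (so `budget // q` iterations are available) are all hypothetical choices.

```python
import random


def zo_avg_gradient(f, x, q, mu=1e-4, rng=random):
    """Average q forward-difference directional estimates along Gaussian directions.

    Assumption: a textbook averaged estimator, not necessarily the paper's exact form.
    """
    d = len(x)
    fx = f(x)  # baseline value, reused for all q directions
    g = [0.0] * d
    for _ in range(q):
        u = [rng.gauss(0.0, 1.0) for _ in range(d)]
        x_pert = [xi + mu * ui for xi, ui in zip(x, u)]
        slope = (f(x_pert) - fx) / mu  # approximates the directional derivative u . grad f(x)
        for j in range(d):
            g[j] += slope * u[j] / q
    return g


def run_zo_avg(f, x0, budget, q, lr, seed=0):
    """Spend a fixed query budget with q queries per iteration.

    Assumption: each directional finite difference counts as one query,
    so the number of iterations is budget // q.
    """
    rng = random.Random(seed)
    x = list(x0)
    iterations = budget // q
    for _ in range(iterations):
        g = zo_avg_gradient(f, x, q, rng=rng)
        x = [xi - lr * gi for xi, gi in zip(x, g)]
    return x, iterations
```

For example, calling `run_zo_avg(f, x0, budget=2000, q=1, lr=0.05)` takes 2000 cheap steps, while `q=4` takes 500 steps with a lower-variance estimate each; comparing the final objective values for several `q` at equal `budget` is the kind of experiment the allocation question is about.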

Paper Structure

This paper contains 46 sections, 14 theorems, 148 equations, and 6 figures.

Key Result

Proposition 3.1

Let $g = \nabla f(x)$ and assume the query directions are i.i.d. samples from $\mathcal{N}(0, I)$. The mean squared errors of the idealized estimators are:
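The closed-form expressions are not reproduced in this summary. As a rough numerical companion, the sketch below estimates the mean squared error of two idealized estimators by Monte Carlo. The averaging estimator $\frac{1}{q}\sum_i (u_i^\top g)\,u_i$ is the standard idealized form of simple averaging. The second estimator, an orthogonal projection of $g$ onto the span of the $q$ directions, is only an assumed stand-in for a subspace-type estimator; the definition of ZO-Align is not given here, so it should not be read as the paper's method. All function names are hypothetical.

```python
import math
import random


def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def sample_directions(q, d, rng):
    """Draw q i.i.d. directions from N(0, I_d)."""
    return [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(q)]


def avg_estimate(g, dirs):
    """Idealized averaging: (1/q) * sum_i (u_i . g) u_i."""
    q = len(dirs)
    est = [0.0] * len(g)
    for u in dirs:
        c = dot(u, g) / q
        est = [ei + c * ui for ei, ui in zip(est, u)]
    return est


def projection_estimate(g, dirs):
    """Orthogonal projection of g onto span(dirs), via modified Gram-Schmidt.

    Assumption: a generic subspace estimator, not the paper's ZO-Align formula.
    """
    basis = []
    for u in dirs:
        v = list(u)
        for b in basis:
            c = dot(v, b)
            v = [vi - c * bi for vi, bi in zip(v, b)]
        norm = math.sqrt(dot(v, v))
        if norm > 1e-12:  # skip numerically dependent directions
            basis.append([vi / norm for vi in v])
    est = [0.0] * len(g)
    for b in basis:
        c = dot(g, b)
        est = [ei + c * bi for ei, bi in zip(est, b)]
    return est


def empirical_mse(estimator, g, q, trials=2000, seed=0):
    """Monte Carlo estimate of E || estimator(g) - g ||^2."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        est = estimator(g, sample_directions(q, len(g), rng))
        total += sum((e - gi) ** 2 for e, gi in zip(est, g))
    return total / trials
```

For Gaussian directions, a standard moment calculation gives $\frac{d+1}{q}\|g\|^2$ for the averaging form and $(1 - q/d)\|g\|^2$ for projection onto a random $q$-dimensional subspace with $q \le d$, so the empirical values can be compared against those references. Whether these match the proposition's statement should be checked against the paper itself.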

Figures (6)

  • Figure 1: Objective value versus queries used: strongly convex case.
  • Figure 2: Objective value versus queries used: convex case.
  • Figure 3: Objective value versus queries used: non-convex case.
  • Figure 4: Objective value versus queries used: stochastic case.
  • Figure 5: Fine-tuning Qwen3-0.6b on SST2 and CB.
  • ...and 1 more figure

Theorems & Definitions (42)

  • Proposition 3.1
  • Proposition 4.1
  • proof
  • Theorem 4.2
  • proof
  • Theorem 4.3
  • proof
  • Theorem 4.4
  • proof
  • Theorem 4.5
  • ...and 32 more