A Structured Tour of Optimization with Finite Differences

Marco Rando, Cesare Molinari, Lorenzo Rosasco, Silvia Villa

TL;DR

This paper analyzes structured versus unstructured direction generation for finite-difference zeroth-order optimization under fixed evaluation budgets. It reviews and extends several structured constructions (e.g., QR-based orthogonalization, random Householder, Butterfly, and permuted variants) and benchmarks them against unstructured methods on synthetic, CUTEst, and high-dimensional adversarial MNIST tasks. The findings show that structured directions can achieve gradient-approximation quality and convergence comparable to or better than unstructured approaches at similar cost, particularly when the number of directions is a substantial fraction of the dimension (e.g., $\ell \geq d/3$ or $d/2$). The results advocate for incorporating structure in direction design for high-dimensional zeroth-order problems and motivate further theory and scalable implementations for large-scale applications such as large language model fine-tuning.

Abstract

Finite-difference methods are widely used for zeroth-order optimization in settings where gradient information is unavailable or expensive to compute. These procedures mimic first-order strategies by approximating gradients through function evaluations along a set of random directions. From a theoretical perspective, recent studies indicate that imposing structure (such as orthogonality) on the chosen directions allows for the derivation of convergence rates comparable to those achieved with unstructured random directions (i.e., directions sampled independently from a distribution). Empirically, although structured directions are expected to enhance performance, they often introduce additional computational costs, which can limit their applicability in high-dimensional settings. In this work, we examine the impact of structured direction selection in finite-difference methods. We review and extend several strategies for constructing structured direction matrices and compare them with unstructured approaches in terms of computational cost, gradient approximation quality, and convergence behavior. Our evaluation spans both synthetic tasks and real-world applications such as adversarial perturbation. The results demonstrate that structured directions can be generated with computational costs comparable to unstructured ones while significantly improving gradient estimation accuracy and optimization performance.
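
As a minimal illustration of the setup described in the abstract, the following Python sketch builds an unstructured direction set (i.i.d. samples from the unit sphere) and a structured one (columns of a single random Householder reflection), and plugs either into a forward finite-difference surrogate. The $d/\ell$ scaling of the surrogate, the single-reflection Householder construction, and all function names are assumptions made for illustration; this excerpt does not reproduce the paper's exact definitions. The code uses only the standard library so that it stays self-contained.

```python
import math
import random


def spherical_directions(d, ell, rng):
    """Unstructured: ell directions sampled i.i.d. uniformly from the unit sphere."""
    dirs = []
    for _ in range(ell):
        z = [rng.gauss(0.0, 1.0) for _ in range(d)]
        norm = math.sqrt(sum(c * c for c in z))
        dirs.append([c / norm for c in z])
    return dirs


def householder_directions(d, ell, rng):
    """Structured (assumed construction): the first ell columns of the
    reflection H = I - 2 v v^T / (v^T v) for a random Gaussian vector v.
    H is orthogonal, so the returned directions are orthonormal."""
    v = [rng.gauss(0.0, 1.0) for _ in range(d)]
    vv = sum(c * c for c in v)
    # Column j of H is e_j - (2 v_j / v^T v) v.
    return [
        [(1.0 if i == j else 0.0) - 2.0 * v[i] * v[j] / vv for i in range(d)]
        for j in range(ell)
    ]


def forward_fd_gradient(f, x, directions, h):
    """Forward finite-difference surrogate with an assumed d/ell scaling:
    g(x) = (d/ell) * sum_i [(f(x + h p_i) - f(x)) / h] * p_i."""
    d, ell = len(x), len(directions)
    fx = f(x)
    g = [0.0] * d
    for p in directions:
        shifted = [xi + h * pi for xi, pi in zip(x, p)]
        slope = (f(shifted) - fx) / h  # directional-derivative estimate along p
        for i in range(d):
            g[i] += (d / ell) * slope * p[i]
    return g
```

With $\ell = d$ orthonormal directions and a linear objective, this surrogate returns the exact gradient up to floating-point error, which gives a simple sanity check for any structured direction generator.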

Paper Structure

This paper contains 16 sections, 1 theorem, 34 equations, 8 figures, 7 tables, 1 algorithm.

Key Result

Lemma 1

Let $P_1 = GI_{d,\ell}$, where $G$ is sampled uniformly from $O(d)$ with respect to the Haar measure, and let $P_2$ be a random matrix whose columns are sampled i.i.d. from the unit sphere. Then, for every $x \in \mathbb{R}^d$ and $h > 0$, the following inequality holds where $g$ is the finite-difference approximation defined in Eq. \ref{eqn:forward_fd}.
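
The inequality itself is not reproduced in this excerpt, so the sketch below does not restate it. It only compares, by Monte Carlo, the empirical gradient-approximation error of the two constructions named in the lemma. Several parts are assumptions made for illustration: the forward-difference surrogate with $d/\ell$ scaling, the quadratic test function, the parameter values, and every function name. The first $\ell$ columns of a Haar-distributed orthogonal matrix are obtained here by Gram-Schmidt orthonormalization of $\ell$ i.i.d. Gaussian vectors.

```python
import math
import random


def haar_directions(d, ell, rng):
    """P_1-style directions: Gram-Schmidt on ell i.i.d. Gaussian vectors,
    distributed as the first ell columns of a Haar orthogonal matrix."""
    frame = []
    for _ in range(ell):
        z = [rng.gauss(0.0, 1.0) for _ in range(d)]
        for q in frame:  # remove components along earlier directions
            proj = sum(qi * zi for qi, zi in zip(q, z))
            z = [zi - proj * qi for zi, qi in zip(z, q)]
        norm = math.sqrt(sum(c * c for c in z))
        frame.append([c / norm for c in z])
    return frame


def sphere_directions(d, ell, rng):
    """P_2-style directions: ell i.i.d. samples from the unit sphere."""
    dirs = []
    for _ in range(ell):
        z = [rng.gauss(0.0, 1.0) for _ in range(d)]
        norm = math.sqrt(sum(c * c for c in z))
        dirs.append([c / norm for c in z])
    return dirs


def forward_fd(f, x, directions, h):
    """Forward finite-difference surrogate (assumed d/ell scaling)."""
    d, ell = len(x), len(directions)
    fx = f(x)
    g = [0.0] * d
    for p in directions:
        slope = (f([xi + h * pi for xi, pi in zip(x, p)]) - fx) / h
        for i in range(d):
            g[i] += (d / ell) * slope * p[i]
    return g


def mean_relative_sq_error(sampler, f, grad, x, ell, h, trials, rng):
    """Average of ||g - grad f(x)||^2 / ||grad f(x)||^2 over random draws."""
    g_true = grad(x)
    denom = sum(c * c for c in g_true)
    total = 0.0
    for _ in range(trials):
        g = forward_fd(f, x, sampler(len(x), ell, rng), h)
        total += sum((gi - ti) ** 2 for gi, ti in zip(g, g_true)) / denom
    return total / trials


def compare_errors(d=12, ell=4, h=1e-5, trials=500, seed=0):
    """Return (structured, unstructured) mean relative squared errors
    on the hypothetical test function f(x) = 0.5 * ||x||^2."""
    rng = random.Random(seed)
    f = lambda x: 0.5 * sum(c * c for c in x)
    grad = lambda x: list(x)
    x = [1.0] * d
    structured = mean_relative_sq_error(haar_directions, f, grad, x, ell, h, trials, rng)
    unstructured = mean_relative_sq_error(sphere_directions, f, grad, x, ell, h, trials, rng)
    return structured, unstructured
```

Calling `compare_errors()` returns the two averaged errors for side-by-side inspection. For a linear objective and this surrogate, a short calculation gives expected relative squared errors of $(d-\ell)/\ell$ for orthonormal directions and $(d-1)/\ell$ for i.i.d. spherical ones, so the structured average should come out smaller whenever $\ell > 1$.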

Figures (8)

  • Figure 1: Time cost for constructing direction matrices.
  • Figure 2: Relative gradient approximation error for the Least Squares, Qing, and Rosenbrock functions using the surrogate in Eq. \ref{eqn:forward_fd}, with direction matrices generated by: S (Spherical), G (Gaussian), R (Rademacher), C (Coordinate), H (Householder), B (Butterfly), P (Permuted Householder), Q (QR).
  • Figure 3: Fraction of solved problems for gradient approximation error on subset of CUTEst benchmark.
  • Figure 4: Function value progress in optimizing the Least Squares, Qing, and Rosenbrock functions.
  • Figure 5: Top row: Fraction of problems solved as a function of accuracy threshold $\tau$. Bottom row: Fraction of problems solved at fixed accuracy $\tau = 10^{-2}$ versus the function evaluation budget.
  • ...and 3 more figures

Theorems & Definitions (2)

  • Lemma 1: Approximation Error
  • proof