Coherence-based Approximate Derivatives via Web of Affine Spaces Optimization

Daniel Rakita, Chen Liang, Qian Wang

TL;DR

This work reduces the cost of computing sequences of derivatives of differentiable functions by exploiting coherence across consecutive inputs. It introduces Web of Affine Spaces (WASP) Optimization, which reframes derivative computation as a constrained least-squares problem: a web of affine spaces anchored by past Jacobian-vector product (JVP) observations is used to locate an approximate derivative $\mathbf{D}^{*}$. The solution is obtained in closed form from a KKT system and can be updated with only a few forward passes. An error-detection and correction mechanism maintains accuracy over long sequences by substituting ground-truth JVPs when needed and updating the web of affine spaces to re-intersect at the new estimate. Empirically, WASP computes derivatives faster than finite differencing and certain automatic-differentiation baselines on small-to-medium problems and shows practical benefits in robot optimization. The authors acknowledge limitations in precision, step-size sensitivity, and scalability to large networks. The approach offers a pragmatic path toward real-time derivative estimation in robotics and related domains, with modest computational overhead and code changes.

Abstract

Computing derivatives is a crucial subroutine in computer science and related fields, as it provides a local characterization of a function's steepest directions of ascent or descent. In this work, we recognize that derivatives are often not computed in isolation; instead, it is quite common to compute a sequence of derivatives, each one somewhat related to the last. Thus, we propose accelerating derivative computation by reusing information from previous, related calculations, a general strategy known as coherence. We introduce the first instantiation of this strategy through a novel approach called the Web of Affine Spaces (WASP) Optimization. This approach provides an accurate approximation of a function's derivative object (e.g., a gradient or Jacobian matrix) at the current input within a sequence. Each derivative within the sequence requires only a small number of forward passes through the function (typically two), regardless of the number of function inputs and outputs. We demonstrate the efficacy of our approach through several numerical experiments, comparing it with alternative derivative computation methods on benchmark functions. We show that our method significantly improves the performance of derivative computation on small to medium-sized functions, i.e., functions with approximately fewer than 500 combined inputs and outputs. Furthermore, we show that this method can be effectively applied in a robotics optimization context. We conclude with a discussion of the limitations and implications of our work. Open-source code, visual explanations, and videos are located at the paper website: https://apollo-lab-yale.github.io/25-RSS-WASP-website/.

Paper Structure

This paper contains 51 sections, 9 theorems, 61 equations, 6 figures, 2 tables, 10 algorithms.

Key Result

Proposition 1

$\mathbf{P}_i\mathbf{y} \in (\mathbf{A}^{-1}\Delta\mathbf{x}_i)^\perp \ \forall \mathbf{y} \in \mathbb{R}^n$, i.e., the matrix-vector product $\mathbf{P}_i\mathbf{y}$ is orthogonal to $\mathbf{A}^{-1} \Delta \mathbf{x}_i$ for all $\mathbf{y} \in \mathbb{R}^n$.
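
The proposition can be checked numerically, with one caveat: this summary does not define $\mathbf{P}_i$ or $\mathbf{A}$. The sketch below therefore assumes one construction for which the stated orthogonality holds, namely a symmetric positive-definite $\mathbf{A} = 2\Delta\mathbf{X}\Delta\mathbf{X}^\top$ and $\mathbf{P}_i = \mathbf{I} - s_i^{-1}\Delta\mathbf{x}_i\Delta\mathbf{x}_i^\top\mathbf{A}^{-1}$ with $s_i = \Delta\mathbf{x}_i^\top\mathbf{A}^{-1}\Delta\mathbf{x}_i$. These definitions, and all variable names, are assumptions made for illustration and may differ from the paper's.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4

# Hypothetical tangent directions (one per column); not taken from the paper.
dX = rng.standard_normal((n, n + 2))
A = 2.0 * dX @ dX.T                 # ASSUMED form of A: symmetric positive definite
dx_i = dX[:, 0]                     # the direction Delta x_i

A_inv_dx = np.linalg.solve(A, dx_i)                 # A^{-1} Delta x_i
s_i = dx_i @ A_inv_dx                               # scalar Delta x_i^T A^{-1} Delta x_i
P_i = np.eye(n) - np.outer(dx_i, A_inv_dx) / s_i    # ASSUMED definition of P_i

# Inner products <A^{-1} Delta x_i, P_i y> for 100 arbitrary vectors y.
Y = rng.standard_normal((n, 100))
max_inner = np.max(np.abs(A_inv_dx @ (P_i @ Y)))
```

Under these assumptions, `max_inner` should be zero up to floating-point round-off, because $(\mathbf{A}^{-1}\Delta\mathbf{x}_i)^\top\mathbf{P}_i = \Delta\mathbf{x}_i^\top\mathbf{A}^{-1} - s_i^{-1} s_i\,\Delta\mathbf{x}_i^\top\mathbf{A}^{-1} = \mathbf{0}^\top$ when $\mathbf{A}$ is symmetric.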

Figures (6)

  • Figure 1: In this work, we present an approach for efficiently computing a sequence of approximate derivatives by reusing information from recent calculations. Our approach first isolates an affine solution space where the true derivative must lie (purple line). Next, a closed-form optimization procedure locates the point in this space with the smallest orthogonal distance (red lines) to a "web" of affine spaces (dark blue lines) that intersect at the previous approximate derivative (orange dot). This optimal point will be the transpose of the approximate derivative matrix, $\mathbf{D}^{* \top}$ (green dot).
  • Figure 2: (a) The web of affine spaces, encoded as the columns of the $\hat{\Delta \mathbf{F}}$ matrix, starts an iteration intersecting at the previously computed derivative solution, $\mathbf{D}^{* \top}$. (b) When a ground-truth directional derivative is computed ($\Delta \mathbf{f}_1$ in this case), the affine space associated with its tangent direction ($\Delta \mathbf{x}_1$ in this case) is shifted away from the other affine spaces. This affine space, illustrated as a purple line, is guaranteed to contain the transpose of the ground-truth derivative at the current input. (c) The constrained optimization step locates the point on the solution space that is closest (in terms of Euclidean distance) to the web of other affine spaces. (d) This point is the transpose of the approximate derivative at the current input, and the web of affine spaces (via the $\hat{\Delta \mathbf{F}}$ matrix) is updated so that the spaces now intersect at this new point. The web is now ready for either another iteration of the algorithm on the same input, if needed, or the next input in the sequence, $\mathbf{x}_{k+1}$.
  • Figure 3: Results for Evaluation 1: Sub-Experiment 1 (top), Sub-Experiment 2 (middle), and Sub-Experiment 3 (bottom).
  • Figure 4: Results for Evaluation 2. These results show the norm error (first row), angular error (second row), the number of function calls (third row), and runtime (fourth row) per derivative computation over a sequence of 50,000 derivatives (x-axis).
  • Figure 5: Evaluation 3 involves assessing performance in a robotic root-finding procedure, where the goal is to determine a robot pose that positions its feet and end-effector at predefined locations or orientations.
  • ...and 1 more figures
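
The iteration described in the Figure 2 caption can be sketched as an equality-constrained least-squares solve. This is a minimal illustration rather than the authors' implementation: the objective $\min_{\mathbf{D}} \lVert \Delta\mathbf{X}^\top\mathbf{D}^\top - \hat{\Delta\mathbf{F}}^\top \rVert_F^2$ subject to $\Delta\mathbf{x}_i^\top\mathbf{D}^\top = \Delta\mathbf{f}_i^\top$, the function name `wasp_step`, and the toy linear function are all assumptions made here. The ground-truth JVP is passed in by the caller, and the error-detection and correction mechanism is omitted.

```python
import numpy as np

def wasp_step(dX, dF_hat, i, df_i):
    """One constrained least-squares update (illustrative sketch only).

    dX     : (n, r) tangent directions, one per column; dX @ dX.T must be invertible
    dF_hat : (m, r) current JVP estimates, one per column (the "web")
    i      : index of the direction whose ground-truth JVP was just measured
    df_i   : (m,) ground-truth JVP along dX[:, i]
    Returns the approximate derivative D (m, n) and the updated web (m, r).
    """
    n = dX.shape[0]
    m = dF_hat.shape[0]
    dx_i = dX[:, [i]]  # keep as an (n, 1) column

    # KKT system of: min ||dX.T @ Dt - dF_hat.T||_F^2  s.t.  dx_i.T @ Dt = df_i.T
    kkt = np.block([[2.0 * dX @ dX.T, dx_i],
                    [dx_i.T, np.zeros((1, 1))]])
    rhs = np.vstack([2.0 * dX @ dF_hat.T, df_i.reshape(1, m)])
    sol = np.linalg.solve(kkt, rhs)
    D = sol[:n].T  # first n rows are D^T; the last row is the multiplier

    # Re-anchor every affine space so the web intersects at the new estimate.
    return D, D @ dX

# Toy usage: f(x) = M @ x, so the true derivative is M everywhere.
M = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])
dX = np.array([[1.0, 0.0, 0.0],
               [1.0, 1.0, 0.0],
               [0.0, 1.0, 1.0]])   # hypothetical, invertible set of directions
dF_hat = np.zeros((2, 3))          # start from the estimate D = 0
for i in range(3):
    D, dF_hat = wasp_step(dX, dF_hat, i, M @ dX[:, i])
```

In this toy setting, with a linear function, exact JVPs, and a square invertible `dX`, each step fixes the estimate along one direction without disturbing the others, so `D` matches `M` after cycling through all three directions. For a nonlinear function the derivative changes between inputs, and the estimate is only approximate.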

Theorems & Definitions (18)

  • Proposition 1
  • proof
  • Proposition 2
  • proof
  • Proposition 3
  • proof
  • Proposition 4
  • proof
  • Proposition 5
  • proof
  • ...and 8 more