Yau's Affine Normal Descent: Algorithmic Framework and Convergence Analysis

Yi-Shuai Niu, Artan Sheshmani, Shing-Tung Yau

Abstract

We propose Yau's Affine Normal Descent (YAND), a geometric framework for smooth unconstrained optimization in which search directions are defined by the equi-affine normal of level-set hypersurfaces. The resulting directions are invariant under volume-preserving affine transformations and intrinsically adapt to anisotropic curvature. Using the analytic representation of the affine normal from affine differential geometry, we establish its equivalence with the classical slice-centroid construction under convexity. For strictly convex quadratic objectives, affine-normal directions are collinear with Newton directions, implying one-step convergence under exact line search. For general smooth (possibly nonconvex) objectives, we characterize precisely when affine-normal directions yield strict descent and develop a line-search-based YAND. We establish global convergence under standard smoothness assumptions, linear convergence under strong convexity and Polyak-Łojasiewicz conditions, and quadratic local convergence near nondegenerate minimizers. We further show that affine-normal directions are robust under affine scalings, remaining insensitive to arbitrarily ill-conditioned transformations. Numerical experiments illustrate the geometric behavior of the method and its robustness under strong anisotropic scaling.
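
Two of the abstract's claims can be checked in a few lines: on a strictly convex quadratic the affine-normal direction is collinear with the Newton direction (Theorem 3.1), so exact line search converges in one step (Corollary 3.2) no matter how ill-conditioned the scaling. The following minimal sketch checks this on the anisotropic family $f_\gamma(x)=\tfrac{1}{2}(x_1^2+\gamma^2 x_2^2)$ used in Figure 4; the starting point is an illustrative choice, not taken from the paper.

```python
import numpy as np

def f(x, A):
    """Quadratic objective f(x) = 0.5 x^T A x."""
    return 0.5 * x @ A @ x

for gamma in (1.0, 1e2, 1e4):
    A = np.diag([1.0, gamma**2])        # Hessian of f_gamma
    x0 = np.array([3.0, 2.0])           # illustrative starting point
    g = A @ x0                          # gradient at x0

    # Affine-normal direction = Newton direction on a strictly convex
    # quadratic (Theorem 3.1): d = -A^{-1} grad f(x0).
    d = -np.linalg.solve(A, g)

    # Exact line search: minimize f(x0 + t*d)  =>  t* = -(g.d)/(d.A.d).
    t = -(g @ d) / (d @ A @ d)
    x1 = x0 + t * d

    print(f"gamma={gamma:g}  f(x1)={f(x1, A):.3e}")   # ~0 for every gamma
```

The printed values vanish (up to rounding) for all three values of $\gamma$, in contrast to fixed-step gradient descent, whose behavior degrades with the anisotropy (compare Figure 4).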

Paper Structure

This paper contains 91 sections, 30 theorems, 279 equations, 13 figures, and 3 tables.

Key Result

Theorem 2.2

Let $M=\{x: f(x)=f(z)\}$ be the level hypersurface at $z$, and assume that $M$ is locally strictly convex at $z$ (equivalently, the tangent-tangent Hessian of $f$ is positive definite at $z$). Let $g(C)$ be the centroid of the slice cut from the region bounded by $M$ by the hyperplane $\{x: P(x)=C\}$, where $P(x)=\nabla f(z)\cdot(x-z)$ and $C<0$ corresponds to the interior. Then the inward one-sided limit $\lim_{C\to 0^-}\frac{g(C)-z}{\|g(C)-z\|}$ exists and agrees with the analytical affine-normal direction at $z$.
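
A minimal numerical sketch of this statement in the planar quadratic case, where by Theorem 3.1 the analytical affine-normal direction is the normalized Newton direction. The matrix $A$ and the base point $z$ below are illustrative choices, and in $\mathbb{R}^2$ the slice centroid reduces to a chord midpoint.

```python
import numpy as np

# f(x) = 0.5 x^T A x in R^2; its level sets are ellipses (illustrative A, z).
A = np.array([[2.0, 0.5],
              [0.5, 1.0]])
z = np.array([1.5, -1.0])                 # base point on M = {f = f(z)}
g = A @ z                                 # gradient of f at z
u = np.array([-g[1], g[0]])               # tangent direction to M at z
u /= np.linalg.norm(u)

def slice_centroid(C):
    """Centroid (chord midpoint) of the slice {P(x) = C}, P(x) = g.(x - z), C < 0."""
    x0 = z + C * g / (g @ g)              # foot point on the slicing line
    s = -(u @ A @ x0) / (u @ A @ u)       # minimizer of f along the line,
    return x0 + s * u                     # which is the chord midpoint

newton = -z / np.linalg.norm(z)           # normalized Newton direction at z

for C in (-1e-1, -1e-2, -1e-3):
    d = slice_centroid(C) - z
    d /= np.linalg.norm(d)
    print(C, np.linalg.norm(d - newton))  # ~0: directions agree as C -> 0^-
```

For a quadratic, the curve of slice centroids is a straight segment aimed at the minimizer, so the printed deviations sit at rounding level for every $C$, not only in the limit; Theorem 2.2 is the statement that survives for general locally strictly convex level sets.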

Figures (13)

  • Figure 1: Geometric comparison between the Euclidean and affine normals
  • Figure 2: Normal-aligned frame at $x_k$ illustrating three typical constructions of $d_k$. The analytic affine normal $d_{\mathrm{AN}}(x_k)$ is represented in the frame $\{e_1,\dots,e_n,e_{n+1}\}$ with its $(n+1)$-st component normalized to $-1$. Case 1: the affine normal is already a descent direction ($\langle\nabla f(x_k), d_{\mathrm{AN}}(x_k)\rangle<0$), hence $d_k=d_{\mathrm{AN}}(x_k)$. Case 2: the affine normal points uphill ($\langle\nabla f(x_k), d_{\mathrm{AN}}(x_k)\rangle>0$), so we flip the sign and set $d_k=-d_{\mathrm{AN}}(x_k)$. Case 3: the affine normal is orthogonal to the gradient ($\langle\nabla f(x_k), d_{\mathrm{AN}}(x_k)\rangle=0$); in this degenerate case we revert to the steepest-descent direction $d_k=-\nabla f(x_k)/\|\nabla f(x_k)\|$. (A code sketch of this three-case rule appears after this list.)
  • Figure 3: YAND on the well-conditioned quadratic \eqref{eq:quad-well} with three different line-search strategies. Each panel shows (from left to right) the YAND trajectory on level sets, the function value $f(x_k)-f^\star$ (log scale), and the gradient norm $\|\nabla f(x_k)\|_2$ (log scale).
  • Figure 4: Optimization trajectories in the original $x$-coordinates for $f_\gamma(x)=\tfrac{1}{2}(x_1^2+\gamma^2 x_2^2)$ with $\gamma=1,10^2,10^4$. As $\gamma$ increases, the level sets become increasingly elongated. YAND-Exact and Newton remain essentially one-step methods, while the behavior of gradient descent depends more strongly on the step-size rule. In particular, GD-Fixed becomes substantially slower as the anisotropy increases, whereas GD-Exact remains convergent on this diagonal quadratic.
  • Figure 5: YAND trajectories after normalization $y=B_\gamma x$ for $\gamma=1,10^2,10^4$. After mapping to the intrinsic coordinates of $\phi(y)=\tfrac{1}{2}\|y\|^2$, the trajectories collapse onto nearly identical paths, illustrating the affine invariance predicted by the theory.
  • ...and 8 more figures
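
The three-case rule in Figure 2 translates directly into a direction-selection routine. The sketch below is illustrative, not the paper's implementation: it assumes a user-supplied callable `affine_normal(x)` returning the analytic affine-normal direction $d_{\mathrm{AN}}(x)$ (its construction is not reproduced here), and the function names `yand_direction`, `armijo`, and `yand` are hypothetical. Armijo backtracking is used as a simple stand-in for the line-search strategies compared in Figure 3.

```python
import numpy as np

def yand_direction(grad_f, affine_normal, x, tol=1e-12):
    """Select d_k from the analytic affine normal, per the three cases of Figure 2."""
    g = grad_f(x)
    d_an = affine_normal(x)                  # hypothetical user-supplied oracle
    s = g @ d_an
    if s < -tol:                             # Case 1: already a descent direction
        return d_an
    if s > tol:                              # Case 2: points uphill, flip the sign
        return -d_an
    return -g / np.linalg.norm(g)            # Case 3: orthogonal, steepest descent

def armijo(f, grad_f, x, d, t=1.0, beta=0.5, c=1e-4):
    """Backtracking line search; a common stand-in, not necessarily the paper's rule."""
    fx, slope = f(x), grad_f(x) @ d
    while f(x + t * d) > fx + c * t * slope:
        t *= beta
    return t

def yand(f, grad_f, affine_normal, x, max_iter=100, gtol=1e-8):
    """Minimal YAND-style loop: select direction, line-search, step."""
    for _ in range(max_iter):
        if np.linalg.norm(grad_f(x)) < gtol:
            break
        d = yand_direction(grad_f, affine_normal, x)
        x = x + armijo(f, grad_f, x, d) * d
    return x
```

All three cases return a descent direction, so the backtracking loop terminates on smooth objectives; Theorem 4.1 characterizes when $d_{\mathrm{AN}}$ itself already descends (Case 1).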

Theorems & Definitions (76)

  • Remark 2.1: Moment viewpoint
  • Theorem 2.2: Consistency of slice-centroid and analytical affine normal under convexity
  • Proof
  • Lemma 2.3: Fourth-moment tensor on the sphere
  • Theorem 3.1: Affine normal coincides with the Newton direction on strictly convex quadratics
  • Corollary 3.2: One-step convergence with exact line search
  • Proof
  • Theorem 4.1: Strict descent at elliptic points
  • Proof
  • Remark 4.2: Geometric meaning
  • ...and 66 more