The Non-Linearity Perturbation Threshold: Width Scaling and Landscape Bifurcations in Deep Learning

Michael Alexander

Abstract

We study how the optimization landscape of a neural network deforms as a non-linear activation is introduced through a smooth homotopy. Working first in an abstract local setting (a smooth one-parameter family of objective functions together with a critical branch that loses non-degeneracy through a simple Hessian kernel), we show via Lyapunov-Schmidt reduction that the local transition is controlled by the classical codimension-one normal forms (transcritical or pitchfork) and that the associated topology change is governed by Morse-theoretic handle attachment. We then verify the assumptions of this abstract setting for a concrete two-layer architecture. We prove that bilinear overparameterization creates an (m-1)d-dimensional Hessian kernel at the linear endpoint, which Tikhonov regularization lifts to a floor alpha > 0; the activation homotopy softens this floor, yielding an explicit bifurcation point lambda* approximately equal to alpha/|lambda_1'(0)|. We derive the eigenvalue-softening rate as a functional of activation derivatives and data moments, and prove that the near-pitchfork normal form (|g_aa/g_aaa| much less than 1) is a structural consequence of sigma''(0)=0 for tanh-like activations. The bifurcation point scales with network width m as lambda* proportional to alpha m, connecting the framework to the NTK regime: at large m the landscape reorganization is pushed past lambda=1 and the linearized picture prevails. The foundational algebraic theorems have been formally verified in the Lean 4 theorem prover, and theoretical predictions computed for widths m in {3, 5, 10, 20, 50, 100} exhibit quantitative agreement with the abstract framework.
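
The first-order estimate of the bifurcation point stated above can be illustrated with a minimal sketch. It assumes the softened eigenvalue is approximated linearly, mu(lambda) = alpha + lambda_1'(0) * lambda with lambda_1'(0) < 0, so that it crosses zero at lambda* = alpha/|lambda_1'(0)|. The function names, the value alpha = 0.05, and the model |lambda_1'(0)| = c/m with c = 1 are hypothetical choices made only to reproduce the stated scaling lambda* proportional to alpha m; they are not taken from the paper.

```python
def bifurcation_point(alpha, slope):
    """Zero crossing of the linearized eigenvalue mu(lam) = alpha + slope * lam.

    alpha: Tikhonov floor (> 0); slope: lambda_1'(0), assumed negative (softening).
    """
    if alpha <= 0:
        raise ValueError("alpha must be positive")
    if slope >= 0:
        raise ValueError("no first-order crossing: the eigenvalue does not soften")
    return alpha / abs(slope)


def softening_slope(m, c=1.0):
    """Hypothetical softening rate lambda_1'(0) = -c / m (illustrative model only)."""
    return -c / m


alpha = 0.05  # illustrative regularization strength, not a value from the paper
widths = [3, 5, 10, 20, 50, 100]
estimates = {m: bifurcation_point(alpha, softening_slope(m)) for m in widths}

for m, lam_star in estimates.items():
    # The homotopy parameter runs over [0, 1]; a crossing beyond 1 is never reached.
    regime = "within the homotopy range" if lam_star <= 1.0 else "past lambda = 1"
    print(f"m = {m:3d}: lambda* ~ {lam_star:.3f} ({regime})")
```

Under these assumed constants the estimate grows linearly in m, so for sufficiently wide networks the crossing falls beyond lambda = 1, which is the sense in which the linearized picture prevails at large width.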
