Quotient Geometry, Effective Curvature, and Implicit Bias in Simple Shallow Neural Networks

Hang-Cheng Dong, Pengcheng Cheng

Abstract

Overparameterized shallow neural networks admit substantial parameter redundancy: distinct parameter vectors may represent the same predictor due to hidden-unit permutations, rescalings, and related symmetries. As a result, geometric quantities computed directly in the ambient Euclidean parameter space can reflect artifacts of representation rather than intrinsic properties of the predictor. In this paper, we develop a differential-geometric framework for analyzing simple shallow networks through the quotient space obtained by modding out parameter symmetries on a regular set. We first characterize the symmetry and quotient structure of regular shallow-network parameters and show that the finite-sample realization map induces a natural metric on the quotient manifold. This leads to an effective notion of curvature that removes degeneracy along symmetry orbits and yields a symmetry-reduced Hessian capturing intrinsic local geometry. We then study gradient flows on the quotient and show that only the horizontal component of parameter motion contributes to first-order predictor evolution, while the vertical component corresponds purely to gauge variation. Finally, we formulate an implicit-bias viewpoint at the quotient level, arguing that meaningful complexity should be assigned to predictor classes rather than to individual parameter representatives. Our experiments confirm that ambient flatness is representation-dependent, that local dynamics are better organized by quotient-level curvature summaries, and that in underdetermined regimes, implicit bias is most naturally described in quotient coordinates.

Paper Structure (38 sections, 12 theorems, 406 equations, 7 figures)

Key Result

Proposition 2.1

(Orbit directions and exact infinitesimal redundancy) Let be the realization map of the shallow network and let $G=S_m\ltimes (\mathbb{R}_{>0})^m$ act on $\Theta$ by permutation and neuronwise positive rescaling. If $\theta\in \Theta_{\mathrm{reg}}$, then Consequently, the quotient tangent space $T_{[\theta]}(\Theta_{\mathrm{reg}}/G)$ is canonically identified with the space of first-order func
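The infinitesimal redundancy described in Proposition 2.1 can be checked numerically in a small case. The sketch below is illustrative only and rests on assumptions not stated in the text above: a ReLU network $f_\theta(x)=\sum_j a_j\,\mathrm{relu}(w_j\cdot x)$, the rescaling action $(w_j,a_j)\mapsto(c_j w_j,\,a_j/c_j)$, and hypothetical helper names (`predict`, `rescale`, `scaling_orbit_direction`, `directional_derivative`). It evaluates the first-order change of the predictor on a finite sample along a scaling-orbit tangent direction and along a generic direction. Permutations are discrete and contribute no tangent directions, so they are not covered here.

```python
def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def relu(z):
    return z if z > 0.0 else 0.0

def predict(W, a, x):
    """Assumed network form: f(x) = sum_j a_j * relu(w_j . x)."""
    return sum(a_j * relu(dot(w_j, x)) for w_j, a_j in zip(W, a))

def rescale(W, a, c):
    """Assumed group action: (w_j, a_j) -> (c_j * w_j, a_j / c_j), c_j > 0."""
    return ([[c_j * w for w in w_j] for w_j, c_j in zip(W, c)],
            [a_j / c_j for a_j, c_j in zip(a, c)])

def directional_derivative(W, a, x, dW, da):
    """First-order change of f(x) along (dW, da).
    Valid on the regular set, where no pre-activation w_j . x is zero."""
    total = 0.0
    for w_j, a_j, dw_j, da_j in zip(W, a, dW, da):
        pre = dot(w_j, x)
        gate = 1.0 if pre > 0.0 else 0.0
        total += a_j * gate * dot(dw_j, x) + da_j * relu(pre)
    return total

def scaling_orbit_direction(W, a, t):
    """Tangent at s = 0 of s -> rescale(W, a, exp(s * t)): (t_j w_j, -t_j a_j)."""
    return ([[t_j * w for w in w_j] for w_j, t_j in zip(W, t)],
            [-t_j * a_j for a_j, t_j in zip(a, t)])

# Small fixed example: 3 hidden units, 2 inputs, 3 sample points.
# All pre-activations are nonzero, so theta is regular on this sample.
W = [[1.0, -0.4], [0.3, 0.8], [-0.7, 0.2]]
a = [0.5, -1.2, 0.9]
X = [[1.0, 2.0], [-1.0, 0.5], [0.5, -1.5]]

# Finite rescaling leaves every sampled prediction unchanged.
W2, a2 = rescale(W, a, [2.0, 0.25, 5.0])
finite_gap = max(abs(predict(W, a, x) - predict(W2, a2, x)) for x in X)

# Orbit (vertical) direction: first-order predictor change vanishes.
dW, da = scaling_orbit_direction(W, a, [1.0, -2.0, 0.5])
vertical_change = max(abs(directional_derivative(W, a, x, dW, da)) for x in X)

# A generic direction (perturb only a_1) does change the predictor.
generic_dW = [[0.0, 0.0] for _ in W]
generic_da = [1.0, 0.0, 0.0]
generic_change = max(
    abs(directional_derivative(W, a, x, generic_dW, generic_da)) for x in X
)

print(finite_gap, vertical_change, generic_change)
```

Under these assumptions, the orbit direction contributes nothing to the sampled predictor at first order (up to floating-point rounding), while the generic direction does. This is consistent with identifying quotient tangent vectors with first-order changes of the predictor rather than with raw parameter perturbations.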

Figures (7)

  • Figure 1: Euclidean Hessian spectra across orbit-equivalent representatives (left) and projected Hessian spectra after removing scaling-orbit directions (right). The variation in the Euclidean spectrum reflects representation dependence, while the projected spectrum is invariant.
  • Figure 2: Parameter-space Hessian spectra vary across orbit-equivalent representatives (left), whereas $Q$-space Hessian spectra are numerically identical (right), confirming the intrinsic nature of quotient-level curvature.
  • Figure 3: Zoomed view of the small-eigenvalue region: Euclidean Hessian (left) changes under rescaling, while the projected proxy (right) remains stable, demonstrating that apparent flatness stems from symmetry-induced redundancy.
  • Figure 4: Pooled scatter plots of short-run empirical log-loss decay rates versus various curvature descriptors. Quotient-level quantities (e.g., $Q$-space trace and Frobenius norm) exhibit stronger correlation with local dynamics than ambient parameter-space condition numbers.
  • Figure 5: Gauge-dependent parameter complexity varies within a single symmetry orbit (left panel), while quotient-level complexity remains invariant (right panel), illustrating that meaningful complexity resides in the quotient object $Q$.
  • ...and 2 more figures
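
The representation dependence shown in Figures 1 to 3 can be reproduced in a toy setting. The following is a minimal sketch, not the paper's experimental setup: it assumes a one-hidden-unit ReLU model $f(x)=a\,\mathrm{relu}(w x)$ with scalar $w>0$, positive inputs, and squared loss, and it uses the invariant product $q=aw$ as a stand-in for a quotient coordinate (an assumption made for illustration; it need not coincide with the paper's $Q$). All function names are hypothetical. The ambient $2\times 2$ Hessian is estimated by central finite differences at several orbit-equivalent representatives $(cw,\,a/c)$ and compared with the curvature of the loss in $q$.

```python
import math

# Toy data generated by the predictor q_true * x, with x > 0.
X = [0.5, 1.0, 1.5, 2.0]
Q_TRUE = 2.0
Y = [Q_TRUE * x for x in X]

def relu(z):
    return z if z > 0.0 else 0.0

def loss(w, a):
    """Squared loss of f(x) = a * relu(w * x) in ambient coordinates (w, a)."""
    return sum((a * relu(w * x) - y) ** 2 for x, y in zip(X, Y)) / (2 * len(X))

def loss_q(q):
    """Same loss written in the invariant coordinate q = a * w (valid for w > 0)."""
    return sum((q * x - y) ** 2 for x, y in zip(X, Y)) / (2 * len(X))

def ambient_hessian(w, a, h=1e-3):
    """Central finite-difference Hessian entries (H_ww, H_wa, H_aa)."""
    f = loss
    h_ww = (f(w + h, a) - 2.0 * f(w, a) + f(w - h, a)) / h ** 2
    h_aa = (f(w, a + h) - 2.0 * f(w, a) + f(w, a - h)) / h ** 2
    h_wa = (f(w + h, a + h) - f(w + h, a - h)
            - f(w - h, a + h) + f(w - h, a - h)) / (4.0 * h ** 2)
    return h_ww, h_wa, h_aa

def eigenvalues_2x2(h_ww, h_wa, h_aa):
    """Closed-form eigenvalues of a symmetric 2x2 matrix, smallest first."""
    mean = 0.5 * (h_ww + h_aa)
    radius = math.hypot(0.5 * (h_ww - h_aa), h_wa)
    return mean - radius, mean + radius

def quotient_curvature(q, h=1e-3):
    """Second derivative of the loss with respect to q."""
    return (loss_q(q + h) - 2.0 * loss_q(q) + loss_q(q - h)) / h ** 2

# Orbit-equivalent representatives of the same predictor: (c * w, a / c).
w0, a0 = 1.0, 2.0
results = []
for c in (0.5, 1.0, 3.0):
    w, a = c * w0, a0 / c
    lam_min, lam_max = eigenvalues_2x2(*ambient_hessian(w, a))
    results.append((c, lam_min, lam_max, quotient_curvature(a * w)))

for c, lam_min, lam_max, q_curv in results:
    print(f"c={c}: ambient eigenvalues=({lam_min:.4f}, {lam_max:.4f}), "
          f"q-curvature={q_curv:.4f}")
```

In this toy case the ambient Hessian has one near-zero eigenvalue at every representative, along the rescaling direction, and a largest eigenvalue that varies with $c$. The curvature in $q$ is identical across representatives. The example only illustrates the qualitative claim and says nothing about the magnitudes reported in the figures.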

Theorems & Definitions (13)

  • Proposition 2.1
  • Theorem 2.1
  • Proposition 2.2
  • Remark 2.2
  • Theorem 3.1
  • Proposition 3.1
  • Corollary 3.1
  • Proposition 4.1: Function evolution is determined by the horizontal component
  • Theorem 4.1: Horizontal lift of quotient gradient flow
  • Theorem 4.2: Local convergence controlled by effective curvature
  • ...and 3 more