Implicit Bias of Mirror Flow for Shallow Neural Networks in Univariate Regression
Shuang Liang, Guido Montúfar
TL;DR
This work analyzes the implicit bias of mirror flow for wide, shallow neural networks in univariate least squares regression. For a broad class of unscaled potentials, mirror flow exhibits lazy training and, as the width tends to infinity, has the same implicit bias as ordinary gradient flow; for ReLU activations, this bias is described by a variational problem in function space. With scaled potentials, mirror flow still exhibits lazy training but is no longer in the kernel regime, and for absolute value activations it induces biases that generally cannot be expressed as RKHS norms. The parameter initialization determines how strongly the curvature of the learned function is penalized at different locations of the input space, while the scaled potential determines how different magnitudes of the curvature are penalized, which suggests the geometry of the potential as a handle on regularization. The methodology combines linearization arguments, a minimal representation-cost framework, and a translation from parameter space to function space.
Abstract
We examine the implicit bias of mirror flow in univariate least squares error regression with wide and shallow neural networks. For a broad class of potential functions, we show that mirror flow exhibits lazy training and has the same implicit bias as ordinary gradient flow when the network width tends to infinity. For ReLU networks, we characterize this bias through a variational problem in function space. Our analysis includes prior results for ordinary gradient flow as a special case and lifts limitations which required either an intractable adjustment of the training data or networks with skip connections. We further introduce scaled potentials and show that for these, mirror flow still exhibits lazy training but is not in the kernel regime. For networks with absolute value activations, we show that mirror flow with scaled potentials induces a rich class of biases, which generally cannot be captured by an RKHS norm. A takeaway is that whereas the parameter initialization determines how strongly the curvature of the learned function is penalized at different locations of the input space, the scaled potential determines how the different magnitudes of the curvature are penalized.
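To make the object of study concrete, the following is a minimal sketch of mirror descent, the explicit Euler discretization of mirror flow d/dt ∇Φ(θ(t)) = −∇L(θ(t)). It is an illustration under stated assumptions, not the authors' implementation: the separable potential Φ(θ) = Σ_k φ(θ_k − θ0_k) with φ(x) = |x|^p / p, the 1/√n output scaling, the restriction to training only the output weights, and all function names (`grad_phi`, `grad_phi_inv`, `mirror_step`, `loss_grad`) are hypothetical choices made for this example.

```python
import numpy as np

def grad_phi(x, p):
    """Elementwise derivative of the assumed potential phi(x) = |x|**p / p."""
    return np.sign(x) * np.abs(x) ** (p - 1)

def grad_phi_inv(y, p):
    """Elementwise inverse of grad_phi; requires p > 1."""
    return np.sign(y) * np.abs(y) ** (1.0 / (p - 1))

def mirror_step(theta, theta0, grad_loss, lr, p):
    """One mirror descent step for Phi(theta) = sum_k phi(theta_k - theta0_k).

    The update is a gradient step on the dual variable grad Phi(theta),
    followed by a map back to parameter space.
    """
    dual = grad_phi(theta - theta0, p) - lr * grad_loss
    return theta0 + grad_phi_inv(dual, p)

def loss_grad(a, w, b, x, y):
    """Gradient of the loss (1 / 2m) * sum_i (f(x_i) - y_i)**2 with respect to
    the output weights a, for the shallow ReLU network
    f(x) = n**-0.5 * sum_k a_k * relu(w_k * x - b_k).

    Only the output weights are differentiated here (a simplification).
    """
    n = a.shape[0]
    feats = np.maximum(w[None, :] * x[:, None] - b[None, :], 0.0) / np.sqrt(n)
    resid = feats @ a - y
    return feats.T @ resid / x.shape[0]
```

With p = 2 the potential is quadratic and `mirror_step` reduces to an ordinary gradient descent step; other values of p change the geometry of the update while leaving the loss unchanged.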
