The Effects of Multi-Task Learning on ReLU Neural Network Functions
Julia Nakhleh, Joseph Shenouda, Robert D. Nowak
TL;DR
The paper investigates shallow, multi-output ReLU networks trained with weight decay to interpolate data for multiple tasks, and uncovers a sharp dichotomy between single-task and multi-task solutions. In the univariate setting, it proves that the multi-task interpolant is almost surely unique and coincides with the connect-the-dots linear spline, which is the minimum-norm interpolant in the Sobolev RKHS $H^1([x_1,x_N])$. Single-task solutions, by contrast, generally lie in the non-Hilbertian space $\text{BV}^2$ and need not be unique. In the multivariate setting with many tasks, the authors show that the learned solution is well approximated by kernel ridge regression with a fixed kernel determined by the optimal neurons, and that the per-task penalties converge to a common scale as the number of tasks $T$ grows. This gives multi-task learning with ReLU networks an RKHS (kernel) interpretation, in contrast to the $L^1$-like behavior seen in the single-task case. Together, these results establish a concrete bridge between shallow ReLU networks trained with weight decay and kernel methods, with implications for generalization, robustness, and kernel-based analyses of multi-task neural networks.
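As a concrete illustration of the univariate result, the NumPy sketch below (our own, not the authors' code) writes the connect-the-dots interpolant of several tasks as a single multi-output shallow ReLU network whose hidden neurons sit at the interior data points. All tasks therefore share the same neurons and differ only in their output weights. It also computes $\int_{x_1}^{x_N} |f'(t)|^2\,dt$ per task; we assume this seminorm is the part of the $H^1$ norm being minimized, since function values at the data points are fixed by the interpolation constraints. The helper names `connect_the_dots_relu`, `evaluate`, and `h1_seminorm_sq` are hypothetical, and the sketch only represents the interpolant: it does not train a network or verify uniqueness.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def connect_the_dots_relu(x, Y):
    """Represent each task's piecewise-linear interpolant as one multi-output
    shallow ReLU network with hidden neurons at the interior data points.

    x: sorted inputs, shape (N,); Y: targets, shape (N, T) for T tasks.
    Model: f(t) = Y[0] + s0 * (t - x[0]) + sum_k c_k * relu(t - x[k]).
    """
    x = np.asarray(x, dtype=float)
    Y = np.asarray(Y, dtype=float).reshape(len(x), -1)   # (N, T)
    slopes = np.diff(Y, axis=0) / np.diff(x)[:, None]     # (N-1, T): slope of each segment
    coeffs = np.diff(slopes, axis=0)                      # (N-2, T): slope change at each interior knot
    return x[0], Y[0], slopes[0], x[1:-1], coeffs

def evaluate(params, t):
    x0, y0, s0, knots, coeffs = params
    t = np.asarray(t, dtype=float)[:, None]               # (M, 1)
    # On [x_1, x_N] the affine term s0 * (t - x0) equals s0 * relu(t - x0).
    return y0 + s0 * (t - x0) + relu(t - knots[None, :]) @ coeffs   # (M, T)

def h1_seminorm_sq(x, Y):
    """Integral of |f'(t)|^2 over [x_1, x_N] for each task's linear spline."""
    x = np.asarray(x, dtype=float)
    Y = np.asarray(Y, dtype=float).reshape(len(x), -1)
    slopes = np.diff(Y, axis=0) / np.diff(x)[:, None]
    return np.sum(slopes**2 * np.diff(x)[:, None], axis=0)

# Example: N = 4 data points, T = 2 tasks.
x = np.array([0.0, 1.0, 2.0, 4.0])
Y = np.array([[0.0, 1.0], [1.0, 0.0], [0.0, 2.0], [2.0, 2.0]])
params = connect_the_dots_relu(x, Y)
t = np.linspace(0.0, 4.0, 9)
F = evaluate(params, t)
F_interp = np.column_stack([np.interp(t, x, Y[:, j]) for j in range(Y.shape[1])])
print(np.allclose(F, F_interp))   # -> True
print(h1_seminorm_sq(x, Y))       # -> [4. 5.]
```

The comparison against `np.interp` confirms that each network output is the connect-the-dots interpolant of its task. Placing every neuron at a data point is what lets one hidden layer serve all $T$ tasks simultaneously.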
Abstract
This paper studies the properties of solutions to multi-task shallow ReLU neural network learning problems, wherein the network is trained to fit a dataset with minimal sum of squared weights. Remarkably, the solutions learned for each individual task resemble those obtained by solving a kernel regression problem, revealing a novel connection between neural networks and kernel methods. It is known that single-task neural network learning problems are equivalent to a minimum norm interpolation problem in a non-Hilbertian Banach space, and that the solutions of such problems are generally non-unique. In contrast, we prove that the solutions to univariate-input, multi-task neural network interpolation problems are almost always unique, and coincide with the solution to a minimum-norm interpolation problem in a Sobolev (Reproducing Kernel) Hilbert Space. We also demonstrate a similar phenomenon in the multivariate-input case; specifically, we show that neural network learning problems with large numbers of tasks are approximately equivalent to an $\ell^2$ (Hilbert space) minimization problem over a fixed kernel determined by the optimal neurons.
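The multivariate statement can likewise be sketched. Once the hidden neurons $(w_k, b_k)$ are held fixed, fitting each task's output weights with an $\ell^2$ penalty is kernel ridge regression with the kernel $K(x,x') = \sum_k \mathrm{ReLU}(w_k^\top x + b_k)\,\mathrm{ReLU}(w_k^\top x' + b_k)$. The NumPy sketch below checks this equivalence numerically under explicit assumptions: the neurons `W`, `b` are random placeholders rather than the optimal neurons of a trained network, a single ridge parameter `lam` is shared by all tasks (loosely mirroring the common per-task penalty scale reached as $T$ grows), and all helper names are hypothetical.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def neuron_features(X, W, b):
    """phi_k(x) = relu(w_k . x + b_k). X: (N, d), W: (n_neurons, d), b: (n_neurons,)."""
    return relu(X @ W.T + b)

def fixed_kernel(X1, X2, W, b):
    """K(x, x') = sum_k phi_k(x) phi_k(x') for a fixed set of neurons."""
    return neuron_features(X1, W, b) @ neuron_features(X2, W, b).T

def kernel_ridge_fit(X, Y, W, b, lam):
    """Dual coefficients alpha = (K + lam I)^{-1} Y; Y holds one column per task."""
    K = fixed_kernel(X, X, W, b)
    return np.linalg.solve(K + lam * np.eye(len(X)), Y)

def kernel_ridge_predict(Xq, X, alpha, W, b):
    return fixed_kernel(Xq, X, W, b) @ alpha

# Example with random placeholder neurons (NOT the optimal neurons of a trained network).
rng = np.random.default_rng(0)
N, d, n_neurons, T = 20, 3, 50, 4
X = rng.standard_normal((N, d))
Y = rng.standard_normal((N, T))              # one column of targets per task
W = rng.standard_normal((n_neurons, d))
b = rng.standard_normal(n_neurons)
lam = 0.1                                    # shared ridge parameter (assumption)

alpha = kernel_ridge_fit(X, Y, W, b, lam)
Phi = neuron_features(X, W, b)               # (N, n_neurons)
V_kernel = Phi.T @ alpha                     # output weights implied by the kernel solution
# Same problem in weight space: min_V ||Phi V - Y||^2 + lam * ||V||^2
V_primal = np.linalg.solve(Phi.T @ Phi + lam * np.eye(n_neurons), Phi.T @ Y)
print(np.allclose(V_kernel, V_primal))
print(np.allclose(kernel_ridge_predict(X, X, alpha, W, b), Phi @ V_primal))
```

Solving in the $N \times N$ kernel form or in the weight-space form yields the same output weights, by the identity $\Phi^\top(\Phi\Phi^\top + \lambda I)^{-1} = (\Phi^\top\Phi + \lambda I)^{-1}\Phi^\top$ for $\lambda > 0$. The kernel form is the cheaper one when there are fewer data points than neurons.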
