Representation and Regression Problems in Neural Networks: Relaxation, Generalization, and Numerics

Kang Liu, Enrique Zuazua

TL;DR

This work addresses three non-convex sparse-learning problems for shallow neural networks by relaxing them, through a mean-field approach, to convex measure-valued formulations. A representer-theorem analysis shows that there is no relaxation gap when the number of neurons $P$ is at least the number of data points $N$ ($P\ge N$), which enables a reduction to $P=N$ and characterizes solutions as finite empirical measures with at most $N$ atoms. The authors derive a generalization bound based on the Kantorovich–Rubinstein distance, provide practical guidance for hyperparameter selection, and develop discretization and sparsification algorithms that connect the convex relaxations to primal sparse representations. Numerical experiments illustrate the trade-offs between exact representation, approximate representation, and regression, including a sparsification pipeline that yields strong performance with far fewer active neurons, and a double-descent interpretation in random-feature regimes. The framework unifies theory and computation for shallow NNs, offering scalable paths from infinite-dimensional relaxations to sparse, high-performing models with principled hyperparameter choices.

Abstract

In this work, we address three non-convex optimization problems associated with the training of shallow neural networks (NNs) for exact and approximate representation, as well as for regression tasks. Through a mean-field approach, we convexify these problems and, applying a representer theorem, prove the absence of relaxation gaps. We establish generalization bounds for the resulting NN solutions, assessing their predictive performance on test datasets and, analyzing the impact of key hyperparameters on these bounds, propose optimal choices. On the computational side, we examine the discretization of the convexified problems and derive convergence rates. For low-dimensional datasets, these discretized problems are efficiently solvable using the simplex method. For high-dimensional datasets, we propose a sparsification algorithm that, combined with gradient descent for over-parameterized shallow NNs, yields effective solutions to the primal problems.
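As an illustration of the discretization step mentioned in the abstract, the following is a minimal sketch, not the authors' implementation. It assumes a ReLU activation, a hand-picked grid of neuron parameters `(a_j, b_j)`, and toy one-dimensional data; the function name `solve_discretized_representation` is hypothetical. The exact-representation problem restricted to the grid becomes a linear program (minimize the $\ell^1$ norm of the outer weights subject to interpolation), which is solved here with SciPy's HiGHS dual simplex. Because a simplex method returns a vertex of the feasible polytope, the solution is expected to use at most $N$ active neurons.

```python
import numpy as np
from scipy.optimize import linprog


def relu(z):
    return np.maximum(z, 0.0)


def solve_discretized_representation(X, y, A, b):
    """Solve min ||omega||_1 s.t. sum_j omega_j * relu(<a_j, x_i> + b_j) = y_i.

    X: (N, d) inputs, y: (N,) targets, A: (M, d) and b: (M,) fixed grid of
    inner weights and biases (an assumed discretization of parameter space).
    """
    Phi = relu(X @ A.T + b)            # feature matrix, shape (N, M)
    M = Phi.shape[1]
    # Split omega = u - v with u, v >= 0 so that ||omega||_1 = sum(u + v).
    cost = np.ones(2 * M)
    A_eq = np.hstack([Phi, -Phi])
    res = linprog(cost, A_eq=A_eq, b_eq=y, bounds=(0, None), method="highs-ds")
    if not res.success:
        raise RuntimeError(res.message)
    omega = res.x[:M] - res.x[M:]
    return omega, Phi


# Toy data (assumption): N = 6 points on [-1, 1] with targets sin(3x).
N = 6
X = np.linspace(-1.0, 1.0, N).reshape(-1, 1)
y = np.sin(3.0 * X[:, 0])

# Assumed parameter grid: slopes +1 and -1, biases on a uniform grid.
biases = np.linspace(-2.0, 2.0, 101)
A = np.concatenate([np.ones(101), -np.ones(101)]).reshape(-1, 1)
b = np.concatenate([biases, biases])

omega, Phi = solve_discretized_representation(X, y, A, b)
active = np.flatnonzero(np.abs(omega) > 1e-9)
print("grid size:", omega.size, "active neurons:", active.size)
print("max interpolation error:", np.max(np.abs(Phi @ omega - y)))
```

The split `omega = u - v` is the standard way to express an $\ell^1$ objective as a linear one. A finer grid enlarges the linear program but not the number of equality constraints, which stays equal to $N$.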

Paper Structure

This paper contains 27 sections, 18 theorems, 150 equations, 6 figures, 1 table, 1 algorithm.

Key Result

Theorem 1.1

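Assume that $\mathcal{B}\subset \mathbb{R}^q$ is compact and $L_i \colon \mathcal{B} \to \mathbb{R}$ is continuous for $i=1,\ldots,N$. Consider the following optimization problem:

$$\inf_{\mu \in \mathcal{M}(\mathcal{B})} \|\mu\|_{\mathrm{TV}} \quad \text{s.t.} \quad \int_{\mathcal{B}} L_i(\theta)\, d\mu(\theta) \in I_i, \quad i=1,\ldots,N, \label{pb:total_variation}$$

where $I_i$ is a compact interval or a singleton in $\mathbb{R}$ for $i=1,\ldots,N$. Assume that the feasible set of problem \eqref{pb:total_variation} is non-empty. Then, its solution set $S(\ref{pb:total_variation})$ is non-empty, co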

Figures (6)

  • Figure 1: Qualitative curve for optimal choice of $\epsilon$ with respect to $d_{KR}(X,X')$.
  • Figure 2: Qualitative curves of $\mathcal{L}$ and $\mathcal{U}$ in Proposition \ref{prop:lambda} and their minimizers.
  • Figure 3: Datasets with different standard deviations of noise.
  • Figure 4: Testing accuracy of solutions of problem \ref{intro_pb:NN_epsilon} (left) and problem \ref{intro_pb:NN_reg} (right) with different hyperparameters $\epsilon$ and $\lambda$ in three noise scenarios.
  • Figure 5: Numerical results of training shallow NNs on a subset of the MNIST dataset.
  • ...and 1 more figure

Theorems & Definitions (66)

  • Theorem 1.1: Fisher-Jerome 75
  • Example 2.1
  • Remark 2.2
  • Remark 2.3
  • Theorem 2.4
  • proof
  • Remark 2.5
  • Corollary 2.6
  • proof
  • Remark 2.7
  • ...and 56 more