Sparse Deep Learning Models with the $\ell_1$ Regularization

Lixin Shen, Rui Wang, Yuesheng Xu, Mingsong Yan

TL;DR

The paper addresses how to enforce and control sparsity in deep neural networks through $\ell_1$ regularization by deriving MAP-based models with appropriate priors. It develops single-parameter ($\lambda$) and multi-parameter ($\lambda_k$) formulations, links regularization strength to sparsity via subdifferential conditions and gradient magnitudes, and introduces iterative schemes that reach a target sparsity level within a tolerance. A proximal-gradient algorithm is provided to solve the nonconvex, nondifferentiable objectives, enabling practical sparsification. Numerical experiments on regression and MNIST classification demonstrate that the methods can produce sparsified networks with substantial reductions in parameter count while maintaining competitive accuracy, including layer-wise sparsity control in the multi-parameter setting.

Abstract

Sparse neural networks are highly desirable in deep learning because they reduce model complexity. The goal of this paper is to study how choices of regularization parameters influence the sparsity level of learned neural networks. We first derive $\ell_1$-norm sparsity-promoting deep learning models, including single-parameter and multi-parameter regularization models, from a statistical viewpoint. We then characterize the sparsity level of a regularized neural network in terms of the choice of the regularization parameters. Based on these characterizations, we develop iterative algorithms for selecting regularization parameters so that the weight parameters of the resulting deep neural network have prescribed sparsity levels. Numerical experiments are presented to demonstrate the effectiveness of the proposed algorithms in choosing desirable regularization parameters and obtaining corresponding neural networks with both predetermined sparsity levels and satisfactory approximation accuracy.
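
To make the link between the regularization parameter and sparsity concrete, here is a minimal sketch of a proximal-gradient iteration with $\ell_1$ soft-thresholding. It is not the paper's algorithm: as a simplifying assumption it uses a linear least-squares model in place of a deep network, and the names `soft_threshold`, `proximal_gradient_l1`, the synthetic data, and the step-size choice are all hypothetical.

```python
import numpy as np

def soft_threshold(w, tau):
    """Proximal operator of tau * ||w||_1: shrink each entry toward zero by tau."""
    return np.sign(w) * np.maximum(np.abs(w) - tau, 0.0)

def proximal_gradient_l1(X, y, lam, step, n_iter=500):
    """Minimize ||X w - y||^2 + lam * ||w||_1 (linear stand-in for a network)."""
    w = np.zeros(X.shape[1])
    for _ in range(n_iter):
        grad = 2.0 * (X.T @ (X @ w - y))      # gradient of the smooth fidelity term
        w = soft_threshold(w - step * grad, step * lam)  # prox step on the l1 term
    return w

# Synthetic data (assumption): only the first 3 of 10 weights are nonzero.
rng = np.random.default_rng(0)
X = rng.standard_normal((50, 10))
w_true = np.zeros(10)
w_true[:3] = [2.0, -1.5, 1.0]
y = X @ w_true

# Step size 1/L, where L = 2 * sigma_max(X)^2 bounds the gradient's Lipschitz constant.
step = 1.0 / (2.0 * np.linalg.norm(X, 2) ** 2)

w_small = proximal_gradient_l1(X, y, lam=0.1, step=step)
w_large = proximal_gradient_l1(X, y, lam=50.0, step=step)
print("nonzeros at lam=0.1:", np.count_nonzero(w_small))
print("nonzeros at lam=50 :", np.count_nonzero(w_large))
```

In this convex toy problem, any `lam` of at least `2 * max|X.T @ y|` drives every weight to exactly zero, which is the simplest instance of a threshold on the regularization parameter expressed through gradient magnitudes. The nonconvex network case studied in the paper is more delicate and is not reproduced here.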

Paper Structure

This paper contains 9 sections, 9 theorems, 92 equations, 7 tables, 3 algorithms.

Key Result

Proposition 1

Suppose that $\left\{\left(x^i,y^i\right)\in\mathbb{R}^p\times\mathbb{R}^q: i\in\mathbb{N}_N\right\}$ is a given dataset, and the labels $y^i$ are the observed values of the random variables $\tilde{y}^i$ defined by equation epsilon j tilde with $\epsilon_j^i$, $i\in\mathbb{N}_N$, $j\in\mathbb{N}_q$ with $\lambda:=2v^2/s$.
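
The statement above is cut off in this excerpt, so the following is only a sketch of how a parameter of the form $\lambda:=2v^2/s$ can arise from a MAP estimate. It assumes (these are assumptions, not taken from the text above) that the noise terms $\epsilon_j^i$ are independent Gaussians with mean zero and variance $v^2$, that the weights $W$ have independent Laplace priors with scale $s$, and it writes $\mathcal{N}_W$ as a hypothetical symbol for the network with weights $W$.

$$
\hat{W} \in \arg\max_{W}\ \prod_{i\in\mathbb{N}_N}\prod_{j\in\mathbb{N}_q} \exp\!\left(-\frac{\left(y_j^i-\left[\mathcal{N}_W(x^i)\right]_j\right)^2}{2v^2}\right)\cdot \exp\!\left(-\frac{\|W\|_1}{s}\right)
$$

Taking the negative logarithm and multiplying by $2v^2$ gives the equivalent minimization problem

$$
\min_{W}\ \sum_{i\in\mathbb{N}_N}\left\|\mathcal{N}_W(x^i)-y^i\right\|_2^2+\lambda\|W\|_1,\qquad \lambda:=\frac{2v^2}{s}.
$$

Under these assumptions, a larger noise variance $v^2$ or a more concentrated prior (smaller $s$) increases $\lambda$ and hence the weight on the $\ell_1$ penalty.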

Theorems & Definitions (17)

  • Proposition 1
  • proof
  • Proposition 2
  • proof
  • Proposition 3
  • proof
  • Theorem 4
  • proof
  • Theorem 5
  • proof
  • ...and 7 more