Explicit Group Sparse Projection with Applications to Deep Learning and NMF

Riyasat Ohib, Nicolas Gillis, Niccolò Dalmasso, Sameena Shah, Vamsi K. Potluru, Sergey Plis

TL;DR

This work addresses the challenge of enforcing a controllable, average sparsity across a group of vectors. It introduces grouped sparse projection (GSP), which uses a single sparsity parameter $s$ and a dual optimization framework to project a set of nonnegative, unit-norm vectors toward high Hoyer sparsity while preserving alignment with the inputs; a weighted variant, WGSP, extends this to weighted sparsity. The authors prove uniqueness properties for the dual and primal solutions, derive a Newton-based algorithm with linear-time complexity, and demonstrate strong empirical performance in sparse NMF and neural network pruning, including single-shot pruning that bypasses regularization-based sparsity induction. The results show that GSP/WGSP can achieve competitive or superior sparsity-accuracy trade-offs on CIFAR-10 and ImageNet while enabling efficient, scalable sparse representations. Overall, the approach provides a practical, theoretically grounded tool for structured sparsity in both supervised and unsupervised learning contexts.

Abstract

We design a new sparse projection method for a set of vectors that guarantees a desired average sparsity level, measured with the popular Hoyer measure (an affine function of the ratio of the $\ell_1$ and $\ell_2$ norms). Existing approaches either project each vector individually or require the use of a regularization parameter which implicitly maps to the average $\ell_0$-measure of sparsity. Instead, in our approach we set the sparsity level for the whole set explicitly and simultaneously project a group of vectors with the sparsity level of each vector tuned automatically. We show that the computational complexity of our projection operator is linear in the size of the problem. Additionally, we propose a generalization of this projection by replacing the $\ell_1$ norm by its weighted version. We showcase the efficacy of our approach in both supervised and unsupervised learning tasks on image datasets including CIFAR10 and ImageNet. In deep neural network pruning, the sparse models produced by our method on ResNet50 have significantly higher accuracies at corresponding sparsity values compared to existing competitors. In nonnegative matrix factorization, our approach yields competitive reconstruction errors against state-of-the-art algorithms.
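
As a rough illustration of the sparsity measure the abstract refers to, the sketch below computes the standard Hoyer sparsity $\left(\sqrt{n} - \lVert x\rVert_1 / \lVert x\rVert_2\right) / \left(\sqrt{n} - 1\right)$ of a vector and averages it over a group. The function names are hypothetical, and taking the plain mean as the group average is an assumption made for illustration; the paper's exact averaging may differ.

```python
import math


def hoyer_sparsity(x):
    """Hoyer sparsity of a nonzero vector with at least two entries.

    Returns 1.0 for a vector with a single nonzero entry and 0.0 for a
    vector whose entries all have the same magnitude.
    """
    n = len(x)
    l1 = sum(abs(v) for v in x)
    l2 = math.sqrt(sum(v * v for v in x))
    return (math.sqrt(n) - l1 / l2) / (math.sqrt(n) - 1.0)


def average_hoyer_sparsity(vectors):
    """Plain mean of the per-vector Hoyer sparsities (illustrative assumption)."""
    return sum(hoyer_sparsity(v) for v in vectors) / len(vectors)
```

A group-level target leaves room for individual vectors to differ: one vector may be very sparse and another dense, as long as the average meets the requested level.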

Paper Structure

This paper contains 42 sections, 4 theorems, 24 equations, 5 figures, 3 tables.

Key Result

Theorem 3.2

The function $g(\mu)$ is strictly decreasing for $0 < \mu < \tilde{\mu}$. Hence, it is not discontinuous around $g(\mu) =0$ and attains a unique root $\mu^*$.
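
To see why strict monotonicity matters, here is a minimal sketch of locating the unique root of a strictly decreasing function. The definition of $g$ is not given in this summary, so the single-vector form used here (the $\ell_1$ norm of a soft-thresholded, renormalized vector minus a target `k`) is purely an assumed stand-in. Bisection is used for simplicity in place of the Newton-based method the paper describes, and all names are hypothetical.

```python
import math


def g(c, mu, k):
    """Assumed stand-in for g: l1 norm of the unit-l2-norm vector
    max(c - mu, 0), minus the target l1 norm k."""
    shifted = [max(v - mu, 0.0) for v in c]
    l2 = math.sqrt(sum(v * v for v in shifted))
    return sum(shifted) / l2 - k


def find_root(c, k, iters=100):
    """Bisection on [0, second-largest entry of c].

    Assumes c is nonnegative with a unique maximum and that
    g(c, 0, k) > 0 > g(c, hi, k), so the root is bracketed.
    """
    lo, hi = 0.0, sorted(c)[-2]
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if g(c, mid, k) > 0.0:
            lo = mid  # g is decreasing, so the root lies to the right
        else:
            hi = mid
    return 0.5 * (lo + hi)


c = [0.1, 0.4, 0.8]
mu_star = find_root(c, k=1.2)
```

Because the function is strictly decreasing on the bracket, any sign change pins down exactly one root, which is also what makes a Newton iteration well behaved there.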

Figures (5)

  • Figure 1: Comparison of NMF and sparse NMF algorithms. On the left: Average relative error $100 \frac{\lVert\mathrm{Y}-\mathrm{X}\mathrm{H}\rVert_F - e_{\min}}{\lVert\mathrm{Y}\rVert_F}$ obtained with different NMF algorithms over 50 synthetic data sets. On the right: Average relative error in percent from which the lowest error $e_{\min}$ obtained among all algorithms and all initializations is subtracted (hence the error should go to zero for the best algorithm), that is, $100 \frac{\lVert\mathrm{Y}-\mathrm{X}\mathrm{H}\rVert_F - e_{\min}}{\lVert\mathrm{Y}\rVert_F}$, over 10 random initializations on the CBCL data set with $r=49$.
  • Figure 2: Accuracy change relative to the dense model at different levels of sparsity for multiple state-of-the-art network pruning techniques applied to ResNet50 on the ImageNet dataset. Values towards the top-right are better. Notably, GSP provides a superior sparsity vs. accuracy trade-off.
  • Figure 3: Basis elements obtained with sparse NMF (left) and weighted sparse NMF (right).
  • Figure 4: Average squared error obtained with sparse NMF (left) and WSNMF (right); the darker, the higher the error.
  • Figure 5: Basis elements obtained with NeNMF (top left), PSNMF with sparsity 0.85 (top right), $\ell_1$ A-HALS with sparsity 0.85 (bottom left), cPSNF with sparsity 0.85 (bottom right).

Theorems & Definitions (11)

  • Remark 3.1: Abuse of terminology
  • Theorem 3.2: Uniqueness of $\mu^*$
  • proof: Proof sketch
  • Corollary 3.3: Uniqueness of projection $x^*$
  • proof: Proof sketch
  • proof
  • Lemma C.1
  • proof
  • proof: Proof of Theorem 1
  • Corollary C.2
  • ...and 1 more