
A new Linear Time Bi-level $\ell_{1,\infty}$ projection ; Application to the sparsification of auto-encoders neural networks

Michel Barlaud, Guillaume Perez, Jean-Paul Marmorat

TL;DR

This paper tackles the computational bottleneck of projecting onto the $\ell_{1,\infty}$ ball, which typically costs $O(nm\log(nm))$. It introduces a bi-level projection $BP^{1,\infty}_\eta(Y)$ that decouples a global $\ell_1$-ball step from per-column clipping, achieving a linear-time complexity of $O(nm)$ and yielding a tight norm identity $\|Y - BP^{1,\infty}_\eta(Y)\|_{1,\infty} + \|BP^{1,\infty}_\eta(Y)\|_{1,\infty} = \|Y\|_{1,\infty}$. The framework is extended to bi-level $\ell_{1,1}$ and $\ell_{1,2}$ projections, with corresponding identities, and validated via extensive experiments showing speedups (≈2.5x) over the fastest existing method and improved sparsity and classification accuracy in supervised autoencoders on synthetic and real data (notably HIF2). The approach promises scalable structured sparsity for neural networks, with extensions to CNNs and attention mechanisms for broader impact. The $O(nm)$-time projection enables practical sparsification of large neural networks while preserving performance.

Abstract

The $\ell_{1,\infty}$ norm yields an efficient structured projection, but the complexity of the best algorithm is, unfortunately, $\mathcal{O}\big(n m \log(n m)\big)$ for an $n\times m$ matrix. In this paper, we propose a new bi-level projection method, for which we show that the time complexity for the $\ell_{1,\infty}$ norm is only $\mathcal{O}\big(n m \big)$ for an $n\times m$ matrix. Moreover, we provide a new $\ell_{1,\infty}$ identity with mathematical proof and experimental validation. Experiments show that our bi-level $\ell_{1,\infty}$ projection is $2.5$ times faster than the current fastest algorithm and provides the best sparsity while keeping the same accuracy in classification applications.
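The bi-level scheme summarized above (a global $\ell_1$-ball step followed by per-column clipping) can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes that $\|Y\|_{1,\infty}$ is the sum over columns of the largest absolute entry, and that the $\ell_1$ step is applied to the vector of column-wise $\ell_\infty$ norms. The function names `project_l1_ball`, `bilevel_l1inf`, and `l1inf_norm` are hypothetical. For brevity the $\ell_1$ step uses the classical sort-based method, which costs $O(m\log m)$ on the length-$m$ vector; a linear-time $\ell_1$ projection would be needed to match the overall $O(nm)$ bound.

```python
import numpy as np

def project_l1_ball(v, eta):
    """Project a nonnegative vector v onto the l1 ball of radius eta.

    Sort-based soft-thresholding (assumption: chosen for brevity,
    not the linear-time variant the complexity claim relies on).
    """
    if v.sum() <= eta:
        return v.copy()                      # already inside the ball
    s = np.sort(v)[::-1]                     # sort in decreasing order
    cssv = np.cumsum(s) - eta
    k = np.arange(1, len(v) + 1)
    rho = np.nonzero(s - cssv / k > 0)[0][-1]
    tau = cssv[rho] / (rho + 1)              # soft threshold
    return np.maximum(v - tau, 0.0)

def bilevel_l1inf(Y, eta):
    """Hypothetical sketch of a bi-level l_{1,inf} projection of Y (n x m)."""
    v = np.abs(Y).max(axis=0)                # column-wise l_inf norms
    u = project_l1_ball(v, eta)              # global l1 step on m values
    return np.clip(Y, -u, u)                 # clip column j to [-u_j, u_j]

def l1inf_norm(Y):
    """Sum over columns of the largest absolute entry (assumed definition)."""
    return np.abs(Y).max(axis=0).sum()

Y = np.array([[3.0, -1.0, 0.5],
              [-2.0, 4.0, 0.2]])
X = bilevel_l1inf(Y, eta=4.0)
print(X)                                     # third column is zeroed out
print(l1inf_norm(Y - X) + l1inf_norm(X), l1inf_norm(Y))
```

In this small example the column radii are $u = (1.5, 2.5, 0)$, so the third column is set to zero entirely, which is the structured (column-wise) sparsity the projection is meant to induce.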

Paper Structure (17 sections, 4 theorems, 30 equations, 9 figures, 4 tables, 3 algorithms)

Key Result

Proposition 3.3

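In the case of the $\ell_{1,\infty}$ norm, the bi-level projected data and the residual are linked by the following relation:

$\|Y - BP^{1,\infty}_\eta(Y)\|_{1,\infty} + \|BP^{1,\infty}_\eta(Y)\|_{1,\infty} = \|Y\|_{1,\infty}$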
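
A short sketch of why an identity of this kind holds may help. It rests on assumptions that are not spelled out in this summary: $\|Y\|_{1,\infty} = \sum_j \|y_j\|_\infty$ over the columns $y_j$ of $Y$, the bi-level projection first computes $u = P^{1}_{\eta}(v)$ from the column norms $v_j = \|y_j\|_\infty$, and then clips each column to $[-u_j, u_j]$, written here as $P^{\infty}_{u_j}$. The symbols $v$, $u$, $X$, $P^{1}_{\eta}$ and $P^{\infty}_{u_j}$ are notation introduced for this sketch. Since the $\ell_1$ projection of a nonnegative vector only shrinks its entries toward zero, $0 \le u_j \le v_j$, and the two norms add up column by column:

```latex
\begin{align*}
v_j &= \|y_j\|_\infty, \qquad u = P^{1}_{\eta}(v) \;\Rightarrow\; 0 \le u_j \le v_j, \\
x_j &= P^{\infty}_{u_j}(y_j) \;\Rightarrow\; \|x_j\|_\infty = u_j, \qquad \|y_j - x_j\|_\infty = v_j - u_j, \\
\|Y - X\|_{1,\infty} + \|X\|_{1,\infty}
  &= \sum_{j=1}^{m} (v_j - u_j) + \sum_{j=1}^{m} u_j
   = \sum_{j=1}^{m} v_j = \|Y\|_{1,\infty}.
\end{align*}
```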

Figures (9)

  • Figure 1: Processing time using C++ as a function of the number of features with $n=1000$ samples (top) and of the number of samples with $m=1000$ features (bottom): bi-level projection method versus the method of Chu et al.
  • Figure 2: Processing time using C++ as a function of the number of features (top) and samples (bottom).
  • Figure 3: Identity norm comparison. Top: bi-level $\ell_{1,\infty}$ versus classical; middle: bi-level $\ell_{1,1}$; bottom: bi-level $\ell_{1,2}$ projection.
  • Figure 4: Bi-level $\ell_{1,\infty}$ projection and usual $\ell_{1,\infty}$ projection with $\ell_{2,2}$ norm.
  • Figure 5: Sparsity with 64 informative features. Top: bi-level $\ell_{1,\infty}$; middle: bi-level $\ell_{1,1}$; bottom: bi-level $\ell_{1,2}$ projection.
  • ...and 4 more figures

Theorems & Definitions (9)

  • Remark 3.1
  • Remark 3.2
  • Proposition 3.3
  • Remark 3.4
  • Proposition 3.5
  • Remark 3.6
  • Proposition 4.1
  • Proposition 4.2
  • Remark 5.1