A new Linear Time Bi-level $\ell_{1,\infty}$ projection ; Application to the sparsification of auto-encoders neural networks

Michel Barlaud; Guillaume Perez; Jean-Paul Marmorat


TL;DR

This paper tackles the computational bottleneck of projecting onto the $\ell_{1,\infty}$ ball, which typically costs $O(nm\log(nm))$. It introduces a bi-level projection $BP^{1,\infty}_\eta(Y)$ that decouples a global $\ell_1$-ball step from per-column clipping, achieving a linear-time complexity of $O(nm)$ and yielding a tight norm identity $\|Y - BP^{1,\infty}_\eta(Y)\|_{1,\infty} + \|BP^{1,\infty}_\eta(Y)\|_{1,\infty} = \|Y\|_{1,\infty}$. The framework is extended to bilevel $\ell_{1,1}$ and $\ell_{1,2}$ projections, with corresponding identities, and validated via extensive experiments showing speedups (≈2.5x) over the fastest existing method and improved sparsity and classification accuracy in supervised autoencoders on synthetic and real data (notably HIF2). The approach promises scalable structured sparsity for neural networks, with extensions to CNNs and attention mechanisms for broader impact. $O(nm)$-time projection enables practical sparsification of large neural networks while preserving performance.
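The two-step structure described above (a global $\ell_1$-ball step followed by per-column clipping) can be sketched in NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes the $\ell_{1,\infty}$ norm is the sum over columns of each column's maximum absolute value, and it swaps in a simple sort-based $\ell_1$-ball projection, which costs $O(m\log m)$ on the $m$ column maxima instead of the linear-time routine that the overall $O(nm)$ bound would require. The names `project_l1_ball` and `bilevel_l1inf` are hypothetical.

```python
import numpy as np


def project_l1_ball(v, eta):
    """Project a nonnegative vector v onto the l1 ball of radius eta > 0.

    Sort-based soft-thresholding (assumption: chosen for brevity, it is
    O(m log m), not linear time).
    """
    if v.sum() <= eta:
        return v.copy()  # already inside the ball
    s = np.sort(v)[::-1]                    # sorted in decreasing order
    css = np.cumsum(s)
    k = np.arange(1, s.size + 1)
    rho = np.nonzero(s * k > css - eta)[0][-1]
    tau = (css[rho] - eta) / (rho + 1)      # soft threshold
    return np.maximum(v - tau, 0.0)


def bilevel_l1inf(Y, eta):
    """Hypothetical sketch of a bi-level l1,inf projection of an n x m matrix."""
    v = np.abs(Y).max(axis=0)               # level 1: one infinity norm per column
    u = project_l1_ball(v, eta)             # global l1-ball step on those norms
    return np.clip(Y, -u, u)                # level 2: clip column j to [-u_j, u_j]


Y = np.array([[3.0, -1.0, 0.5],
              [-2.0, 4.0, 0.2]])
X = bilevel_l1inf(Y, eta=2.0)
print(X)  # the third column is clipped to zero
```

In this small example the column maxima are (3, 4, 0.5); the $\ell_1$ step shrinks them to (0.5, 1.5, 0), so the weakest column is removed entirely, which is the structured sparsity effect the projection is meant to produce.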

Abstract

The $\ell_{1,\infty}$ norm yields an efficient structured projection, but the best existing algorithm unfortunately has complexity $\mathcal{O}\big(n m \log(n m)\big)$ for an $n\times m$ matrix. In this paper, we propose a new bi-level projection method and show that its time complexity for the $\ell_{1,\infty}$ norm is only $\mathcal{O}\big(n m \big)$ for an $n\times m$ matrix. Moreover, we provide a new $\ell_{1,\infty}$ identity with mathematical proof and experimental validation. Experiments show that our bi-level $\ell_{1,\infty}$ projection is $2.5$ times faster than the current fastest algorithm and provides the best sparsity while keeping the same accuracy in classification applications.


Paper Structure (17 sections, 4 theorems, 30 equations, 9 figures, 4 tables, 3 algorithms)


Introduction
State of the art of the $\ell_{1,\infty}$ ball projection
A new Bi-level $\ell_{1,\infty}$ structured projection
A new bi-level projection
The $\ell_{1,\infty}$ identity
Convergence and Computational complexity
Extension to other sparse structured projections
Bilevel $\ell_{1,1}$ projection
Bilevel $\ell_{1,2}$ projection.
Experimental results
Benchmark times using a PyTorch C++ extension on a MacBook laptop with an i9 processor; comparison with the best current projection method
Benchmark of Identity Proposition
Experimental results on classification and feature selection using a supervised autoencoder neural network
Supervised Autoencoder (SAE) framework
Experimental accuracy results on autoencoder neural networks
...and 2 more sections

Key Result

Proposition 3.3

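In the case of the $\ell_{1,\infty}$ norm, the bilevel projected data and the residual are linked by the following relation:

$$\|Y - BP^{1,\infty}_\eta(Y)\|_{1,\infty} + \|BP^{1,\infty}_\eta(Y)\|_{1,\infty} = \|Y\|_{1,\infty}$$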
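The identity $\|Y - BP^{1,\infty}_\eta(Y)\|_{1,\infty} + \|BP^{1,\infty}_\eta(Y)\|_{1,\infty} = \|Y\|_{1,\infty}$ can be checked numerically on random data. The sketch below is only an illustration under assumptions: the helper names `l1inf_norm` and `bilevel_l1inf` are hypothetical, the norm is taken as the sum of column-wise maximum absolute values, and the $\ell_1$-ball step uses a sort-based threshold rather than a linear-time routine.

```python
import numpy as np


def l1inf_norm(A):
    """Sum over columns of the largest absolute entry (assumed convention)."""
    return np.abs(A).max(axis=0).sum()


def bilevel_l1inf(Y, eta):
    """Hypothetical bi-level projection: l1 step on column maxima, then clipping."""
    v = np.abs(Y).max(axis=0)
    if v.sum() <= eta:
        return Y.copy()                     # already inside the ball
    s = np.sort(v)[::-1]
    css = np.cumsum(s)
    k = np.arange(1, s.size + 1)
    rho = np.nonzero(s * k > css - eta)[0][-1]
    tau = (css[rho] - eta) / (rho + 1)      # soft threshold of the l1 step
    u = np.maximum(v - tau, 0.0)
    return np.clip(Y, -u, u)                # clip column j to [-u_j, u_j]


rng = np.random.default_rng(0)
Y = rng.standard_normal((50, 20))
X = bilevel_l1inf(Y, eta=3.0)

lhs = l1inf_norm(Y - X) + l1inf_norm(X)     # residual norm + projected norm
rhs = l1inf_norm(Y)
gap = abs(lhs - rhs)
print(gap)  # expected to be at floating-point rounding level
```

Under these assumptions the identity holds column by column: the largest entry of column $j$ is clipped from $v_j$ to $u_j \le v_j$, so the projected column has infinity norm $u_j$ and the residual column has infinity norm $v_j - u_j$.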

Figures (9)

Figure 1: Processing time using C++ as a function of the number of features with $n=1000$ samples (top) and of the number of samples with $m=1000$ features (bottom): bi-level projection method versus the method of Chu et al.
Figure 2: Processing time using C++ as a function of the number of features (Top), and samples (bottom)
Figure 3: Identity norm comparison Top: the Bilevel $\ell_{1,\infty}$ versus classical, Middle: Bilevel $\ell_{1,1}$, bottom: Bilevel $\ell_{1,2}$ projection.
Figure 4: Bilevel $\ell_{1,\infty}$ projection and usual $\ell_{1,\infty}$ projection with $\ell_{2,2}$ norm.
Figure 5: 64 informative features Sparsity Top: the Bilevel $\ell_{1,\infty}$, Middle: Bilevel $\ell_{1,1}$, bottom: Bilevel $\ell_{1,2}$ projection
...and 4 more figures

Theorems & Definitions (9)

Remark 3.1
Remark 3.2
Proposition 3.3
Remark 3.4
Proposition 3.5
Remark 3.6
Proposition 4.1
Proposition 4.2
Remark 5.1
