A new Linear Time Bi-level $\ell_{1,\infty}$ projection ; Application to the sparsification of auto-encoders neural networks
Michel Barlaud, Guillaume Perez, Jean-Paul Marmorat
TL;DR
This paper tackles the computational bottleneck of projecting onto the $\ell_{1,\infty}$ ball, which typically costs $O(nm\log(nm))$. It introduces a bi-level projection $BP^{1,\infty}_\eta(Y)$ that decouples a global $\ell_1$-ball step from per-column clipping, achieving linear time complexity $O(nm)$ and satisfying the norm identity $\|Y - BP^{1,\infty}_\eta(Y)\|_{1,\infty} + \|BP^{1,\infty}_\eta(Y)\|_{1,\infty} = \|Y\|_{1,\infty}$. The framework is extended to bi-level $\ell_{1,1}$ and $\ell_{1,2}$ projections, with corresponding identities. Experiments show a speedup of about 2.5x over the fastest existing method, together with improved sparsity and classification accuracy in supervised autoencoders on synthetic and real data (notably HIF2). The approach promises scalable structured sparsity for neural networks, with extensions to CNNs and attention mechanisms for broader impact: an $O(nm)$-time projection makes sparsification of large neural networks practical while preserving performance.
Abstract
Projection onto the $\ell_{1,\infty}$ ball is an efficient structured projection, but the complexity of the best known algorithm is, unfortunately, $\mathcal{O}\big(n m \log(n m)\big)$ for an $n\times m$ matrix. In this paper, we propose a new bi-level projection method, for which we show that the time complexity for the $\ell_{1,\infty}$ norm is only $\mathcal{O}\big(n m \big)$ for an $n\times m$ matrix. Moreover, we provide a new $\ell_{1,\infty}$ identity with mathematical proof and experimental validation. Experiments show that our bi-level $\ell_{1,\infty}$ projection is $2.5$ times faster than the current fastest algorithm and provides the best sparsity while keeping the same accuracy in classification applications.
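To make the two-level structure concrete, the following is a minimal sketch of one way such a bi-level projection could be written. It is not the authors' implementation. It assumes that $\|Y\|_{1,\infty}$ is the sum over columns of the largest absolute entry in each column, and that the projection first projects the vector of column maxima onto the $\ell_1$ ball of radius $\eta$ and then clips each column to the resulting bound. The function names are hypothetical. For brevity the inner $\ell_1$ step uses a sort-based routine costing $O(m \log m)$, so a linear-time $\ell_1$ projection would have to be substituted to reach the stated $O(nm)$ overall.

```python
from typing import List

Matrix = List[List[float]]  # row-major: Y[i][j] is row i, column j


def project_l1_ball(v: List[float], eta: float) -> List[float]:
    """Project a non-negative vector v onto the l1 ball of radius eta > 0.

    Sort-based variant (O(m log m)), used here only for readability.
    """
    if eta <= 0:
        raise ValueError("eta must be positive")
    if sum(v) <= eta:
        return list(v)  # already inside the ball
    # Find the soft threshold tau with sum(max(v - tau, 0)) == eta.
    tau = 0.0
    cumsum = 0.0
    for k, value in enumerate(sorted(v, reverse=True), start=1):
        cumsum += value
        candidate = (cumsum - eta) / k
        if value > candidate:
            tau = candidate
        else:
            break
    return [max(x - tau, 0.0) for x in v]


def l1inf_norm(Y: Matrix) -> float:
    """Assumed convention: sum over columns of the column-wise max |entry|."""
    return sum(max(abs(row[j]) for row in Y) for j in range(len(Y[0])))


def bilevel_l1inf_projection(Y: Matrix, eta: float) -> Matrix:
    """Sketch of a bi-level l_{1,inf} projection (assumed form, see lead-in)."""
    n_cols = len(Y[0])
    # Aggregate each column into its infinity norm: one pass over Y.
    v = [max(abs(row[j]) for row in Y) for j in range(n_cols)]
    # Global step: project the vector of column norms onto the l1 ball.
    u = project_l1_ball(v, eta)
    # Per-column step: clip column j to the interval [-u[j], u[j]].
    return [[max(-u[j], min(u[j], row[j])) for j in range(n_cols)] for row in Y]


Y = [[3.0, -1.0, 0.5],
     [-2.0, 0.5, 0.25]]
X = bilevel_l1inf_projection(Y, eta=2.0)
residual = [[y - x for y, x in zip(ry, rx)] for ry, rx in zip(Y, X)]
# Numerical check of the identity stated in the abstract on this small example.
gap = l1inf_norm(residual) + l1inf_norm(X) - l1inf_norm(Y)
```

On this example the columns whose bound is projected to zero are removed entirely, which is the structured sparsity effect, and `gap` comes out as zero up to floating-point error under the assumed norm convention.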
