Multi-level projection with exponential parallel speedup; Application to sparse auto-encoders neural networks
Guillaume Perez, Michel Barlaud
TL;DR
The paper tackles the high computational cost of projecting onto the structured $\ell_{1,\infty}$ ball, which impedes scalable sparsity in neural networks. It introduces bi-level and multi-level projection frameworks that decompose the projection into independent, parallelizable steps, achieving $\mathcal{O}(nm)$ time for matrices and $\mathcal{O}(n+m)$ under full parallelism, with exponential speedups when extended to tensors. The approach extends to related norms ($\ell_{1,1}$, $\ell_{1,2}$) and provides tri- and multi-level tensor projections, along with a broad set of implementations and experimental validation on synthetic and biomedical data, including supervised autoencoders. Experimental results show the bi-level method is about 2 times faster than the current fastest algorithms while maintaining accuracy and improving sparsity, highlighting its practical impact for efficient structured sparsity in large-scale neural networks.
Abstract
Projection onto the $\ell_{1,\infty}$ ball is an efficient way to enforce structured sparsity, but the best known algorithm unfortunately has complexity $\mathcal{O}\big(n m \log(n m)\big)$ for a matrix in $\mathbb{R}^{n\times m}$. In this paper, we propose a new bi-level projection method and show that its time complexity for the $\ell_{1,\infty}$ norm is only $\mathcal{O}\big(n m \big)$ for a matrix in $\mathbb{R}^{n\times m}$, and $\mathcal{O}\big(n + m \big)$ with full parallel power. We generalize our method to tensors and propose a new multi-level projection whose induced decomposition yields a linear parallel speedup, up to an exponential speedup factor. The resulting time complexity is lower-bounded by the sum of the dimensions instead of the product of the dimensions. We provide a broad set of implementations of our framework for bi-level and tri-level projections (matrices and tensors) for various norms, including parallel implementations. Experiments show that our projection is $2$ times faster than the current fastest Euclidean algorithms while providing the same accuracy and better sparsity in neural network applications.
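To make the bi-level idea concrete, the following is a minimal sketch in plain Python, not the authors' implementation. It assumes that the $\ell_{1,\infty}$ norm of a matrix is the sum over columns of the largest absolute entry in each column. It also assumes that the bi-level scheme works in two steps: the vector of column-wise $\ell_\infty$ norms is projected onto the $\ell_1$ ball of radius $\eta$, and then each column is clipped to its projected bound. The function names `project_l1_ball` and `bilevel_l1inf` are hypothetical. For brevity, the inner $\ell_1$ projection is the classic sort-based one, which costs $\mathcal{O}(m \log m)$ on the $m$ column norms; a linear-time $\ell_1$ projection would be needed to reach the stated $\mathcal{O}(nm)$ bound.

```python
def project_l1_ball(v, radius):
    """Euclidean projection of a non-negative vector onto the l1 ball.

    Sort-based variant (O(m log m)); assumes radius > 0.
    """
    if sum(v) <= radius:
        return list(v)  # already inside the ball
    u = sorted(v, reverse=True)
    cumsum = 0.0
    tau = 0.0
    for k, value in enumerate(u, start=1):
        cumsum += value
        candidate = (cumsum - radius) / k
        if value > candidate:
            tau = candidate  # threshold for the k largest entries
        else:
            break
    # Soft-threshold every entry by tau.
    return [max(x - tau, 0.0) for x in v]


def bilevel_l1inf(Y, eta):
    """Hypothetical bi-level l1,inf projection of a matrix Y (list of rows).

    Step 1: aggregate each column by its l_inf norm (independent per column).
    Step 2: project the aggregated vector onto the l1 ball of radius eta.
    Step 3: clip each column to [-u_j, u_j] (independent per column).
    """
    n, m = len(Y), len(Y[0])
    col_norms = [max(abs(Y[i][j]) for i in range(n)) for j in range(m)]
    u = project_l1_ball(col_norms, eta)
    return [[max(-u[j], min(u[j], Y[i][j])) for j in range(m)]
            for i in range(n)]


Y = [[3.0, -1.0, 0.5],
     [-2.0, 0.5, 0.2]]
X = bilevel_l1inf(Y, eta=2.0)
```

In this small example the column norms are $(3, 1, 0.5)$, which exceed the radius $\eta = 2$, so the $\ell_1$ step shrinks them to $(2, 0, 0)$ and the last two columns of `X` are zeroed entirely. Steps 1 and 3 touch each column independently, which is where the parallel speedup described above would come from.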
