Proximity to Losslessly Compressible Parameters
Matthew Farrugia-Roberts
TL;DR
The paper investigates lossless compressibility of neural network parameters in a simplified single-hidden-layer $\tanh$ setting. It defines the rank of a parameter as the minimum number of hidden units needed to implement the same function, and the proximate rank as the rank of the most compressible parameter within a small $L^{\infty}$ neighbourhood. It gives an efficient formal algorithm (Compress) for optimal lossless compression and rank computation, and a greedy algorithm (Bound) that upper-bounds the proximate rank. It also proves that tightly bounding the proximate rank is $\mathcal{NP}$-complete, with hardness shown via a reduction from the UPC problem and related $\mathcal{NP}$-complete problems. These results connect lossless compressibility to well-studied computational problems and lay groundwork for future theoretical and empirical work on losslessly compressible parameters and their neighbours, including in more complex architectures.
Abstract
To better understand complexity in neural networks, we theoretically investigate the idealised phenomenon of lossless network compressibility, whereby an identical function can be implemented with fewer hidden units. In the setting of single-hidden-layer hyperbolic tangent networks, we define the rank of a parameter as the minimum number of hidden units required to implement the same function. We give efficient formal algorithms for optimal lossless compression and computing the rank of a parameter. Losslessly compressible parameters are atypical, but their existence has implications for nearby parameters. We define the proximate rank of a parameter as the rank of the most compressible parameter within a small L-infinity neighbourhood. We give an efficient greedy algorithm for bounding the proximate rank of a parameter, and show that the problem of tightly bounding the proximate rank is NP-complete. These results lay a foundation for future theoretical and empirical work on losslessly compressible parameters and their neighbours.
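To make the notion of rank concrete, the following is a minimal Python sketch of lossless compression for a single-hidden-layer tanh network. It is not the paper's Compress algorithm. It assumes the parameterisation $f(x) = d + \sum_i a_i \tanh(b_i \cdot x + c_i)$ and relies on the classical reducibility conditions for such networks: a unit with zero outgoing weight can be dropped, a unit with zero incoming weights is a constant that folds into the output bias, and units whose incoming weights and biases agree up to sign can be merged. The names `compress`, `rank`, and `evaluate` are illustrative, and comparisons use exact floating-point equality because lossless compression requires exact coincidence.

```python
import math


def compress(units, d):
    """Losslessly compress f(x) = d + sum_i a_i * tanh(b_i . x + c_i).

    units: list of (a, b, c) with outgoing weight a, incoming weights b
    (a tuple of floats), and hidden bias c. d is the output bias.
    Returns (compressed_units, new_d).
    """
    merged = {}  # maps (b, c) -> summed outgoing weight
    for a, b, c in units:
        b = tuple(b)
        if all(w == 0 for w in b):
            # Constant unit: a * tanh(c) does not depend on x,
            # so fold it into the output bias.
            d += a * math.tanh(c)
            continue
        # tanh is odd: a * tanh(z) == -a * tanh(-z). Normalise the sign so
        # the first nonzero incoming weight is positive.
        lead = next(w for w in b if w != 0)
        if lead < 0:
            a, b, c = -a, tuple(-w for w in b), -c
        # Units with identical (b, c) compute the same hidden activation,
        # so their outgoing weights add.
        merged[(b, c)] = merged.get((b, c), 0.0) + a
    # Units whose total outgoing weight is zero contribute nothing.
    compressed = [(a, b, c) for (b, c), a in merged.items() if a != 0]
    return compressed, d


def rank(units, d=0.0):
    """Number of hidden units left after lossless compression."""
    return len(compress(units, d)[0])


def evaluate(units, d, x):
    """Evaluate the network at input x (a tuple of floats)."""
    return d + sum(
        a * math.tanh(sum(w * xi for w, xi in zip(b, x)) + c)
        for a, b, c in units
    )


# Hypothetical five-unit parameter that compresses to two units.
units = [
    (2.0, (1.0, -1.0), 0.5),
    (3.0, (-1.0, 1.0), -0.5),  # same unit as above up to sign
    (1.5, (0.0, 0.0), 0.3),    # constant unit
    (0.0, (2.0, 1.0), 0.0),    # zero outgoing weight
    (1.0, (0.5, 2.0), -1.0),
]
small, d_small = compress(units, 0.1)
print(len(small))  # 2
```

Because the sketch demands exact equality, a parameter whose units only nearly coincide keeps its full rank here. That gap between exact and approximate coincidence is what the proximate rank is designed to capture.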
