Proximity to Losslessly Compressible Parameters

Matthew Farrugia-Roberts

TL;DR

The paper investigates lossless compressibility of neural network parameters in a simplified single-hidden-layer $\tanh$ setting. It defines the rank of a parameter as the minimum number of hidden units needed to implement the same function, and the proximate rank as the rank of the most compressible parameter within a small $L^{\infty}$ neighbourhood. It provides a formal, efficient algorithm (Compress) for optimal lossless compression and rank computation, and constructs a greedy method (Bound) to upper-bound the proximate rank within a uniform neighbourhood. It also proves that tightly bounding the proximate rank is $\mathcal{NP}$-hard, via a reduction from the UPC problem and related $\mathcal{NP}$-complete problems. The results connect lossless compressibility to well-studied computational problems and lay groundwork for future theoretical and empirical work on near-losslessly-compressible parameters in more complex architectures.

Abstract

To better understand complexity in neural networks, we theoretically investigate the idealised phenomenon of lossless network compressibility, whereby an identical function can be implemented with fewer hidden units. In the setting of single-hidden-layer hyperbolic tangent networks, we define the rank of a parameter as the minimum number of hidden units required to implement the same function. We give efficient formal algorithms for optimal lossless compression and computing the rank of a parameter. Losslessly compressible parameters are atypical, but their existence has implications for nearby parameters. We define the proximate rank of a parameter as the rank of the most compressible parameter within a small L-infinity neighbourhood. We give an efficient greedy algorithm for bounding the proximate rank of a parameter, and show that the problem of tightly bounding the proximate rank is NP-complete. These results lay a foundation for future theoretical and empirical work on losslessly compressible parameters and their neighbours.
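The two definitions in the abstract can be written compactly as follows. This is a sketch rather than the paper's own statement: $\mathcal{W}_h$ (parameters with $h$ hidden units) and $f_w$ (the function implemented by $w$) follow the notation of the Key Result on this page, while the operator names $\operatorname{rank}$ and $\operatorname{prank}_{\varepsilon}$, the radius $\varepsilon$, and the choice of a closed neighbourhood are assumptions made here for illustration.

```latex
\operatorname{rank}(w)
  = \min\bigl\{\, r \in \mathbb{N} : \exists\, w' \in \mathcal{W}_r \text{ with } f_{w'} = f_w \,\bigr\},
\qquad
\operatorname{prank}_{\varepsilon}(w)
  = \min\bigl\{\, \operatorname{rank}(u) : u \in \mathcal{W}_h,\ \lVert u - w \rVert_{\infty} \le \varepsilon \,\bigr\}.
```

Since $w$ lies in its own neighbourhood, $\operatorname{prank}_{\varepsilon}(w) \le \operatorname{rank}(w) \le h$ under these definitions.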

Paper Structure (53 sections, 23 theorems, 14 equations, 10 figures, 4 tables, 5 algorithms)

Introduction
Related work
Approximate compression.
Lossless compression.
Functional equivalence.
Lossless expansion.
Information singularities.
Preliminaries
Architecture.
Reducibility.
Uniform neighbourhood.
Computational complexity theory.
Optimal lossless compression and rank
Proximity to low-rank parameters
Computational complexity of proximate rank
...and 38 more sections

Key Result

theorem 1

Given $w \in \mathcal{W}_h$, compute $w' = \textsc{Compress}(w) \in \mathcal{W}_r$. Then (i) $f_{w'} = f_w$, and (ii) $w'$ is incompressible.
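To make the statement concrete, here is a minimal Python sketch of lossless compression for a single-hidden-layer $\tanh$ network. It is not the paper's Compress pseudocode. It assumes the parameterisation $f_w(x) = d + \sum_i a_i \tanh(b_i \cdot x + c_i)$ and the classical reducibility conditions for such networks (zero outgoing weight, zero incoming weights, or incoming weights and bias equal up to sign to those of another unit). The names `forward` and `compress` and the tuple layout `(a, B, c, d)` are hypothetical, and equality is tested exactly on floats.

```python
import numpy as np

def forward(params, x):
    """Evaluate f_w(x) = d + sum_i a_i * tanh(b_i . x + c_i) for each row of x."""
    a, B, c, d = params
    return d + np.tanh(x @ B.T + c) @ a

def compress(params):
    """Remove redundant hidden units without changing the implemented function."""
    a, B, c, d = params
    groups = {}  # canonical (incoming weights, bias) -> summed outgoing weight
    for a_i, b_i, c_i in zip(a, B, c):
        if a_i == 0:
            continue  # unit is disconnected from the output
        if not b_i.any():
            d = d + a_i * np.tanh(c_i)  # constant unit: fold into output bias
            continue
        # tanh is odd, so a * tanh(-z) = -a * tanh(z). Flip signs so that the
        # first nonzero incoming weight is positive, then merge equal units.
        sign = np.sign(b_i[np.flatnonzero(b_i)[0]])
        key = (tuple(sign * b_i), float(sign * c_i))
        groups[key] = groups.get(key, 0.0) + sign * a_i
    # Merged units whose outgoing weights cancel exactly are dropped as well.
    kept = [(k, v) for k, v in groups.items() if v != 0]
    n = B.shape[1]
    a_new = np.array([v for _, v in kept], dtype=float)
    B_new = np.array([k[0] for k, _ in kept], dtype=float).reshape(len(kept), n)
    c_new = np.array([k[1] for k, _ in kept], dtype=float)
    return a_new, B_new, c_new, d

# Illustrative parameter with seven hidden units and two inputs.
a = np.array([1.0, 2.0, 0.0, 3.0, -1.0, 1.0, 0.7])
B = np.array([[1.0, 2.0], [-1.0, -2.0], [5.0, 5.0], [0.0, 0.0],
              [0.5, 0.5], [0.5, 0.5], [2.0, -1.0]])
c = np.array([0.5, -0.5, 1.0, 0.3, 0.1, 0.1, 0.0])
w = (a, B, c, 0.25)

w_small = compress(w)
X = np.random.default_rng(0).normal(size=(5, 2))
print(len(w_small[0]), np.allclose(forward(w, X), forward(w_small, X)))
```

In this sketch the number of hidden units left after `compress` plays the role of the rank. Exact floating-point comparison is a deliberate simplification: parameters that are only approximately reducible are left untouched, which is the situation the proximate rank is meant to capture.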

Figures (10)

Figure 1: Example of reduction from Boolean satisfiability to Problem UPC. (a) Grid layout of an incidence graph for a satisfiable Boolean formula (circles: variables, squares: clauses, edges: $\pm$ literals). (b) A corresponding set of $h=68$ source points. (c) A $(34,\varepsilon)$-cover of the source points.
Figure 2: Illustrative example of the parameter construction. (a) A set of source points $\{x_{1},\ldots,x_{7}\}$. (b) Transformation $T$ translates all points into the positive quadrant by a margin of $2\varepsilon$. (c,d) The coordinates of the transformed points become the incoming weights and biases of the parameter.
Figure 3: Example instances for Problem UPC, Problem UPP, and Problem USGCP. (a) Nine (source) points $\{x_{1},\ldots,x_{9}\}$. (b) A $(4, \varepsilon)$-cover $\{y_{1},\ldots,y_{4}\}$. (c) A $(4, 2\varepsilon)$-partition $\{\Pi_{1},\ldots,\Pi_{4}\}$ ($= \left\{1,3\right\}, \left\{2,4\right\}, \left\{5,8\right\}, \left\{6,7,9\right\}$). (d) The nine points $\{x_{1},\ldots,x_{9}\}$, along with $2\varepsilon$-width squares. (e) The corresponding unit square graph with vertices $\{v_{1},\ldots,v_{9}\}$. (f) A partition of the unit square graph into four cliques $\{\Pi_{1},\ldots,\Pi_{4}\}$. Note: $\varepsilon$ represents a radius in Problem UPC, but a diameter in Problems UPP and USGCP. In these examples, we use $\varepsilon$ for the radius and $2\varepsilon$ for the diameter.
Figure 4: Three example restricted Boolean formulas in conjunctive normal form are as follows: $\phi_1 = (v_1 \lor v_2) \land (\bar{v}_1 \lor v_2) \land (v_3 \lor v_4) \land (\bar{v}_3 \lor v_4) \land (\bar{v}_2 \lor \bar{v}_4)$; $\phi_2 = (\bar{v}_1 \lor v_2) \land (v_1 \lor v_2 \lor \bar{v}_3) \land (\bar{v}_2 \lor v_3)$; and $\phi_3 = (\bar{v}_1 \lor \bar{v}_3 \lor v_4) \land (v_1 \lor \bar{v}_2 \lor v_5) \land (v_3 \lor \bar{v}_4 \lor v_6) \land (v_2 \lor \bar{v}_5) \land (v_5 \lor v_6) \land (v_4 \lor \bar{v}_6)$. The bipartite variable--clause incidence graphs of the three formulas are depicted above (circles indicate variable vertices, squares indicate clause vertices, and edges marking positive and negative occurrences are labelled accordingly). Formula $\phi_1$ is unsatisfiable, whereas formulas $\phi_2$ and $\phi_3$ are each satisfiable.
Figure 5: Example of reduction from restricted Boolean satisfiability to Problem UPP. (a) A satisfiable restricted Boolean formula. (b) The formula's planar bipartite variable--clause incidence graph (circles: variables, squares: clauses, edges: $\pm$ literals). (c) The graph embedded onto an integer grid. (d) The embedding divided into unit tiles of various types. (e) The $h = 68$ source points aggregated from each of the tiles. (f) Existence of a $(34,1/4)$-partition of the source points.
...and 5 more figures

Theorems & Definitions (28)

theorem 1: correctness of the lossless compression algorithm (Compress)
theorem 2: correctness of the rank algorithm
remark 3
theorem 4: correctness of the greedy proximate rank bound algorithm (Bound)
remark 5
theorem 6
theorem 7
theorem 7: correctness of the lossless compression algorithm (Compress)
theorem 7: correctness of the greedy proximate rank bound algorithm (Bound)
theorem 8
...and 18 more
