Proximity to Losslessly Compressible Parameters

Matthew Farrugia-Roberts

TL;DR

The paper investigates lossless compressibility of neural network parameters in a simplified single-hidden-layer $\tanh$ setting. It defines the rank of a parameter as the minimum number of hidden units needed to implement the same function, and the proximate rank as the lowest rank attained by any parameter within a small $L^{\infty}$ neighbourhood. It provides a formal, efficient algorithm (Compress) for optimal lossless compression and rank computation, and constructs a greedy method (Bound) that upper-bounds the proximate rank over a uniform neighbourhood. It also proves that tightly bounding the proximate rank is $\mathcal{NP}$-hard, via a reduction from the UPC problem and related $\mathcal{NP}$-complete problems. The results connect lossless compressibility to well-studied computational problems and lay groundwork for future theoretical and empirical study of nearly losslessly compressible parameters in more complex architectures. The work offers a principled lens on approximate compressibility in deep learning and motivates tractable approximations and empirical investigations into proximity to losslessly compressible parameters.

Abstract

To better understand complexity in neural networks, we theoretically investigate the idealised phenomenon of lossless network compressibility, whereby an identical function can be implemented with fewer hidden units. In the setting of single-hidden-layer hyperbolic tangent networks, we define the rank of a parameter as the minimum number of hidden units required to implement the same function. We give efficient formal algorithms for optimal lossless compression and computing the rank of a parameter. Losslessly compressible parameters are atypical, but their existence has implications for nearby parameters. We define the proximate rank of a parameter as the rank of the most compressible parameter within a small L-infinity neighbourhood. We give an efficient greedy algorithm for bounding the proximate rank of a parameter, and show that the problem of tightly bounding the proximate rank is NP-complete. These results lay a foundation for future theoretical and empirical work on losslessly compressible parameters and their neighbours.
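
The idea of bounding the proximate rank greedily can be illustrated with a minimal sketch. It is not the paper's Bound algorithm: it assumes a network of the form $f(x) = d + \sum_i a_i \tanh(b_i \cdot x + c_i)$, treats each hidden unit as a point $(b_i, c_i)$, and greedily covers these points with L-infinity balls of radius `eps`. Moving every unit onto its ball's centre changes no coordinate by more than `eps`, and units that share a centre can then be merged, so the number of centres is an upper bound on the proximate rank. The function name `greedy_cover_bound` and the array layout are assumptions made for this illustration.

```python
import numpy as np

def greedy_cover_bound(B, c, eps):
    """Upper-bound the proximate rank with a greedy L-infinity cover.

    B: (h, n) incoming weights, c: (h,) hidden biases (hypothetical layout).
    Simplification: near-zero weights and the sign symmetry of tanh are
    ignored, so the bound is valid but can be looser than necessary.
    """
    B = np.asarray(B, dtype=float)
    c = np.asarray(c, dtype=float)
    points = np.column_stack([B, c])  # one point (b_i, c_i) per hidden unit
    uncovered = list(range(len(points)))
    centres = []
    while uncovered:
        # Use the first uncovered unit as the centre of a new ball.
        centre = points[uncovered[0]]
        centres.append(centre)
        # Keep only the units farther than eps from this centre.
        uncovered = [i for i in uncovered
                     if np.max(np.abs(points[i] - centre)) > eps]
    return len(centres)
```

With `eps = 0` the sketch simply counts the distinct $(b_i, c_i)$ pairs. Finding the smallest possible cover is the hard part, which is where the connection to covering problems in the hardness result comes from.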

Paper Structure (53 sections, 23 theorems, 14 equations, 10 figures, 4 tables, 5 algorithms)

Key Result

theorem 1

Given $w \in \mathcal{W}_h$, compute $w' = \textsc{Compress}(w) \in \mathcal{W}_r$. Then (i) $f_{w'} = f_w$, and (ii) $w'$ is incompressible.
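
A minimal sketch of what such a compression step can look like, assuming a network $f(x) = d + \sum_i a_i \tanh(b_i \cdot x + c_i)$ and the standard reducibility conditions for $\tanh$ networks (a zero outgoing weight, a zero incoming weight vector, or two units whose $(b_i, c_i)$ agree up to sign). This is an illustration and not the paper's Compress pseudocode; the names `compress` and `forward` and the parameter layout `(a, B, c, d)` are assumptions. It compares floats exactly, which mirrors exact arithmetic only for exactly representable values.

```python
import numpy as np

def forward(a, B, c, d, X):
    """Evaluate f(x) = d + sum_i a[i] * tanh(B[i] @ x + c[i]) on rows of X."""
    return d + np.tanh(X @ B.T + c) @ a

def compress(a, B, c, d):
    """Return an equivalent parameter with redundant hidden units removed."""
    B = np.asarray(B, dtype=float)
    merged = {}  # sign-normalised (b_i, c_i) -> combined outgoing weight
    for a_i, b_i, c_i in zip(a, B, c):
        if a_i == 0:
            continue  # the unit contributes nothing to the output
        if not np.any(b_i):
            d = d + a_i * np.tanh(c_i)  # constant unit: fold into output bias
            continue
        # tanh is odd, so negating (a_i, b_i, c_i) leaves the unit unchanged.
        # Normalise so that the first nonzero incoming weight is positive.
        sign = float(np.sign(b_i[np.flatnonzero(b_i)[0]]))
        key = tuple(float(v) for v in sign * b_i) + (float(sign * c_i),)
        merged[key] = merged.get(key, 0.0) + sign * a_i
    # Drop merged units whose outgoing weights cancelled exactly.
    kept = [(k, s) for k, s in merged.items() if s != 0]
    a_new = np.array([s for _, s in kept], dtype=float)
    B_new = np.array([k[:-1] for k, _ in kept], dtype=float)
    B_new = B_new.reshape(len(kept), B.shape[1])
    c_new = np.array([k[-1] for k, _ in kept], dtype=float)
    return a_new, B_new, c_new, d
```

Under these assumptions, the number of units returned by `compress` plays the role of the rank $r$, and property (i) can be spot-checked numerically by comparing `forward` on the original and compressed parameters at a few inputs.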

Figures (10)

  • Figure 1: Example of reduction from Boolean satisfiability to the UPC problem. (a) Grid layout of an incidence graph for a satisfiable Boolean formula (circles: variables, squares: clauses, edges: $\pm$ literals). (b) A corresponding set of $h=68$ source points. (c) A $(34,\varepsilon)$-cover of the source points.
  • Figure 2: Illustrative example of the parameter construction. (a) A set of source points $\{x_{1},\ldots,x_{7}\}$. (b) Transformation $T$ translates all points into the positive quadrant by a margin of $2\varepsilon$. (c,d) The coordinates of the transformed points become the incoming weights and biases of the parameter.
  • Figure 3: Example instances for the UPC, UPP, and USGCP problems. (a) Nine (source) points $\{x_{1},\ldots,x_{9}\}$. (b) A $(4, \varepsilon)$-cover $\{y_{1},\ldots,y_{4}\}$. (c) A $(4, 2\varepsilon)$-partition $\{\Pi_{1},\ldots,\Pi_{4}\}$ ($= \left\{1,3\right\}, \left\{2,4\right\}, \left\{5,8\right\}, \left\{6,7,9\right\}$). (d) The nine points $\{x_{1},\ldots,x_{9}\}$, along with $2\varepsilon$-width squares. (e) The corresponding unit square graph with vertices $\{v_{1},\ldots,v_{9}\}$. (f) A partition of the unit square graph into four cliques $\{\Pi_{1},\ldots,\Pi_{4}\}$. Note: $\varepsilon$ represents a radius in the UPC problem, but a diameter in the UPP and USGCP problems. In these examples, we use $\varepsilon$ for the radius and $2\varepsilon$ for the diameter.
  • Figure 4: Three example restricted Boolean formulas in conjunctive normal form: $\phi_1 = (v_1 \lor v_2) \land (\bar{v}_1 \lor v_2) \land (v_3 \lor v_4) \land (\bar{v}_3 \lor v_4) \land (\bar{v}_2 \lor \bar{v}_4)$; $\phi_2 = (\bar{v}_1 \lor v_2) \land (v_1 \lor v_2 \lor \bar{v}_3) \land (\bar{v}_2 \lor v_3)$; and $\phi_3 = (\bar{v}_1 \lor \bar{v}_3 \lor v_4) \land (v_1 \lor \bar{v}_2 \lor v_5) \land (v_3 \lor \bar{v}_4 \lor v_6) \land (v_2 \lor \bar{v}_5) \land (v_5 \lor v_6) \land (v_4 \lor \bar{v}_6)$. The bipartite variable--clause incidence graphs of the three formulas are depicted above (circles indicate variable vertices, squares indicate clause vertices, and edges mark positive and negative occurrences accordingly). Formula $\phi_1$ is unsatisfiable, whereas formulas $\phi_2$ and $\phi_3$ are each satisfiable.
  • Figure 5: Example of reduction from restricted Boolean satisfiability to the UPP problem. (a) A satisfiable restricted Boolean formula. (b) The formula's planar bipartite variable--clause incidence graph (circles: variables, squares: clauses, edges: $\pm$ literals). (c) The graph embedded onto an integer grid. (d) The embedding divided into unit tiles of various types. (e) The $h = 68$ source points aggregated from each of the tiles. (f) Existence of a $(34,1/4)$-partition of the source points.
  • ...and 5 more figures

Theorems & Definitions (28)

  • theorem 1: correctness of the lossless compression algorithm (Compress)
  • theorem 2: correctness of the rank algorithm
  • remark 3
  • theorem 4: correctness of the greedy proximate rank bound algorithm (Bound)
  • remark 5
  • theorem 6
  • theorem 7
  • theorem 7: correctness of the lossless compression algorithm (Compress)
  • theorem 7: correctness of the greedy proximate rank bound algorithm (Bound)
  • theorem 8
  • ...and 18 more