Algorithmic Simplification of Neural Networks with Mosaic-of-Motifs

Pedram Bakhtiarifard, Tong Chen, Jonathan Wenshøj, Erik B Dam, Raghavendra Selvan

TL;DR

Empirical evidence shows that the algorithmic complexity of neural networks, measured using approximations to Kolmogorov complexity, can be reduced during training, resulting in models that perform comparably with unconstrained models while being algorithmically simpler.

Abstract

Large-scale deep learning models are well-suited for compression. Methods like pruning, quantization, and knowledge distillation have been used to achieve massive reductions in the number of model parameters, with marginal performance drops across a variety of architectures and tasks. This raises the central question: why are deep neural networks suited for compression? In this work, we take the perspective of algorithmic complexity to explain this behavior. We hypothesize that the parameters of trained models have more structure and hence exhibit lower algorithmic complexity than the weights at (random) initialization, and that model compression methods harness this reduced algorithmic complexity to compress models. Although an unconstrained parameterization of model weights, $\mathbf{w} \in \mathbb{R}^n$, can represent arbitrary weight assignments, the solutions found during training exhibit repeatability and structure, making them algorithmically simpler than a generic program. To this end, we formalize the Kolmogorov complexity of $\mathbf{w}$ as $\mathcal{K}(\mathbf{w})$. We introduce a constrained parameterization $\widehat{\mathbf{w}}$ that partitions parameters into blocks of size $s$ and restricts each block to be selected from a set of $k$ reusable motifs, specified by a reuse pattern (or mosaic). The resulting method, Mosaic-of-Motifs (MoMos), yields an algorithmically simpler model parameterization than unconstrained models. Empirical evidence from multiple experiments shows that the algorithmic complexity of neural networks, measured using approximations to Kolmogorov complexity, can be reduced during training. This results in models that perform comparably with unconstrained models while being algorithmically simpler.
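
The constrained parameterization described in the abstract (blocks of size $s$, each drawn from $k$ shared motifs according to a mosaic) can be illustrated with a small NumPy sketch. This is not the authors' training procedure: the function name `momos_project`, the k-means-style fitting loop, and the mean-squared distortion measure are all assumptions chosen only to make the block/motif/mosaic structure concrete.

```python
import numpy as np

def momos_project(w, s, k, n_iter=10, seed=0):
    """Approximate flat weights w by k motifs of length s (hypothetical helper).

    Returns the reconstruction w_hat, the motif table, and the mosaic
    (one motif index per block). Fitting uses plain k-means, an assumption.
    """
    rng = np.random.default_rng(seed)
    pad = (-len(w)) % s                        # zero-pad so len(w) divides into blocks
    blocks = np.pad(w, (0, pad)).reshape(-1, s)
    # Initialise motifs from k distinct blocks (requires k <= number of blocks).
    motifs = blocks[rng.choice(len(blocks), size=k, replace=False)].copy()
    for _ in range(n_iter):
        # Squared distance from every block to every motif.
        d = ((blocks[:, None, :] - motifs[None, :, :]) ** 2).sum(axis=-1)
        mosaic = d.argmin(axis=1)
        for j in range(k):
            members = blocks[mosaic == j]
            if len(members):                   # leave unused motifs unchanged
                motifs[j] = members.mean(axis=0)
    # Final assignment against the updated motifs.
    d = ((blocks[:, None, :] - motifs[None, :, :]) ** 2).sum(axis=-1)
    mosaic = d.argmin(axis=1)
    w_hat = motifs[mosaic].reshape(-1)[: len(w)]
    return w_hat, motifs, mosaic

# Example: a 16 x 16 weight matrix flattened to 256 values, s = 4, k = 8.
w = np.random.default_rng(1).standard_normal(256)
w_hat, motifs, mosaic = momos_project(w, s=4, k=8)
distortion = float(np.mean((w - w_hat) ** 2))  # reconstruction error of the mosaic
```

Under these assumptions, `w_hat` is fully described by the `k * s` motif values plus one index per block, rather than by `len(w)` free values, which is the sense in which the constrained parameterization admits a shorter description.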

Paper Structure

This paper contains 35 sections, 7 theorems, 29 equations, 6 figures, 2 tables, 1 algorithm.

Key Result

Proposition 3.3

There exists a constant $\zeta$, such that

Figures (6)

  • Figure 1: (left) Unconstrained $16 \times 16$ parameterization with randomly selected motifs highlighted. (right) MoMos parameterization enforcing $2\times2$ motif reuse.
  • Figure 2: Reduction in algorithmic complexity, measured as the ratio of the BDM length of a pretrained model to that of its corresponding random initialization. The plot shows the BDM complexity ratio (see the experiments section) for 110 pretrained models from the PyTorch Image Models (timm) library with up to 100M trainable parameters.
  • Figure 3: Distortion between the dense weights and the MoMos reconstruction with $s=4$ during training of MLP (top row) and Mobile-ViT (bottom row). The first and second columns correspond to $c=0.05$ and $c=0.1$, respectively.
  • Figure 4: Tiny-ViT MoMos: Validation accuracy over capacity for varying block sizes averaged over three seeds. The dashed line denotes the FP32 baseline.
  • Figure 5: Distortion between the dense weights and the MoMos reconstruction with $s=4$ during training of Tiny-ViT (top row) and ResNet20 (bottom row). Columns are ordered by capacity from left to right: $c \in \{0.005, 0.01, 0.05, 0.1\}$.
  • ...and 1 more figure
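
The complexity ratio in the Figure 2 caption compares a description-length estimate of trained weights against that of a random initialization. The sketch below shows the shape of such a ratio using a deliberately crude stand-in: the zlib-compressed size of 8-bit quantized weights. This is an assumption made for illustration and is not the BDM estimate used in the paper; the helper `compressed_len` and the synthetic "structured" weights are hypothetical.

```python
import zlib
import numpy as np

def compressed_len(w):
    """Bytes after uniform 8-bit quantization and zlib compression.

    A rough compressibility proxy (assumption); it is NOT the BDM measure.
    """
    lo, hi = float(w.min()), float(w.max())
    scale = (hi - lo) or 1.0                   # avoid division by zero for constant w
    q = np.round((w - lo) / scale * 255).astype(np.uint8)
    return len(zlib.compress(q.tobytes(), 9))

rng = np.random.default_rng(0)

# Stand-in for a random initialization: 4096 i.i.d. Gaussian weights.
random_w = rng.standard_normal(4096)

# Stand-in for structured weights: 1024 blocks, each one of 8 motifs of length 4.
motifs = rng.standard_normal((8, 4))
mosaic = rng.integers(0, 8, size=1024)
structured_w = motifs[mosaic].reshape(-1)

# Ratio below 1 means the structured weights compress better than the random ones.
ratio = compressed_len(structured_w) / compressed_len(random_w)
```

A general-purpose compressor only captures statistical redundancy, so this proxy should be read as a sanity check on the direction of the effect, not as an estimate of $\mathcal{K}(\mathbf{w})$.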

Theorems & Definitions (12)

  • Definition 3.1: Hypothesis Class
  • Remark 3.2: Granularity and repeatability
  • Proposition 3.3: MoMos Complexity Bound
  • Corollary 3.4: Lower Complexity Regime
  • Lemma 3.5: Linear Component
  • Proposition 3.6: Union of Linear Subspaces
  • proof
  • Proposition 3.7: Distinctness Criterion
  • Definition 3.8: Number of Distinct Components
  • Corollary 3.9: Exponential Growth
  • ...and 2 more