Algorithmic Simplification of Neural Networks with Mosaic-of-Motifs

Pedram Bakhtiarifard; Tong Chen; Jonathan Wenshøj; Erik B Dam; Raghavendra Selvan

TL;DR

Empirical evidence shows that the algorithmic complexity of neural networks, measured using approximations to Kolmogorov complexity, can be reduced during training, resulting in models that perform comparably with unconstrained models while being algorithmically simpler.

Abstract

Large-scale deep learning models are well-suited for compression. Methods like pruning, quantization, and knowledge distillation have been used to achieve massive reductions in the number of model parameters, with marginal performance drops across a variety of architectures and tasks. This raises the central question: why are deep neural networks suited for compression? In this work, we take up the perspective of algorithmic complexity to explain this behavior. We hypothesize that the parameters of trained models have more structure and, hence, exhibit lower algorithmic complexity than the weights at (random) initialization. We further hypothesize that model compression methods harness this reduced algorithmic complexity to compress models. Although an unconstrained parameterization of model weights, $\mathbf{w} \in \mathbb{R}^n$, can represent arbitrary weight assignments, the solutions found during training exhibit repeatability and structure, making them algorithmically simpler than a generic program. To this end, we formalize the Kolmogorov complexity of $\mathbf{w}$ as $\mathcal{K}(\mathbf{w})$. We introduce a constrained parameterization $\widehat{\mathbf{w}}$ that partitions the parameters into blocks of size $s$ and restricts each block to be selected from a set of $k$ reusable motifs, specified by a reuse pattern (or mosaic). The resulting method, Mosaic-of-Motifs (MoMos), yields an algorithmically simpler model parameterization than unconstrained models. Empirical evidence from multiple experiments shows that the algorithmic complexity of neural networks, measured using approximations to Kolmogorov complexity, can be reduced during training. This results in models that perform comparably with unconstrained models while being algorithmically simpler.
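The constrained parameterization described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the sizes `n`, `s`, `k`, the helper names `reconstruct` and `compressed_size`, and the random choice of motifs and mosaic are all assumptions made for illustration. The sketch also uses zlib-compressed length as a crude stand-in for an approximation to Kolmogorov complexity, which is not the BDM measure mentioned in the figure captions.

```python
import random
import struct
import zlib


def reconstruct(motifs, mosaic):
    """Build the flat weight vector by concatenating the motif chosen for each block."""
    weights = []
    for idx in mosaic:
        weights.extend(motifs[idx])
    return weights


def compressed_size(weights):
    """zlib-compressed length in bytes of the float32 encoding (a crude proxy, not BDM)."""
    raw = struct.pack(f"{len(weights)}f", *weights)
    return len(zlib.compress(raw, 9))


rng = random.Random(0)
n, s, k = 4096, 4, 8  # hypothetical sizes: n weights, block size s, k motifs

# Unconstrained parameterization: n independent Gaussian weights.
w_dense = [rng.gauss(0.0, 1.0) for _ in range(n)]

# Constrained parameterization: k motifs of length s, plus one motif index per block.
motifs = [[rng.gauss(0.0, 1.0) for _ in range(s)] for _ in range(k)]
mosaic = [rng.randrange(k) for _ in range(n // s)]
w_hat = reconstruct(motifs, mosaic)

stored_floats = k * s    # values needed to describe the motifs
stored_indices = n // s  # one motif index per block (the mosaic)

print(len(w_hat), stored_floats, stored_indices)
print(compressed_size(w_dense), compressed_size(w_hat))
```

In this toy setting the mosaic weights are fully described by `k * s` floats and `n // s` small integers, and their compressed encoding is shorter than that of the dense random weights. How motifs and the mosaic are actually learned during training is not covered by this sketch.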

Paper Structure (35 sections, 7 theorems, 29 equations, 6 figures, 2 tables, 1 algorithm)

Introduction
Related work
Algorithmic Complexity of Binary Objects.
Mosaic-of-Motifs (MoMos)
Geometry of the Constrained Optimization Domain
Distinct Linear Components
Richness of the Optimization Domain.
MoMos Algorithm
Experiments
Experimental Setup.
Relative Motif Budget.
Relative Algorithmic Compression (RAC).
BDM Complexity Ratio.
Results
Training Reduces Algorithmic Complexity.
...and 20 more sections

Key Result

Proposition 3.3

There exists a constant $\zeta$, such that

Figures (6)

Figure 1: (left) Unconstrained $16 \times 16$ parameterization with randomly selected motifs highlighted. (right) MoMos parameterization enforcing $2\times2$ motif reuse.
Figure 2: Reduction in algorithmic complexity measured as the ratio of the BDM length of a pretrained model to that of its corresponding random initialization. The plot shows the BDM complexity ratio (see the Experiments section) for 110 pretrained models from the PyTorch Image Models (timm) library with up to 100M trainable parameters.
Figure 3: Distortion between the dense weights and the MoMos reconstruction with $s=4$ during training of MLP (top row) and Mobile-ViT (bottom row). The first and second columns show $c=0.05$ and $c=0.1$, respectively.
Figure 4: Tiny-ViT MoMos: Validation accuracy over capacity for varying block sizes averaged over three seeds. The dashed line denotes the FP32 baseline.
Figure 5: Distortion between the dense weights and the MoMos reconstruction with $s=4$ during training of Tiny-ViT (top row) and ResNet20 (bottom row). Columns are ordered by capacity, from left to right: $c \in \{0.005, 0.01, 0.05, 0.1\}$.
...and 1 more figures

Theorems & Definitions (12)

Definition 3.1: Hypothesis Class
Remark 3.2: Granularity and repeatability
Proposition 3.3: MoMos Complexity Bound
Corollary 3.4: Lower Complexity Regime
Lemma 3.5: Linear Component
Proposition 3.6: Union of Linear Subspaces
proof
Proposition 3.7: Distinctness Criterion
Definition 3.8: Number of Distinct Components
Corollary 3.9: Exponential Growth
...and 2 more
