Polynomially Over-Parameterized Convolutional Neural Networks Contain Structured Strong Winning Lottery Tickets

Arthur da Cunha, Francesco d'Amore, Emanuele Natale

TL;DR

This work tackles the structured-pruning variant of the Strong Lottery Ticket Hypothesis (SLTH) for Convolutional Neural Networks (CNNs) by developing a Multidimensional Random Subset Sum (MRSS) framework tailored to the normally-scaled normal (NSN) vectors that arise from CNN parameter sharing. It proves a sub-exponential bound showing that polynomially over-parameterized random CNNs contain structured subnetworks, obtainable via block and filter pruning, that approximate any sufficiently smaller target CNN. The key methodological advance is an MRSS result for NSN vectors, which underpins a kernel-pruning approach that extends from single-layer to multi-layer CNNs while controlling error propagation. The results show how over-parameterization and structured sparsity jointly enable efficient pruning, and they help explain why over-parameterization supports both performance and computational efficiency in deep learning.

Abstract

The Strong Lottery Ticket Hypothesis (SLTH) states that randomly-initialised neural networks likely contain subnetworks that perform well without any training. Although unstructured pruning has been extensively studied in this context, its structured counterpart, which can deliver significant computational and memory efficiency gains, has been largely unexplored. One of the main reasons for this gap is the limitations of the underlying mathematical tools used in formal analyses of the SLTH. In this paper, we overcome these limitations: we leverage recent advances in the multidimensional generalisation of the Random Subset-Sum Problem and obtain a variant that admits the stochastic dependencies that arise when addressing structured pruning in the SLTH. We apply this result to prove, for a wide class of random Convolutional Neural Networks, the existence of structured subnetworks that can approximate any sufficiently smaller network. This result provides the first sub-exponential bound around the SLTH for structured pruning, opening up new avenues for further research on the hypothesis and contributing to the understanding of the role of over-parameterization in deep learning.

Paper Structure

This paper contains 14 sections, 15 theorems, 99 equations, and 2 figures.

Key Result

Theorem 1

Let $X_1, \dots, X_n$ be independent uniform random variables over $[-1, 1]$, and let $\varepsilon \in (0, 1/3)$. There exists a universal constant $C > 0$ such that, if $n \ge C\log(1/\varepsilon)$, then, with probability at least $1 - \varepsilon$, for all $z \in [-1, 1]$ there exists $S_z \subset
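
The statement is cut off in the source. As an illustration only, the sketch below assumes the usual random subset-sum conclusion, namely that some subset of the samples sums to within $\varepsilon$ of each target $z$, and measures that approximation error by brute force. The helper names (`all_subset_sums`, `closest_error`, `worst_case_error`), the sample sizes, and the target grid are choices made for this example, not taken from the paper.

```python
import bisect
import random


def all_subset_sums(xs):
    """Return the sorted list of sums over every subset of xs (2**len(xs) values)."""
    sums = [0.0]  # the empty subset
    for x in xs:
        # Each existing subset either skips x or includes it.
        sums += [s + x for s in sums]
    return sorted(sums)


def closest_error(sorted_sums, z):
    """Distance from z to the nearest subset sum."""
    i = bisect.bisect_left(sorted_sums, z)
    # The nearest value is one of the two neighbours of the insertion point.
    candidates = sorted_sums[max(i - 1, 0): i + 1]
    return min(abs(z - s) for s in candidates)


def worst_case_error(xs, grid_size=201):
    """Largest approximation error over an evenly spaced grid of targets in [-1, 1]."""
    sorted_sums = all_subset_sums(xs)
    targets = [-1.0 + 2.0 * k / (grid_size - 1) for k in range(grid_size)]
    return max(closest_error(sorted_sums, z) for z in targets)


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the run is repeatable
    for n in (4, 8, 12, 16):
        xs = [rng.uniform(-1.0, 1.0) for _ in range(n)]
        print(n, worst_case_error(xs))
```

The enumeration is exponential in `n`, so this is only practical for small samples. It checks targets on a finite grid rather than all of $[-1, 1]$, so it gives a lower estimate of the true worst-case error.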

Figures (2)

  • Figure 1: Illustration of neuron pruning. The left side shows the effect of pruning neurons on the weight matrix of a fully-connected layer. The rows in white correspond to neurons pruned in the associated layer, while the columns in white represent the effect of removing neurons from the previous layer. On the right, we allude to the possibility of collapsing the pruned matrix into a smaller, dense one.
  • Figure 2: Examples of different pruning patterns.
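
The collapse that the Figure 1 caption alludes to can be sketched in a few lines. The example below is an illustration under simple assumptions (a bias-free fully-connected layer stored as a list of rows, and hypothetical helpers `mask_neurons` and `collapse`), not code from the paper. It checks that zeroing the pruned rows and columns gives the same outputs on the surviving neurons as a smaller dense matrix.

```python
def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def mask_neurons(W, kept_rows, kept_cols):
    """Zero every entry whose row or column belongs to a pruned neuron."""
    rows, cols = set(kept_rows), set(kept_cols)
    return [
        [w if (i in rows and j in cols) else 0.0 for j, w in enumerate(row)]
        for i, row in enumerate(W)
    ]


def collapse(W, kept_rows, kept_cols):
    """Drop pruned rows and columns, leaving a smaller dense matrix."""
    return [[W[i][j] for j in kept_cols] for i in kept_rows]


# A 3x4 layer: 4 inputs (neurons of the previous layer), 3 outputs.
W = [
    [1.0, 2.0, 3.0, 4.0],
    [5.0, 6.0, 7.0, 8.0],
    [9.0, 10.0, 11.0, 12.0],
]
x = [1.0, -1.0, 2.0, 0.5]

kept_rows = [0, 2]  # output neuron 1 is pruned
kept_cols = [1, 3]  # input neurons 0 and 2 were pruned in the previous layer

masked_out = matvec(mask_neurons(W, kept_rows, kept_cols), x)
dense_out = matvec(collapse(W, kept_rows, kept_cols), [x[j] for j in kept_cols])

# The surviving outputs agree; the pruned output is identically zero.
assert [masked_out[i] for i in kept_rows] == dense_out
assert masked_out[1] == 0.0
```

The masked matrix keeps the original shape and so the original cost, whereas the collapsed matrix is genuinely smaller. This is the kind of computational gain that structured pruning can deliver and unstructured pruning generally cannot.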

Theorems & Definitions (35)

  • Theorem 1: Lueker (1998); da Cunha et al. (2022)
  • Theorem 2: Structured SLTH
  • Definition 3: $n$-channel-blocked mask
  • Definition 4: NSN vector
  • Theorem 5: Normally-scaled MRSS
  • Definition 6: Subset-sum number
  • Lemma 7: Sum of NSN vectors
  • Proof: Overview of the proof.
  • Lemma 8: Sum of NSN vectors
  • Proof: Overview of the proof.
  • ...and 25 more