Polynomially Over-Parameterized Convolutional Neural Networks Contain Structured Strong Winning Lottery Tickets

Arthur da Cunha; Francesco d'Amore; Emanuele Natale

TL;DR

This work tackles the Strong Lottery Ticket Hypothesis (SLTH) under structured pruning for Convolutional Neural Networks (CNNs) by developing a multidimensional Random Subset Sum framework tailored to the normally-scaled normal (NSN) vectors that arise from CNN parameter sharing. It proves a sub-exponential bound showing that polynomially over-parameterized random CNNs contain structured subnetworks, obtainable via block and filter pruning, that approximate any sufficiently smaller target CNN. The key methodological advance is a Multidimensional Random Subset Sum result for NSN vectors, which underpins a kernel-pruning approach that extends from single-layer to multi-layer CNNs while controlling error propagation. The results show how over-parameterization and structured sparsity jointly enable efficient pruning, and they contribute to understanding how over-parameterization supports both performance and computational efficiency in deep learning.

Abstract

The Strong Lottery Ticket Hypothesis (SLTH) states that randomly-initialised neural networks likely contain subnetworks that perform well without any training. Although unstructured pruning has been extensively studied in this context, its structured counterpart, which can deliver significant computational and memory efficiency gains, has been largely unexplored. One of the main reasons for this gap is the limitations of the underlying mathematical tools used in formal analyses of the SLTH. In this paper, we overcome these limitations: we leverage recent advances in the multidimensional generalisation of the Random Subset-Sum Problem and obtain a variant that admits the stochastic dependencies that arise when addressing structured pruning in the SLTH. We apply this result to prove, for a wide class of random Convolutional Neural Networks, the existence of structured subnetworks that can approximate any sufficiently smaller network. This result provides the first sub-exponential bound around the SLTH for structured pruning, opening up new avenues for further research on the hypothesis and contributing to the understanding of the role of over-parameterization in deep learning.

Paper Structure (14 sections, 15 theorems, 99 equations, 2 figures)

Introduction
Related Work
Preliminaries and contribution
Analysis
Multidimensional Random Subset Sum for normally-scaled normal vectors
Notation.
Proving SLTH for structured pruning
Limitations and future work
Technical tools
Concentration inequalities
Supporting results
Omitted proofs
Multidimensional Random Subset Sum for normally-scaled normal vectors
Kernel Pruning

Key Result

Theorem 1

Let $X_1, \dots, X_n$ be independent uniform random variables over $[-1, 1]$, and let $\varepsilon \in (0, 1/3)$. There exists a universal constant $C > 0$ such that, if $n \ge C\log(1/\varepsilon)$, then, with probability at least $1 - \varepsilon$, for all $z \in [-1, 1]$ there exists $S_z \subset
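
The statement above is cut off in this copy, so the following is only an illustration of the Random Subset Sum setting it describes, not a reproduction of the theorem. As a minimal sketch, assuming the usual formulation in which some subset of the samples must sum to within $\varepsilon$ of each target $z$, the code draws uniform samples from $[-1, 1]$ and measures by brute force how well subset sums cover a grid of targets. The function names, the sample size `n = 12`, the grid size, and the seed are all illustrative choices that do not come from the paper.

```python
import random


def subset_sums(xs):
    """Return the sums of all 2^n subsets of xs (the empty subset gives 0)."""
    sums = [0.0]
    for x in xs:
        # Each existing sum either excludes x (kept) or includes x (appended).
        sums += [s + x for s in sums]
    return sums


def best_subset_error(sums, z):
    """Smallest |z - s| over all precomputed subset sums s."""
    return min(abs(z - s) for s in sums)


def worst_case_error(xs, grid_size=41):
    """Largest best-subset error over an evenly spaced grid of targets in [-1, 1]."""
    sums = subset_sums(xs)
    targets = [-1.0 + 2.0 * k / (grid_size - 1) for k in range(grid_size)]
    return max(best_subset_error(sums, z) for z in targets)


rng = random.Random(0)  # fixed seed, chosen arbitrarily for reproducibility
xs = [rng.uniform(-1.0, 1.0) for _ in range(12)]  # n = 12 is an illustrative choice

# Compare a short prefix of the sample with the full sample.
print("worst-case error, n = 6: ", worst_case_error(xs[:6]))
print("worst-case error, n = 12:", worst_case_error(xs))
```

Because every subset of a prefix is also a subset of the full sample, the worst-case error can only stay the same or shrink as `n` grows. The brute-force enumeration costs $2^n$ sums, so it is meant for small `n` only.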

Figures (2)

Figure 1: Illustration of neuron pruning. The left side shows the effect of pruning of neurons in the weight-matrix of a fully-connected layer. The rows in white correspond to neurons pruned in the associated layer while the columns in white represent the effect of removing neurons from the previous layers. On the right, we allude to the possibility of collapsing the pruned matrix into a smaller, dense one.
Figure 2: Examples of different pruning patterns.
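
The caption of Figure 1 mentions collapsing a neuron-pruned weight matrix into a smaller, dense one. A minimal sketch of that idea, assuming a fully-connected layer stored as a plain list of rows (one row per neuron), is given below. The helper names `mask_matrix`, `collapse_pruned`, and `matvec`, as well as the example weights, are hypothetical and do not come from the paper.

```python
def matvec(weights, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


def mask_matrix(weights, kept_rows, kept_cols):
    """Zero out every entry whose row or column belongs to a pruned neuron."""
    rows, cols = set(kept_rows), set(kept_cols)
    return [
        [w if (i in rows and j in cols) else 0.0 for j, w in enumerate(row)]
        for i, row in enumerate(weights)
    ]


def collapse_pruned(weights, kept_rows, kept_cols):
    """Drop pruned rows and columns, leaving a smaller dense matrix."""
    return [[weights[i][j] for j in kept_cols] for i in kept_rows]


# Illustrative 3x4 layer: 3 neurons, each reading 4 inputs from the previous layer.
W = [
    [1.0, 2.0, 3.0, 4.0],
    [5.0, 6.0, 7.0, 8.0],
    [9.0, 10.0, 11.0, 12.0],
]
kept_rows = [0, 2]  # neurons kept in this layer
kept_cols = [1, 3]  # neurons kept in the previous layer
x = [1.0, -1.0, 2.0, 0.5]

masked_out = matvec(mask_matrix(W, kept_rows, kept_cols), x)
dense_out = matvec(collapse_pruned(W, kept_rows, kept_cols), [x[j] for j in kept_cols])

# The kept outputs of the masked layer match the outputs of the collapsed layer.
print([masked_out[i] for i in kept_rows], dense_out)
```

The masked matrix has the same shape as the original, so it saves no computation on its own. The collapsed matrix is where structured pruning yields smaller dense operations.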

Theorems & Definitions (35)

Theorem 1: Lueker, 1998; da Cunha et al., 2022
Theorem 2: Structured SLTH
Definition 3: $n$-channel-blocked mask
Definition 4: NSN vector
Theorem 5: Normally-scaled MRSS
Definition 6: Subset-sum number
Lemma 7: Sum of NSN vectors
proof : Overview of the proof.
Lemma 8: Sum of NSN vectors
proof : Overview of the proof.
...and 25 more
