Partially Frozen Random Networks Contain Compact Strong Lottery Tickets

Hikari Otsuka; Daiki Chijiwa; Ángel López García-Arias; Yasuyuki Okoshi; Kazushi Kawamura; Thiem Van Chu; Daichi Fujiki; Susumu Takeuchi; Masato Motomura

TL;DR

The paper reduces the memory footprint of strong lottery tickets (SLTs) by partially freezing a randomly initialized network: a random subset of the weights is either permanently removed (pruning) or permanently kept as part of the SLT (locking). Freezing lets the search cover a broader sparsity range than pre-pruning alone, and because both the random weights and the freezing pattern can be regenerated from seeds, model size shrinks substantially while accuracy stays competitive with, or improves on, non-frozen baselines. Edge-Popup is extended to operate in frozen networks. The authors extend the subset-sum approximation argument to frozen networks, arguing that SLTs exist under freezing when the network is sufficiently wide, and validate the approach experimentally on CIFAR-10, ImageNet, and OGBN-Arxiv across Conv6, ResNet-18, and GIN architectures, reporting favorable accuracy-to-model-size trade-offs and substantial memory savings. The work is relevant to energy-efficient inference on specialized hardware, since smaller SLT-based models reduce off-chip memory access, with implications for future hardware accelerators and potential training-cost benefits.
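The freezing idea summarized above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the per-weight state labels, and the 40%/30% prune/lock split are hypothetical choices made for illustration. The sketch regenerates a freezing pattern from a seed, expands a stored supermask (one bit per non-frozen weight) into a full binary mask, and computes the range of SLT sparsities that remain reachable.

```python
import random

def freezing_pattern(num_weights, prune_ratio, lock_ratio, seed):
    """Assign each weight one state: 'pruned', 'locked', or 'free'.

    The pattern depends only on the arguments, so it can be regenerated
    from the seed instead of being stored (hypothetical helper).
    """
    assert prune_ratio >= 0.0 and lock_ratio >= 0.0
    assert prune_ratio + lock_ratio <= 1.0
    rng = random.Random(seed)
    order = list(range(num_weights))
    rng.shuffle(order)
    n_pruned = round(prune_ratio * num_weights)
    n_locked = round(lock_ratio * num_weights)
    states = ["free"] * num_weights
    for i in order[:n_pruned]:
        states[i] = "pruned"   # never part of the SLT
    for i in order[n_pruned:n_pruned + n_locked]:
        states[i] = "locked"   # always part of the SLT
    return states

def apply_supermask(states, free_bits):
    """Expand the stored bits (one per free weight) into a full 0/1 mask."""
    bits = iter(free_bits)
    mask = []
    for state in states:
        if state == "pruned":
            mask.append(0)
        elif state == "locked":
            mask.append(1)
        else:
            mask.append(next(bits))
    return mask

def sparsity_bounds(prune_ratio, lock_ratio):
    """Pruned weights are always absent and locked weights always present,
    so the SLT sparsity must lie in [prune_ratio, 1 - lock_ratio]."""
    return prune_ratio, 1.0 - lock_ratio

# Illustrative 70% freezing ratio, split 40% pruned / 30% locked (assumed split).
states = freezing_pattern(1000, prune_ratio=0.4, lock_ratio=0.3, seed=0)
n_free = states.count("free")
mask = apply_supermask(states, [1] * n_free)
```

Only the `n_free` supermask bits and the seeds would need to be stored in this sketch; the rest of the mask follows from the regenerated pattern.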

Abstract

Randomly initialized dense networks contain subnetworks that achieve high accuracy without weight learning: strong lottery tickets (SLTs). Recently, Gadhikar et al. (2023) demonstrated that SLTs could also be found within a randomly pruned source network. This phenomenon can be exploited to further compress the small memory size required by SLTs. However, their method is limited to SLTs that are even sparser than the source, leading to worse accuracy due to unintentionally high sparsity. This paper proposes a method for reducing the SLT memory size without restricting the sparsity of the SLTs that can be found. A random subset of the initial weights is frozen by either permanently pruning them or locking them as a fixed part of the SLT, resulting in a smaller model size. Experimental results show that Edge-Popup (Ramanujan et al., 2020; Sreenivasan et al., 2022) finds SLTs with a better accuracy-to-model size trade-off within frozen networks than within dense or randomly pruned source networks. In particular, freezing $70\%$ of a ResNet on ImageNet provides $3.3 \times$ compression compared to the SLT found within a dense counterpart, raises accuracy by up to $14.12$ points compared to the SLT found within a randomly pruned counterpart, and offers a better accuracy-model size trade-off than both.
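As a rough plausibility check of the $3.3\times$ figure, assume that an SLT is stored as a one-bit supermask entry per non-frozen weight and that the seeds for regenerating the weights and the freezing pattern have negligible size. This accounting is an assumption made here, not a formula stated in the abstract, and the symbols $N$ (number of weights) and $f$ (freezing ratio) are introduced only for illustration:

$$
\frac{\text{mask bits (dense source)}}{\text{mask bits (frozen source)}} = \frac{N}{(1-f)\,N} = \frac{1}{1-f}, \qquad f = 0.7 \;\Rightarrow\; \frac{1}{0.3} \approx 3.3 .
$$

Under that assumption, the reported compression matches the fraction of weights that no longer need a mask bit.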

Paper Structure (25 sections, 9 theorems, 10 equations, 7 figures, 2 tables)

Introduction
Preliminaries
Strong Lottery Tickets in Dense Networks
SLT Existence via Subset-Sum Approximation
Strong Lottery Tickets in Sparse Networks
SLT Existence in Sparse Networks
Strong Lottery Tickets in Frozen Networks
Partial Freezing for Enhanced SLT Compression
Frozen network construction:
Setting the layer-wise ratios:
Freezing pattern encoding for model compression:
Optimal Pruning:Locking Proportion for Freezing
SLT Existence in Frozen Networks
Experiments
Experimental Settings
...and 10 more sections

Key Result

Lemma 2.1

Let $X_1, \dots, X_n \sim U(-1, 1)$ be independent, uniformly distributed random variables. Then, except with exponentially small probability, any $z \in [-1, 1]$ can be approximated by a subset sum of the $X_i$ if $n$ is sufficiently large.
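The lemma can be explored numerically with a small brute-force experiment. The sketch below is only an illustration under assumed settings (the helper names, the seed, and the sample sizes are hypothetical); it enumerates all subsets, which is feasible only for small $n$, and is not part of the paper's proof.

```python
import random

def sample_uniform(n, seed):
    """Draw n independent samples from U(-1, 1) with a fixed seed."""
    rng = random.Random(seed)
    return [rng.uniform(-1.0, 1.0) for _ in range(n)]

def best_subset_sum(xs, z):
    """Return (indices, subset_sum, error) of the subset of xs whose sum
    is closest to z. Exhaustive search over all 2**len(xs) subsets."""
    best_idx, best_sum, best_err = [], 0.0, abs(z)  # start from the empty subset
    for mask in range(1, 1 << len(xs)):
        idx = [i for i in range(len(xs)) if (mask >> i) & 1]
        s = sum(xs[i] for i in idx)
        err = abs(s - z)
        if err < best_err:
            best_idx, best_sum, best_err = idx, s, err
    return best_idx, best_sum, best_err

xs = sample_uniform(12, seed=0)
for z in (-0.9, 0.3, 0.77):
    # Using more samples can only shrink (or keep) the best achievable error.
    _, _, err_small = best_subset_sum(xs[:6], z)
    _, _, err_large = best_subset_sum(xs, z)
    print(f"z={z:+.2f}  error with n=6: {err_small:.5f}  with n=12: {err_large:.5f}")
```

Because every subset of the first six samples is also a subset of all twelve, the error for the larger $n$ never exceeds the error for the smaller one, which is the qualitative behavior the lemma describes.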

Figures (7)

Figure 1: Freezing the source network by randomly pruning some parameters and locking others reduces the memorized supermask for finding an SLT.
Figure 2: Freezing improves the accuracy-to-model size trade-off over pre-pruning only or non-freezing.
Figure 3: Pre-pruning and locking set the bounds of the SLT sparsity that can be found. These optimal bounds are investigated in the section "Optimal Pruning:Locking Proportion for Freezing".
Figure 4: Different prune:lock proportions of an $80\%$ freezing ratio using a Conv6.
Figure 5: Impact of the freezing ratio on different architectures. Pruning and locking ratios are set following the section "Optimal Pruning:Locking Proportion for Freezing".
...and 2 more figures

Theorems & Definitions (11)

Lemma 2.1: Subset-Sum Approximation (Lueker, 1998)
Lemma 2.2: Subset-Sum Approximation in Sparse Networks (Gadhikar et al., 2023)
Theorem 2.1: SLT Existence in Sparse Networks (Gadhikar et al., 2023)
Lemma 3.1: Subset-Sum Approximation in Randomly Locked Networks
Lemma 3.2: Subset-Sum Approximation in Frozen Networks
Theorem 3.1: SLT Existence in Frozen Networks
Lemma A.1: Subset-Sum Approximation in Randomly Locked Networks
proof
Lemma A.2: Subset-Sum Approximation in Frozen Networks
Theorem A.1: SLT Existence in Frozen Networks
...and 1 more
