Uncovering a Winning Lottery Ticket with Continuously Relaxed Bernoulli Gates

Itamar Tsayag; Ofir Lindenbaum

TL;DR

Experiments across fully connected networks, CNNs, and Vision Transformers demonstrate up to 90% sparsity with minimal accuracy loss - nearly double the sparsity achieved by edge-popup at comparable accuracy - establishing a scalable framework for pre-training network sparsification.

Abstract

Over-parameterized neural networks incur prohibitive memory and computational costs for resource-constrained deployment. The Strong Lottery Ticket (SLT) hypothesis suggests that randomly initialized networks contain sparse subnetworks achieving competitive accuracy without weight training. Existing SLT methods, notably edge-popup, rely on non-differentiable score-based selection, limiting optimization efficiency and scalability. We propose using continuously relaxed Bernoulli gates to discover SLTs through fully differentiable, end-to-end optimization - training only gating parameters while keeping all network weights frozen at their initialized values. Continuous relaxation enables direct gradient-based optimization of an $\ell_0$-regularization objective, eliminating the need for non-differentiable gradient estimators or iterative pruning cycles. To our knowledge, this is the first fully differentiable approach for SLT discovery that avoids straight-through estimator approximations. Experiments across fully connected networks, CNNs (ResNet, Wide-ResNet), and Vision Transformers (ViT, Swin-T) demonstrate up to 90% sparsity with minimal accuracy loss - nearly double the sparsity achieved by edge-popup at comparable accuracy - establishing a scalable framework for pre-training network sparsification.
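The gating idea in the abstract can be illustrated with a minimal pure-Python sketch. It assumes one possible relaxation, a mean-shifted Gaussian clipped to [0, 1], whose probability of being active has a closed form through the standard normal CDF. The excerpt does not specify the paper's exact relaxation, so the function names, the value of `sigma`, the deterministic inference rule, and all numeric values below are hypothetical and chosen only for illustration.

```python
import math
import random

def sample_gate(mu, sigma, rng):
    """Relaxed Bernoulli gate (assumed form): mean-shifted Gaussian clipped to [0, 1]."""
    return min(1.0, max(0.0, mu + sigma * rng.gauss(0.0, 1.0)))

def prob_active(mu, sigma):
    """P(mu + sigma * eps > 0) = Phi(mu / sigma), with Phi the standard normal CDF."""
    return 0.5 * (1.0 + math.erf(mu / (sigma * math.sqrt(2.0))))

def expected_l0(mus, sigma):
    """Smooth surrogate for the l0 penalty: expected number of active gates."""
    return sum(prob_active(mu, sigma) for mu in mus)

def gated_dot(weights, mus, x, sigma=0.5, rng=None):
    """Dot product with frozen weights and per-weight gates.

    With rng=None the gates are deterministic (clip(mu, 0, 1)), an assumed
    inference rule; otherwise each gate is sampled as during training.
    """
    total = 0.0
    for w, mu, xi in zip(weights, mus, x):
        z = min(1.0, max(0.0, mu)) if rng is None else sample_gate(mu, sigma, rng)
        total += w * z * xi
    return total

def sparsity(mus):
    """Fraction of weights whose deterministic gate is exactly zero."""
    return sum(1 for mu in mus if mu <= 0.0) / len(mus)

# Hypothetical toy values: weights stay frozen, only mus would be trained.
weights = [0.8, -1.2, 0.3, 0.5]
mus = [1.5, -0.7, 0.4, -2.0]
x = [1.0, 2.0, -1.0, 0.5]

deterministic_output = gated_dot(weights, mus, x)
stochastic_output = gated_dot(weights, mus, x, rng=random.Random(0))
penalty = expected_l0(mus, sigma=0.5)
pruned_fraction = sparsity(mus)
```

In a full training loop, only `mus` would receive gradient updates, with the task loss plus a weighted `expected_l0` term as the objective; the weights themselves are never modified, which is what distinguishes this setting from conventional pruning.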

Paper Structure (24 sections, 5 equations, 3 figures, 3 tables)

This paper contains 24 sections, 5 equations, 3 figures, 3 tables.

Introduction
Related Work
Lottery Ticket Hypothesis
Strong Lottery Tickets
Neural Network Pruning
Continuous Relaxations for Discrete Selection
Proposed Solution
Inference.
Experiments
Experimental Setup
Hyperparameters.
Training Details.
Evaluation Protocol.
Sparsification of Fully Connected Networks
Sparsification of CNNs
...and 9 more sections

Figures (3)

Figure 1: Pre-training sparsification on LeNet-300-100. Blue region: percentage of pruned weights. Green region: retained weights. Red line: test accuracy on MNIST as sparsification progresses. The method achieves 96% accuracy at 45% sparsification.
Figure 2: Per-layer sparsification of ResNet50 on CIFAR-10. Later layers exhibit higher sparsification rates, consistent with prior findings that early layers require more weights for low-level feature extraction.
Figure 3: Robustness to base network size (LeNet on MNIST). Blue: pruned weights. Green: retained weights. Red: test accuracy. SLTs can be discovered even in base networks at 20% of the original size.
