Insights into the Lottery Ticket Hypothesis and Iterative Magnitude Pruning

Tausifa Jan Saleem; Ramanjit Ahuja; Surendra Prasad; Brejesh Lall

TL;DR

This work empirically studies the volume/geometry and loss landscape characteristics of the solutions obtained at various stages of the iterative magnitude pruning process to provide insights into phenomena like pruning of smaller magnitude weights and the role of the iterative process.

Abstract

The lottery ticket hypothesis for deep neural networks emphasizes the importance of the initialization used to re-train the sparser networks obtained through iterative magnitude pruning. An explanation for why the specific initialization proposed by the lottery ticket hypothesis tends to yield better generalization (and training) performance has been lacking. Moreover, the principles underlying iterative magnitude pruning, such as the pruning of smaller-magnitude weights and the role of the iterative process, are not fully understood or explained. In this work, we attempt to provide insights into these phenomena by empirically studying the volume/geometry and loss landscape characteristics of the solutions obtained at various stages of the iterative magnitude pruning process.
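For readers unfamiliar with the procedure the abstract refers to, the following is a minimal sketch of iterative magnitude pruning with weight rewinding. It assumes a toy linear regression model in NumPy rather than a deep network, and the function names (`train`, `prune_smallest`, `imp_with_rewinding`), the 20% pruning fraction, the learning rate, and the step counts are all illustrative choices, not the authors' experimental setup.

```python
import numpy as np

def train(w, mask, X, y, steps, lr=0.1):
    """Gradient descent on mean squared error; pruned weights are held at zero."""
    w = w * mask
    for _ in range(steps):
        grad = 2.0 * X.T @ (X @ w - y) / len(y)
        w = (w - lr * grad) * mask
    return w

def prune_smallest(w, mask, fraction):
    """Remove the smallest-magnitude `fraction` of the weights that are still alive."""
    alive = np.flatnonzero(mask)
    k = int(fraction * alive.size)
    new_mask = mask.copy()
    if k > 0:
        order = np.argsort(np.abs(w[alive]))  # ascending magnitude
        new_mask[alive[order[:k]]] = 0.0
    return new_mask

def imp_with_rewinding(X, y, w_init, levels, fraction=0.2,
                       rewind_steps=5, train_steps=200):
    """Return a list of (mask, trained_weights), one entry per pruning level."""
    mask = np.ones_like(w_init)
    # Train briefly from the initialization and save the rewind point.
    w_rewind = train(w_init, mask, X, y, rewind_steps)
    # Level 0: train the dense model to (approximate) convergence.
    w = train(w_rewind, mask, X, y, train_steps)
    history = [(mask.copy(), w.copy())]
    for _ in range(levels):
        # Prune by magnitude at the trained solution ...
        mask = prune_smallest(w, mask, fraction)
        # ... then rewind the survivors to the saved point and retrain.
        w = train(w_rewind, mask, X, y, train_steps)
        history.append((mask.copy(), w.copy()))
    return history

# Toy data (hypothetical): only the first three features matter.
rng = np.random.default_rng(0)
X = rng.normal(size=(64, 10))
w_true = np.zeros(10)
w_true[:3] = [3.0, -2.0, 1.5]
y = X @ w_true
history = imp_with_rewinding(X, y, 0.1 * rng.normal(size=10), levels=3)
```

Setting `rewind_steps=0` in this sketch resets the surviving weights to the original initialization at every level, while one-shot pruning would correspond to a single level with a larger `fraction`.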

Paper Structure (22 sections, 7 equations, 32 figures, 9 tables)

Introduction
Background Information and Problem Formulation
Background Information
Definitions and Notations.
Problem Statement and Questions of Interest.
Methodology
Results and Findings
Result 1: Special solutions with small volume exist.
Result 2: Why does the initialization proposed by the lottery ticket hypothesis work well?
Result 3: Why do we need the Iterative Process, and why does one-shot pruning not work as well?
Result 4: There exists a barrier between IMP solutions at successive levels in the loss landscape.
Result 5: IMP solutions obtained using rewinding lie within the same loss sublevel set.
Result 6: What happens when you prune the smaller weights?
Result 7: Why doesn't fine-tuning perform on par with rewinding?
Conclusion and Scope for Further Work
...and 7 more sections

Figures (32)

Figure 1: Training loss and test accuracy at different levels of IMP-WR. Left: Training loss. Right: Test accuracy.
Figure 2: Comparison of training loss and test accuracy between $W_{(10)}^{(min\_(10))}$, $W^{(one\_shot)}_{(10)}$, $W^{(FT)}_{(10)}$, $W^{(RIPN)}_{(10)}$, $W^{(RPN\_1)}_{(10)}$ and $W^{(RPN\_2)}_{(10)}$. Left: Training loss. Right: Test accuracy.
Figure 3: Distance from $W^{Pr{(rewind\_point)}}_{(L)}$ to $W^{Pr{(min\_(L-1))}}_{(L)}$ and to $W_{(L)}^{(min\_(L))}$ for $L$ ranging from $1$ to $10$.
Figure 4: Comparison of logarithm of training loss versus epoch between level $(L)$ and level $(L-1)$ projected on level $(L)$ for $L$ ranging from $1$ to $10$.
Figure 5: Comparison of the top-100 positive eigenvalues of the Hessian at $W_{(L)}^{(min\_(L))}$ and $W^{Pr{(min\_(L-1))}}_{(L)}$ for $L$ ranging from $1$ to $10$. The eigenvalues of the Hessian at $W_{(L)}^{(min\_(L))}$ are smaller than those at $W^{Pr{(min\_(L-1))}}_{(L)}$. The smaller the eigenvalues, the smaller their product and the larger the volume of the basin around the minimum.
...and 27 more figures
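The link between Hessian eigenvalues and basin volume stated in the Figure 5 caption can be illustrated with a short sketch. It assumes a quadratic approximation of the loss around the minimum, under which the set of perturbations $d$ with $\frac{1}{2} d^\top H d \le \epsilon$ is an ellipsoid with semi-axes $\sqrt{2\epsilon/\lambda_i}$, so its volume scales as $(\prod_i \lambda_i)^{-1/2}$. The function name `log_basin_volume` and the threshold `eps` are hypothetical, and this is not necessarily the volume measure the authors use.

```python
import math

def log_basin_volume(eigenvalues, eps=1.0):
    """Log-volume of the ellipsoid {d : 0.5 * d^T H d <= eps}.

    `eigenvalues` are the positive Hessian eigenvalues of the directions
    considered; working in log space avoids overflow for many directions.
    """
    n = len(eigenvalues)
    # Log-volume of the unit ball in n dimensions.
    log_unit_ball = 0.5 * n * math.log(math.pi) - math.lgamma(0.5 * n + 1)
    # Each semi-axis has length sqrt(2 * eps / lambda_i).
    log_axes = sum(0.5 * math.log(2.0 * eps / lam) for lam in eigenvalues)
    return log_unit_ball + log_axes

# Uniformly smaller eigenvalues give a larger basin volume.
sharp = log_basin_volume([2.0, 2.0])
flat = log_basin_volume([1.0, 1.0])
```

With eigenvalues `[2.0, 2.0]` and `eps=1.0` the ellipsoid is the unit disc, so `sharp` equals `log(pi)`; halving both eigenvalues doubles the area, which is the direction of the comparison made in the caption.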
