Insights into the Lottery Ticket Hypothesis and Iterative Magnitude Pruning

Tausifa Jan Saleem, Ramanjit Ahuja, Surendra Prasad, Brejesh Lall

TL;DR

This work empirically studies the volume/geometry and loss landscape characteristics of the solutions obtained at various stages of the iterative magnitude pruning process to provide insights into phenomena such as the pruning of smaller-magnitude weights and the role of the iterative process.

Abstract

The lottery ticket hypothesis for deep neural networks emphasizes the importance of the initialization used to retrain the sparser networks obtained through iterative magnitude pruning. An explanation for why the specific initialization proposed by the lottery ticket hypothesis tends to yield better generalization (and training) performance has been lacking. Moreover, the principles underlying iterative magnitude pruning, such as the pruning of smaller-magnitude weights and the role of the iterative process, are not fully understood. In this work, we attempt to provide insights into these phenomena by empirically studying the volume/geometry and loss landscape characteristics of the solutions obtained at various stages of the iterative magnitude pruning process.
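The iterative magnitude pruning process referred to above can be illustrated with a small toy sketch. It assumes weight rewinding to an early training iteration (IMP-WR), a linear-regression model standing in for a deep network, pruning of 20% of the surviving weights per level by smallest magnitude, and full-batch gradient descent. The helper names `train`, `magnitude_prune` and `imp_wr` and all hyperparameters are hypothetical and are not taken from the paper.

```python
import numpy as np

# Hypothetical toy sketch of iterative magnitude pruning with weight rewinding
# (IMP-WR). Model, learning rate, pruning fraction and rewind step are
# illustrative assumptions, not the paper's experimental setup.


def train(w, mask, X, y, lr=0.05, steps=500, rewind_step=None):
    """Full-batch gradient descent on mean-squared error, pruned weights held at zero.

    Returns the final weights and, if `rewind_step` is given, a copy of the
    weights at that iteration (the rewind point)."""
    w = w * mask
    snapshot = None
    n = X.shape[0]
    for t in range(steps):
        if t == rewind_step:
            snapshot = w.copy()
        grad = (2.0 / n) * X.T @ (X @ w - y)
        w = (w - lr * grad) * mask
    return w, snapshot


def magnitude_prune(w, mask, frac):
    """Zero out the `frac` fraction of surviving weights with the smallest |w|."""
    alive = np.flatnonzero(mask)
    k = int(round(frac * alive.size))
    order = np.argsort(np.abs(w[alive]))  # ascending magnitude
    new_mask = mask.copy()
    new_mask[alive[order[:k]]] = 0.0
    return new_mask


def imp_wr(X, y, levels=5, frac=0.2, steps=500, rewind_step=10, seed=0):
    """Return a list of (mask, minimum) pairs, one per level (level 0 = dense)."""
    rng = np.random.default_rng(seed)
    w_init = rng.normal(scale=0.1, size=X.shape[1])
    mask = np.ones_like(w_init)
    # Level 0: train the dense model and record the rewind point.
    w_min, w_rewind = train(w_init, mask, X, y, steps=steps, rewind_step=rewind_step)
    history = [(mask.copy(), w_min)]
    for _ in range(levels):
        # Prune using the previous level's minimum, rewind, then retrain.
        mask = magnitude_prune(w_min, mask, frac)
        w_min, _ = train(w_rewind, mask, X, y, steps=steps - rewind_step)
        history.append((mask.copy(), w_min))
    return history


# Usage on synthetic data whose true weight vector has 5 non-zeros out of 20.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 20))
w_true = np.zeros(20)
w_true[[0, 3, 7, 12, 18]] = [3.0, -2.0, 1.5, -1.0, 2.5]
y = X @ w_true

history = imp_wr(X, y)
print([int(m.sum()) for m, _ in history])  # [20, 16, 13, 10, 8, 6]
```

In this noiseless toy the irrelevant weights converge to near zero, so magnitude pruning removes them first and the true support survives every level. Real networks offer no such guarantee, which is part of what the paper sets out to examine.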

Paper Structure (22 sections, 7 equations, 32 figures, 9 tables)

Figures (32)

  • Figure 1: Training loss and test accuracy at different levels of IMP-WR. Left: Training loss. Right: Test accuracy.
  • Figure 2: Comparison of training loss and test accuracy between $W_{(10)}^{(min\_(10))}$, $W^{(one\_shot)}_{(10)}$, $W^{(FT)}_{(10)}$, $W^{(RIPN)}_{(10)}$, $W^{(RPN\_1)}_{(10)}$ and $W^{(RPN\_2)}_{(10)}$. Left: Training loss. Right: Test accuracy.
  • Figure 3: Distance from $W^{Pr{(rewind\_point)}}_{(L)}$ to $W^{Pr{(min\_(L-1))}}_{(L)}$ and to $W_{(L)}^{(min\_(L))}$ for $L$ ranging from $1$ to $10$.
  • Figure 4: Comparison of the logarithm of the training loss versus epoch between level $(L)$ and level $(L-1)$ projected onto level $(L)$, for $L$ ranging from $1$ to $10$.
  • Figure 5: Comparison of the top-100 positive eigenvalues of the Hessian at $W_{(L)}^{(min\_(L))}$ and $W^{Pr{(min\_(L-1))}}_{(L)}$ for $L$ ranging from $1$ to $10$. The eigenvalues of the Hessian at $W_{(L)}^{(min\_(L))}$ are smaller than those at $W^{Pr{(min\_(L-1))}}_{(L)}$. Smaller eigenvalues give a smaller product, and hence a larger volume of the basin around the minimum.
  • ...and 27 more figures
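The reasoning in the Figure 5 caption (smaller Hessian eigenvalues, smaller product, larger basin) can be made concrete with a quadratic approximation of the loss. The sketch below assumes the basin is the sublevel set $\{\delta : \tfrac{1}{2}\sum_i \lambda_i \delta_i^2 \le \epsilon\}$ along the top-$k$ eigendirections. This set is an ellipsoid with semi-axes $\sqrt{2\epsilon/\lambda_i}$, so its log-volume depends on the eigenvalues only through $-\tfrac{1}{2}\sum_i \log\lambda_i$. The sketch also treats the projection $W^{Pr(\cdot)}_{(L)}$ used in Figures 3 to 5 as elementwise masking with the level-$L$ mask. That is our reading of the notation, not a definition taken from the paper. The functions `project`, `top_positive_eigs` and `log_basin_volume`, the threshold `eps`, and the random Hessians and weights are all hypothetical.

```python
import math

import numpy as np


def project(w, mask):
    """Assumed projection: zero the weights pruned by a 0/1 level-L mask."""
    return w * mask


def top_positive_eigs(H, k=100):
    """Largest k positive eigenvalues of a symmetric Hessian, in descending order."""
    eigs = np.linalg.eigvalsh(H)  # ascending order
    pos = eigs[eigs > 0]
    return pos[::-1][:k]


def log_basin_volume(eigs, eps=1e-2):
    """Log-volume of the ellipsoid {d : 0.5 * sum(eigs * d**2) <= eps}.

    Semi-axes are sqrt(2 * eps / lambda_i), so the volume depends on the
    eigenvalues only through their product: smaller product, larger volume."""
    eigs = np.asarray(eigs, dtype=float)
    k = eigs.size
    log_unit_ball = (k / 2) * math.log(math.pi) - math.lgamma(k / 2 + 1)
    return log_unit_ball + 0.5 * float(np.sum(np.log(2 * eps / eigs)))


# Two hypothetical Hessians; H_flat has uniformly halved curvature.
rng = np.random.default_rng(0)
A = rng.normal(size=(30, 30))
H_sharp = A @ A.T / 30 + np.eye(30)  # symmetric positive definite
H_flat = 0.5 * H_sharp
v_sharp = log_basin_volume(top_positive_eigs(H_sharp, k=10))
v_flat = log_basin_volume(top_positive_eigs(H_flat, k=10))
print(v_flat - v_sharp)  # equals 5 * log(2), about 3.466, up to rounding

# Distance between a hypothetical level-(L-1) minimum projected onto the
# level-L mask and a hypothetical level-L minimum.
mask_L = (rng.random(30) > 0.2).astype(float)
w_prev_min = rng.normal(size=30)
w_curr_min = project(rng.normal(size=30), mask_L)
print(np.linalg.norm(project(w_prev_min, mask_L) - w_curr_min))
```

Working in log space avoids the underflow that a direct product of 100 eigenvalues (or of their inverse square roots) would quickly hit.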