Phase Transitions in Neural Networks Pruning

Diego Pesce, Yang-Hui He, Guido Caldarelli

TL;DR

Focusing on magnitude-based pruning with fine-tuning, this work shows that deep networks undergo a sharp transition from a cooperative, functional phase to a disordered phase with collapsed performance, and suggests universal pruning-induced criticality across architectures and datasets.

Abstract

Deep neural networks are strongly over-parameterized, often containing far more weights than required for their task. Although such redundancy can aid optimization, it leads to inefficient deployment and high computational cost, motivating model compression techniques. Among these, network pruning provides a clear and effective route to sparsity. We study pruning from a statistical-physics perspective, interpreting performance degradation under weight removal as a phase transition. Focusing on magnitude-based pruning with fine-tuning, we show that deep networks undergo a sharp transition from a cooperative, functional phase to a disordered phase with collapsed performance. This transition is characterized by scaling laws consistent with second-order critical behavior, with connectivity as the control parameter. Our findings suggest universal pruning-induced criticality across architectures and datasets. Finally, we show that there exists a large class of subnetworks sharing the same nodes' degrees with similar learning ability, thus linking model performance to its topological properties.
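As a rough illustration of the procedure the abstract refers to, iterative magnitude-based pruning with fine-tuning can be sketched as below. This is a minimal sketch, not the authors' implementation: the flat weight list, the per-step pruning `fraction`, and the optional `fine_tune` callback are all assumptions introduced here, and `connectivity` simply measures the fraction of links remaining, the quantity the abstract names as the control parameter.

```python
def magnitude_prune(weights, mask, fraction):
    """Drop the given fraction of still-active links with the smallest |w|.

    weights: flat list of floats; mask: list of 0/1 flags (1 = link kept).
    Returns a new mask; the inputs are not modified.
    """
    active = [i for i, m in enumerate(mask) if m]
    n_prune = int(len(active) * fraction)
    # Sort the active indices by weight magnitude, smallest first.
    by_magnitude = sorted(active, key=lambda i: abs(weights[i]))
    new_mask = list(mask)
    for i in by_magnitude[:n_prune]:
        new_mask[i] = 0
    return new_mask


def connectivity(mask):
    """Fraction of links remaining after pruning."""
    return sum(mask) / len(mask)


def iterative_pruning(weights, fraction, steps, fine_tune=None):
    """Alternate pruning and (optional) fine-tuning for a number of steps.

    fine_tune is a hypothetical callback (weights, mask) -> weights that
    stands in for retraining the surviving links; it is an assumption here.
    """
    mask = [1] * len(weights)
    history = []
    for _ in range(steps):
        mask = magnitude_prune(weights, mask, fraction)
        weights = [w * m for w, m in zip(weights, mask)]
        if fine_tune is not None:
            weights = fine_tune(weights, mask)
        history.append(connectivity(mask))
    return weights, mask, history
```

In a real experiment one would record accuracy after each step and plot it against `history` to look for the sharp drop the abstract describes.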

Paper Structure (6 equations, 4 figures)

This paper contains 6 equations, 4 figures.

Figures (4)

  • Figure 1: (color online) Detail of the behavior near the point at which the algorithm ceases to operate. The x axis shows the percentage of links remaining after pruning. From top to bottom: accuracy (black line), prediction entropy (red line), and outcome entropy (green line). Results are shown for the FC architecture using (a) MNIST and (b) KMNIST.
  • Figure 2: Evolution of the weights over multiple magnitude-based pruning steps. The y axis shows the values of the remaining parameters, plotted against their values at the previous timestep (x axis). Red points denote pruned parameters, while blue points show surviving weights. The deviation from the identity line indicates weight evolution between steps.
  • Figure 3: Comparison of the order parameters with the corresponding weight-evolution metrics. The region of stronger weight reorganization is the one that marks the loss of model performance.
  • Figure 4: (color online) Training loss (y axis) as a function of training step (x axis). From top to bottom: (a; orange online) a totally random subset of edges, (b; blue online) the subset of edges self-organised by the pruning procedure and then randomized, (c; green online) the original model.
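
The captions do not say how the pruned edge set in Figure 4(b) is randomized. One common way to randomize links while keeping every node's degree fixed, which matches the abstract's mention of subnetworks "sharing the same nodes' degrees", is a double edge swap. The sketch below assumes that technique and is not taken from the paper; the function names and the representation of a pruned layer as `(input_node, output_node)` pairs are hypothetical.

```python
import random


def degree_preserving_shuffle(edges, n_swaps, seed=0):
    """Randomize a bipartite edge set while keeping every node's degree.

    edges: iterable of (input_node, output_node) pairs for one pruned layer.
    Each accepted swap turns (a, b), (c, d) into (a, d), (c, b).
    """
    rng = random.Random(seed)
    edge_list = list(set(edges))
    present = set(edge_list)
    if len(edge_list) < 2:
        return edge_list
    for _ in range(n_swaps):
        i, j = rng.sample(range(len(edge_list)), 2)
        (a, b), (c, d) = edge_list[i], edge_list[j]
        # Skip swaps that would duplicate an existing link or change nothing.
        if (a, d) in present or (c, b) in present:
            continue
        present -= {(a, b), (c, d)}
        present |= {(a, d), (c, b)}
        edge_list[i], edge_list[j] = (a, d), (c, b)
    return edge_list


def degrees(edges):
    """Return (input-side degrees, output-side degrees) as two dicts."""
    left_deg, right_deg = {}, {}
    for a, b in edges:
        left_deg[a] = left_deg.get(a, 0) + 1
        right_deg[b] = right_deg.get(b, 0) + 1
    return left_deg, right_deg
```

Comparing `degrees(edges)` before and after the shuffle confirms that only the wiring changes, not how many links each node keeps.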