
prunAdag: an adaptive pruning-aware gradient method

Margherita Porcelli, Giovanni Seraghiti, Philippe L. Toint

TL;DR

The paper addresses efficient optimization for problems where pruning of parameters is desirable by introducing prunAdag, a pruning-aware adaptive gradient method within the Objective-Function-Free Optimization (OFFO) framework. It extends the relevant/irrelevant paradigm by partitioning variables into optimisable and decreasable sets, using an Adagrad-like update for the optimisable components and a trust-region-style decrease for the decreasable ones, with a global convergence rate of $\mathcal{O}\left(\frac{\log(k)}{\sqrt{k+1}}\right)$. Theoretical analysis under standard assumptions establishes global convergence, while extensive experiments on random under-determined least-squares (LS) problems, SPARCO sparse recovery, dictionary learning, and binary classification show that prunAdag is robust to pruning and often outperforms Adagrad and Frank-Wolfe (FW) variants in pruning-heavy regimes. The results suggest practical impact for training models where post-training sparsification is important, offering a principled way to maintain accuracy while achieving high sparsity; future work includes stochastic extensions and momentum enhancements.
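The two-set structure described above can be illustrated with a small sketch on a least-squares problem. This is not the paper's algorithm: the classification rule (the `n_optimisable` largest-magnitude entries are optimisable), the form of the decrease step for the remaining entries, and the names `pruning_aware_step`, `n_optimisable`, `eta` and `varsigma` are all assumptions made for illustration.

```python
import numpy as np


def pruning_aware_step(x, g, v, n_optimisable, eta=1.0, varsigma=1e-2):
    """One illustrative two-set Adagrad-like step (hypothetical rules).

    x, g, v       : current iterate, gradient, accumulated squared gradients
    n_optimisable : assumed number of coordinates classified as optimisable
    Returns the new iterate, the updated accumulator and the optimisable mask.
    """
    v = v + g**2                    # Adagrad accumulator of squared gradients
    w = np.sqrt(varsigma + v)       # per-coordinate Adagrad weights (w_i > 0)
    radius = eta * np.abs(g) / w    # per-coordinate step bound

    # Assumed classification: the largest-magnitude entries are optimisable.
    optimisable = np.zeros(x.size, dtype=bool)
    optimisable[np.argsort(-np.abs(x))[:n_optimisable]] = True
    decreasable = ~optimisable

    step = np.zeros_like(x)
    # Optimisable coordinates: plain Adagrad step -eta * g_i / w_i.
    step[optimisable] = -eta * g[optimisable] / w[optimisable]
    # Decreasable coordinates (assumed rule): shrink |x_i| towards zero by at
    # most the radius, never overshooting zero.
    step[decreasable] = -np.sign(x[decreasable]) * np.minimum(
        radius[decreasable], np.abs(x[decreasable]))
    return x + step, v, optimisable


# Usage on a random under-determined least-squares problem 0.5*||Ax - b||^2.
rng = np.random.default_rng(0)
A = rng.standard_normal((20, 50))   # 20 equations, 50 unknowns
x_true = np.zeros(50)
x_true[:5] = 1.0
b = A @ x_true
x, v = rng.standard_normal(50), np.zeros(50)
for _ in range(500):
    g = A.T @ (A @ x - b)           # gradient of 0.5*||Ax - b||^2
    x, v, mask = pruning_aware_step(x, g, v, n_optimisable=10)
print("fraction of |x_i| < 1e-3:", np.mean(np.abs(x) < 1e-3))
```

Both rules move each coordinate by at most the Adagrad radius $\eta|g_i|/w_i$, which is one possible reading of "trust-region-style"; the paper's actual classification (Figure 2 also shows a third set $\mathcal{A}_k$) and its decrease rule may differ.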

Abstract

A pruning-aware adaptive gradient method is proposed which classifies the variables into two sets before updating them using different strategies. This technique extends the "relevant/irrelevant" approach of Ding (2019) and Zimmer et al. (2022) and allows a posteriori sparsification of the solution of model parameter fitting problems. The new method is proved to be convergent with a global rate of decrease of the averaged gradient's norm of the form $\mathcal{O}(\log(k)/\sqrt{k+1})$. Numerical experiments on several applications show that it is competitive.
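To give a feel for the stated rate, the following sketch evaluates the bound $\log(k)/\sqrt{k+1}$ with every constant dropped and finds the first iteration at which it falls below a tolerance. Because the constants are ignored, the counts only illustrate how the rate scales (slightly slower than $1/\sqrt{k}$); they are not iteration counts for prunAdag. The helper names `rate_bound` and `iterations_for_tolerance` are hypothetical.

```python
import math


def rate_bound(k):
    """The rate log(k)/sqrt(k+1) from the abstract, with all constants dropped."""
    return math.log(k) / math.sqrt(k + 1)


def iterations_for_tolerance(eps, k_start=10):
    """Smallest k >= k_start with rate_bound(k) <= eps (requires eps > 0).

    rate_bound increases up to about k = 9 and is decreasing for k >= 10,
    so the search runs on that monotone branch: doubling, then bisection.
    """
    if eps <= 0:
        raise ValueError("eps must be positive")
    if rate_bound(k_start) <= eps:
        return k_start
    lo, hi = k_start, 2 * k_start      # invariant: rate_bound(lo) > eps
    while rate_bound(hi) > eps:        # doubling: find an upper bracket
        lo, hi = hi, 2 * hi
    while hi - lo > 1:                 # bisection keeps rate_bound(hi) <= eps
        mid = (lo + hi) // 2
        if rate_bound(mid) <= eps:
            hi = mid
        else:
            lo = mid
    return hi


for eps in (1e-1, 1e-2):
    print(eps, iterations_for_tolerance(eps))
```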

Paper Structure

This paper contains 17 sections, 3 theorems, 22 equations, 8 figures, 3 tables.

Key Result

Lemma 2.1

Suppose that AS.1 and AS.2 hold. Then we have that, for all $j\ge0$, and

Figures (8)

  • Figure 1: Effect of introducing the class of optimisable parameters within the minimization framework of prunAdag. (a) Gradient norm (solid line, left $y$-axis) and percentage of parameters below $\delta=10^{-3}$ (dotted line, right $y$-axis) along the iterations; (b) error measures $\rho$ (continuous) and $\omega$ (dashed) for different percentages of pruned components after the optimization. (Random least-squares A3)
  • Figure 2: Dynamics of the parameters' classification into the sets $\mathcal{O}_k$, $\mathcal{A}_k$, and $\mathcal{D}_k$ at each iteration. (Random least-squares A1)
  • Figure 3: Norm of the gradient (left) and percentage of parameters below $\delta=10^{-3}$ (right) along the iterations for prunAdag-V2 and prunAdag-V3 with different target numbers $T$ of relevant parameters. (Random least-squares A1)
  • Figure 4: On top, (a) gradient norm and (b) percentage of components below a fixed threshold $\delta=10^{-3}$ along iterations; at the bottom, (c) error measure $\omega$ and (d) error measure $\rho$ for different percentages of pruned components after the optimization. (Random least-squares A2)
  • Figure 5: On top, (a) gradient norm and (b) percentage of components below the threshold $\delta=10^{-3}$ along iterations; at the bottom, (c) error measure $\omega$ and (d) error measure $\rho$ for different percentages of pruned components $\sigma$ after the optimization. (Sparco 11)
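The figure captions measure sparsity as the percentage of parameters with magnitude below $\delta=10^{-3}$ and evaluate the solution after pruning a percentage $\sigma$ of its components. A minimal sketch of these two operations follows, assuming that pruning zeroes the $\sigma$% smallest-magnitude entries and that "below $\delta$" is a strict inequality. The error measures $\rho$ and $\omega$ are not defined in this excerpt and are not reproduced; the function names are hypothetical.

```python
import numpy as np


def fraction_below(x, delta=1e-3):
    """Fraction of components with |x_i| < delta (strict inequality assumed)."""
    return float(np.mean(np.abs(x) < delta))


def prune_smallest(x, sigma):
    """Zero out the sigma percent smallest-magnitude components of x."""
    n_prune = int(round(sigma / 100.0 * x.size))
    pruned = x.copy()
    if n_prune > 0:
        idx = np.argsort(np.abs(x))[:n_prune]   # indices of the smallest |x_i|
        pruned[idx] = 0.0
    return pruned


x = np.array([0.8, -2e-4, 0.05, 5e-4, -1.3, 0.0])
print(fraction_below(x))        # → 0.5
print(prune_smallest(x, 50))    # zeroes the three smallest-magnitude entries
```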

Theorems & Definitions (3)

  • Lemma 2.1
  • Lemma 2.2
  • Theorem 2.3