prunAdag: an adaptive pruning-aware gradient method

Margherita Porcelli; Giovanni Seraghiti; Philippe L. Toint

TL;DR

This paper introduces prunAdag, a pruning-aware adaptive gradient method within the Objective-Function-Free Optimization framework, for problems where pruning of parameters is desirable. It extends the relevant/irrelevant paradigm by partitioning the variables into optimisable and decreasable sets, applying an Adagrad-like update to the optimisable components and a trust-region-style decrease to the decreasable ones. Under standard assumptions, the method is proved to converge globally, with the averaged gradient norm decreasing at a rate of $\mathcal{O}\left(\frac{\log(k)}{\sqrt{k+1}}\right)$. Experiments on random under-determined least-squares problems, SPARCO sparse recovery, dictionary learning, and binary classification indicate that prunAdag is robust to pruning and often outperforms Adagrad and FW variants in pruning-heavy regimes. The results suggest the method is useful for training models where post-training sparsification matters, offering a principled way to maintain accuracy at high sparsity; future work includes stochastic extensions and momentum enhancements.

Abstract

A pruning-aware adaptive gradient method is proposed which classifies the variables into two sets before updating them using different strategies. This technique extends the "relevant/irrelevant" approach of Ding (2019) and Zimmer et al. (2022) and allows a posteriori sparsification of the solution of model parameter fitting problems. The new method is proved to be convergent, with a global rate of decrease of the averaged gradient's norm of the form $\mathcal{O}(\log(k)/\sqrt{k+1})$. Numerical experiments on several applications show that it is competitive.
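
To make the two-set idea concrete, here is a minimal sketch of one iteration in Python. It is an illustration only, not the authors' algorithm: the magnitude-based classification rule, the function name `prune_aware_step`, and the parameters `threshold`, `radius`, `lr`, and `eps` are all assumptions made for this example. It shows the general shape described above: an Adagrad-like step on the "optimisable" components and a bounded, trust-region-style shrink toward zero on the "decreasable" ones, using gradients only and no objective values.

```python
import numpy as np


def prune_aware_step(x, grad, accum, threshold=0.1, radius=0.05, lr=1.0, eps=1e-8):
    """One illustrative pruning-aware step (hypothetical rule, not the paper's)."""
    # ASSUMPTION: components with small magnitude are treated as "decreasable";
    # the paper's actual classification criterion is not given in this summary.
    decreasable = np.abs(x) < threshold
    optimisable = ~decreasable

    # Adagrad accumulator: per-component running sum of squared gradients.
    accum = accum + grad ** 2

    x_new = x.copy()

    # Adagrad-like update on the optimisable components.
    x_new[optimisable] -= lr * grad[optimisable] / np.sqrt(accum[optimisable] + eps)

    # ASSUMPTION: trust-region-style decrease, modelled here as moving each
    # decreasable component toward zero by at most `radius`, never crossing zero.
    shrink = np.minimum(np.abs(x[decreasable]), radius)
    x_new[decreasable] -= np.sign(x[decreasable]) * shrink

    return x_new, accum, decreasable


if __name__ == "__main__":
    # Toy under-determined least-squares problem: f(x) = 0.5 * ||A x - b||^2.
    rng = np.random.default_rng(0)
    A = rng.standard_normal((5, 20))
    b = rng.standard_normal(5)
    x = 0.5 * rng.standard_normal(20)
    accum = np.zeros_like(x)
    for _ in range(200):
        grad = A.T @ (A @ x - b)  # only the gradient is used, never f itself
        x, accum, mask = prune_aware_step(x, grad, accum)
    print("components classified as decreasable:", int(mask.sum()))
```

Components that end up at or near zero are natural candidates for pruning after the run. A faithful implementation would replace the thresholding rule and the shrink step with the criteria defined in the paper.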
