Balancing Act: Constraining Disparate Impact in Sparse Models

Meraj Hashemizadeh, Juan Ramirez, Rohan Sukumaran, Golnoosh Farnadi, Simon Lacoste-Julien, Jose Gallego-Posada

TL;DR

This work proposes a constrained optimization approach that directly addresses the disparate impact of pruning. For each sub-group, the formulation bounds the accuracy change between the dense and sparse models. This gives an interpretable criterion for determining whether a pruned model achieves acceptable disparity levels.

Abstract

Model pruning is a popular approach for deploying large deep learning models on edge devices with limited computational or storage capacity. Sparse models achieve performance comparable to that of their dense counterparts on the dataset as a whole, but they exhibit large accuracy drops on some data sub-groups. Existing methods that mitigate this disparate impact of pruning have one of two drawbacks. Some (i) rely on surrogate metrics that address the problem indirectly and have limited interpretability. Others (ii) have computational costs that scale poorly with the number of protected sub-groups. We propose a constrained optimization approach that directly addresses the disparate impact of pruning: for each sub-group, our formulation bounds the accuracy change between the dense and sparse models. This choice of constraints provides an interpretable success criterion for determining whether a pruned model achieves acceptable disparity levels. Experimental results demonstrate that our technique scales reliably to problems involving large models and hundreds of protected sub-groups.
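In constrained formulations like this one, each per-group bound is typically paired with a non-negative Lagrange multiplier. The multiplier increases while its constraint is violated. The sketch below shows only this generic dual-ascent step for constraints of the form $c_g \le \epsilon$, where $c_g$ stands for whatever per-group quantity is being bounded. The function names, the plain projected-gradient update, and the numbers are illustrative assumptions, not the authors' algorithm. The primal update of the model weights is omitted. Because accuracy is piecewise constant in the weights, that step would need a differentiable surrogate.

```python
from typing import Dict


def lagrangian(objective: float,
               violations: Dict[str, float],
               multipliers: Dict[str, float]) -> float:
    """L = f + sum_g lambda_g * (c_g - eps), where violations[g] = c_g - eps."""
    return objective + sum(multipliers[g] * violations[g] for g in violations)


def dual_ascent_step(multipliers: Dict[str, float],
                     violations: Dict[str, float],
                     lr: float) -> Dict[str, float]:
    """One projected gradient-ascent step on the Lagrange multipliers.

    dL/dlambda_g equals the violation c_g - eps, so lambda_g grows while
    group g exceeds its bound and shrinks otherwise; max(0, .) keeps it
    non-negative, as required for inequality constraints.
    """
    return {g: max(0.0, multipliers[g] + lr * violations[g]) for g in multipliers}


# Usage with hypothetical per-group quantities c_g and tolerance eps.
eps = 0.05
c = {"A": 0.08, "B": 0.01}
violations = {g: value - eps for g, value in c.items()}
lam = dual_ascent_step({"A": 0.0, "B": 0.2}, violations, lr=1.0)
# Group A violates its bound, so its multiplier rises (to about 0.03);
# group B satisfies it, so its multiplier falls (to about 0.16).
```

The projection matters once a constraint has been satisfied for a while: its multiplier is clipped at zero rather than turning negative. A negative multiplier would reward the optimizer for pushing that group's quantity even lower.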

Paper Structure (50 sections, 15 equations, 11 figures, 35 tables, 3 algorithms)

Figures (11)

  • Figure 1: Left: A dense model is sparsified with GMP and then subjected to either (i) naive fine-tuning (NFT, using ERM), (ii) equalized loss (EL) constraints (tran2022pruning), or (iii) our approach (CEAG). Right: Positive (resp. negative) excess accuracy gaps (EAGs; see the section on excess accuracy gaps) indicate groups whose performance degraded more (resp. less) than the model's overall accuracy change. Models with low disparate impact have EAGs that concentrate around zero. CEAG consistently yields models with lower disparity ($\Psi_{\text{PW}}$, defined in the same section) than NFT and EL. For example, NFT yields a 10% hyper-degradation (EAG, $\psi_g$) on the group Others. Results correspond to race prediction on UTKFace, with race as the group attribute, at 90% sparsity. Metrics are measured on the training set and averaged over 5 seeds.
  • Figure 2: Trade-off between disparity and accuracy for UTKFace race prediction with race as the group attribute. NFT and EL+RB yield models with high disparity. In contrast, CEAG consistently produces models that mitigate the disparate impact of pruning. CEAG's gains do not entail a degradation in overall test accuracy. Vertical dashed lines indicate the tolerance ($\epsilon$) of our method, with colors corresponding to different sparsity levels.
  • Figure 3: Evolution of the disparate impact of pruning ($\Psi_{\text{PW}}$) during training under the constrained pairwise formulation. Left: UTKFace dataset at 92.5% sparsity. Right: CIFAR-100 dataset at 95% sparsity. Horizontal dashed lines indicate the tolerances ($\epsilon$) of 5% and 10%, respectively.
  • Figure 4: Effects of replay buffers on the multiplier dynamics on CIFAR-100 under 90% sparsity. As expected, the multiplier exhibits notably smoother dynamics when using replay buffers.
  • Figure 5: UTKFace gender prediction with race as protected attribute.
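The Figure 1 caption describes excess accuracy gaps and the disparity measure $\Psi_{\text{PW}}$. Both can be computed from per-sample predictions of the dense and sparse models. The sketch below makes three assumptions, consistent with the caption's reading that positive values mark groups that degraded more than the overall model:

- $\psi_g = (A_g^{\text{dense}} - A_g^{\text{sparse}}) - (A^{\text{dense}} - A^{\text{sparse}})$, with overall accuracy taken over all samples.
- $\Psi_{\text{PW}} = \max_{g,g'} (\psi_g - \psi_{g'})$.
- The success check $\psi_g \le \epsilon$ for every group is the form of the per-group bound.

All function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Sequence, Tuple


def accuracies(groups: Sequence[str],
               correct: Sequence[bool]) -> Tuple[float, Dict[str, float]]:
    """Overall accuracy and per-group accuracy from per-sample correctness flags."""
    hits: Dict[str, int] = defaultdict(int)
    counts: Dict[str, int] = defaultdict(int)
    for g, ok in zip(groups, correct):
        counts[g] += 1
        hits[g] += int(ok)
    overall = sum(hits.values()) / len(correct)
    return overall, {g: hits[g] / counts[g] for g in counts}


def excess_accuracy_gaps(groups: Sequence[str],
                         dense_correct: Sequence[bool],
                         sparse_correct: Sequence[bool]) -> Dict[str, float]:
    """psi_g = (dense - sparse accuracy on group g) - (dense - sparse overall accuracy)."""
    acc_dense, group_dense = accuracies(groups, dense_correct)
    acc_sparse, group_sparse = accuracies(groups, sparse_correct)
    overall_drop = acc_dense - acc_sparse
    return {g: (group_dense[g] - group_sparse[g]) - overall_drop for g in group_dense}


def pairwise_disparity(eags: Dict[str, float]) -> float:
    """Psi_PW = max over pairs (g, g') of psi_g - psi_g', i.e. max(psi) - min(psi)."""
    return max(eags.values()) - min(eags.values())


def satisfies_bounds(eags: Dict[str, float], eps: float) -> bool:
    """Assumed success criterion: every group's EAG is at most eps."""
    return all(psi <= eps for psi in eags.values())


# Toy example: group "b" loses accuracy after pruning while group "a" does not.
groups = ["a", "a", "b", "b"]
dense = [True, True, True, True]
sparse = [True, True, True, False]
eags = excess_accuracy_gaps(groups, dense, sparse)  # {'a': -0.25, 'b': 0.25}
psi_pw = pairwise_disparity(eags)                   # 0.5
ok = satisfies_bounds(eags, eps=0.1)                # False: group "b" exceeds 0.1
```

In the toy example, the sparse model loses 25 points of overall accuracy. All of that loss falls on group "b", whose 50-point drop is 25 points worse than average. Group "a" comes out 25 points better than average, so its EAG is negative.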
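The Figure 4 caption reports smoother multiplier dynamics with replay buffers. One plausible reading of a replay buffer here is a fixed-size, per-group memory of recent per-sample outcomes. Under that reading, the group accuracies that drive the multiplier updates are estimated from more samples than a single mini-batch provides. The sketch below implements that reading with `collections.deque` and its `maxlen` argument. The class name, the buffer size, and the choice to store correctness flags are assumptions, not details taken from the paper.

```python
from collections import deque
from typing import Deque, Dict, Iterable, Tuple


class GroupReplayBuffer:
    """Hypothetical helper: keeps the last `size` correctness flags per group."""

    def __init__(self, size: int) -> None:
        self.size = size
        self.buffers: Dict[str, Deque[int]] = {}

    def add(self, batch: Iterable[Tuple[str, bool]]) -> None:
        """Append (group, correct) pairs; a full deque evicts its oldest entry."""
        for group, correct in batch:
            if group not in self.buffers:
                self.buffers[group] = deque(maxlen=self.size)
            self.buffers[group].append(int(correct))

    def accuracy(self, group: str) -> float:
        """Accuracy over the samples currently buffered for `group`."""
        buf = self.buffers[group]
        return sum(buf) / len(buf)


buffer = GroupReplayBuffer(size=4)
buffer.add([("a", True), ("a", False)])
buffer.add([("a", True), ("a", True), ("a", True)])
# The first flag has been evicted, leaving [0, 1, 1, 1]: accuracy 0.75.
```

The buffer size trades responsiveness for stability. A longer window averages away more mini-batch noise in the group accuracy estimates. The cost is that the estimates react more slowly as the model changes.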