Attention Pruning: Automated Fairness Repair of Language Models via Surrogate Simulated Annealing

Vishnu Asutosh Dasu; Md Rafi ur Rashid; Vipul Gupta; Saeid Tizpaz-Niari; Gang Tan

TL;DR

This work addresses fairness in large language models by pruning attention heads as a post-processing step. It introduces Attention Pruning (AP), which trains surrogate DNNs to predict how pruning a given subset of heads affects bias and perplexity, then applies surrogate-guided simulated annealing to find effective subsets. AP achieves substantial bias reduction (up to $40\%$ in gender bias) while maintaining model utility, and it outperforms state-of-the-art pruning methods across six diverse LLMs, demonstrating scalable fairness repair for billion-parameter models. The approach also shows beneficial spillovers to other social biases, which underlines the practical value of a scalable post-processing remedy for modern LLM deployments.

Abstract

This paper explores pruning attention heads as a post-processing bias mitigation method for large language models (LLMs). Modern AI systems such as LLMs are expanding into sensitive social contexts where fairness concerns become especially crucial. Since LLMs develop decision-making patterns by training on massive datasets of human-generated content, they naturally encode and perpetuate societal biases. While modifying training datasets and algorithms is expensive and requires significant resources, post-processing techniques, such as selectively deactivating neurons and attention heads in pre-trained LLMs, can provide feasible and effective approaches to improving fairness. However, identifying the optimal subset of parameters to prune presents a combinatorial challenge within LLMs' immense parameter space, requiring solutions that efficiently balance competing objectives across the frontiers of model fairness and utility. To address the computational challenges, we explore a search-based program repair approach via randomized simulated annealing. Given the prohibitive evaluation costs in billion-parameter LLMs, we develop surrogate deep neural networks that efficiently model the relationship between attention head states (active/inactive) and their corresponding fairness/utility metrics. This allows us to perform optimization over the surrogate models and efficiently identify optimal subsets of attention heads for selective pruning, rather than directly searching through the LLM parameter space. This paper introduces Attention Pruning, a fairness-aware surrogate simulated annealing approach that prunes the attention heads in LLMs that disproportionately contribute to bias while minimally impacting overall model utility. Our experiments show that Attention Pruning achieves up to a $40\%$ reduction in gender bias and outperforms state-of-the-art bias mitigation strategies.
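
The search loop described in the abstract can be illustrated with a minimal sketch, under several assumptions that do not come from the paper. The function `toy_surrogate_cost` is a hypothetical stand-in for the trained surrogate DNNs: it scores a binary head mask (1 = active, 0 = pruned) with made-up per-head weights. The single-flip neighborhood, the geometric cooling schedule, and the trade-off weight `lam` are illustrative choices as well, not the authors' settings.

```python
import math
import random


def toy_surrogate_cost(mask, bias_weights, ppl_weights, lam=1.0):
    """Hypothetical surrogate: predicted bias plus a weighted utility penalty.

    Active heads add their (made-up) bias contribution; pruned heads add a
    (made-up) perplexity penalty. A real surrogate would be a trained DNN.
    """
    bias = sum(b for b, m in zip(bias_weights, mask) if m)
    ppl_penalty = sum(u for u, m in zip(ppl_weights, mask) if not m)
    return bias + lam * ppl_penalty


def simulated_annealing(cost, n_heads, steps=2000, t0=1.0, cooling=0.995, seed=0):
    """Search binary head masks that minimize `cost` (lower is better)."""
    rng = random.Random(seed)
    state = [1] * n_heads                 # start with every head active
    current = cost(state)
    best_state, best_cost = state[:], current
    temp = t0
    for _ in range(steps):
        i = rng.randrange(n_heads)        # neighbor: flip one head's state
        state[i] ^= 1
        candidate = cost(state)
        delta = candidate - current
        # Always accept improvements; accept worse moves with Boltzmann probability.
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            current = candidate
            if current < best_cost:
                best_state, best_cost = state[:], current
        else:
            state[i] ^= 1                 # rejected: undo the flip
        temp *= cooling                   # geometric cooling (assumed schedule)
    return best_state, best_cost


# Illustrative, made-up per-head weights for an 8-head toy model.
BIAS_W = [0.9, 0.1, 0.4, 0.05, 0.7, 0.2, 0.3, 0.6]
PPL_W = [0.2, 0.8, 0.5, 0.9, 0.1, 0.6, 0.4, 0.3]


def cost_fn(mask):
    return toy_surrogate_cost(mask, BIAS_W, PPL_W, lam=1.0)


baseline_cost = cost_fn([1] * len(BIAS_W))
best_mask, best_cost = simulated_annealing(cost_fn, n_heads=len(BIAS_W))
print("pruned heads:", [i for i, m in enumerate(best_mask) if m == 0])
print("cost:", best_cost, "baseline:", baseline_cost)
```

Each candidate mask is scored by the cheap surrogate rather than by running the LLM, which is the property that makes the search affordable. In a realistic setting, the best masks found this way would still have to be validated on the actual model.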
