On Fairness of Task Arithmetic: The Role of Task Vectors

Hiroki Naganuma, Kotaro Yoshida, Laura Gomezjurado Gonzalez, Takafumi Horie, Yuji Naraki, Ryotaro Shimizu

TL;DR

This paper analyzes the fairness implications of task arithmetic using task vectors, comparing it to full fine-tuning (FFT) and Low-Rank Adaptation (LoRA) across hate-speech and toxicity NLP tasks and a vision age‑classification task. It demonstrates that a single global scalar $\lambda$ controlling merged subgroup task vectors can navigate the accuracy–fairness trade‑off, and it provides a theoretical bound linking $\lambda$‑driven deviations to Demographic Parity Difference ($DPD$) and Equalized Odds Difference ($EOD$). It also shows that merging subgroup vectors offers a practical mechanism to steer fairness outcomes, while subgroup‑targeted edits reveal nuanced, group‑dependent effects. Together these results position task arithmetic as both a cost‑efficient editing method and a fairness‑aware alternative to FFT/LoRA for standard group‑fair classification settings, with implications for responsible deployment of large language models.
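The role of the single scalar $\lambda$ can be illustrated with a minimal sketch. It assumes the usual task-arithmetic form, in which a task vector is the difference between fine-tuned and pretrained weights and the edited model is $\theta_0 + \lambda \sum_g \tau_g$. The function names, the toy parameter dictionaries, and the choice of NumPy arrays are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def task_vector(pretrained, finetuned):
    """tau = theta_finetuned - theta_pretrained, computed per parameter tensor."""
    return {name: finetuned[name] - pretrained[name] for name in pretrained}

def task_addition(pretrained, task_vectors, lam):
    """Return theta_0 + lam * sum_g tau_g using one global scalar lam (assumed form)."""
    merged = {name: value.copy() for name, value in pretrained.items()}
    for tau in task_vectors:
        for name in merged:
            merged[name] = merged[name] + lam * tau[name]
    return merged

# Toy weights (hypothetical): one pretrained model and two subgroup fine-tunes.
theta_pre = {"w": np.array([1.0, 2.0]), "b": np.array([0.0])}
theta_men = {"w": np.array([1.5, 2.0]), "b": np.array([0.2])}
theta_women = {"w": np.array([1.0, 3.0]), "b": np.array([-0.2])}

taus = [task_vector(theta_pre, theta_men), task_vector(theta_pre, theta_women)]
theta_edit = task_addition(theta_pre, taus, lam=0.5)
```

Sweeping `lam` over a grid and evaluating accuracy and fairness at each value is one way to trace the trade-off curve that the summary describes.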

Abstract

Model editing techniques, particularly task arithmetic with task vectors, offer an efficient alternative to full fine-tuning by enabling direct parameter updates through simple arithmetic operations. While this approach promises substantial computational savings, its impact on fairness has remained largely unexplored, despite growing concern over biased outcomes in high-stakes applications such as hate speech detection. In this work, we present the first systematic study of group fairness in task arithmetic for binary text and image classification, comparing it against full fine-tuning (FFT) and Low-Rank Adaptation (LoRA). We evaluate across multiple language models and datasets using standard group fairness metrics, including Demographic Parity and Equalized Odds. Our analysis shows that task vectors can be tuned to achieve competitive accuracy while reducing disparities, and that merging subgroup-specific task vectors provides a practical mechanism for steering fairness outcomes. We further provide a theoretical bound linking task vector scaling to fairness metrics, offering insight into the observed trade-offs. Together, these findings establish task arithmetic not only as a cost-efficient editing method but also as a fairness-aware alternative to existing adaptation techniques within the standard group-fair classification setting, laying the groundwork for responsible deployment of large language models.
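The two group fairness metrics named in the abstract can be computed as in the sketch below. It assumes the common definitions for binary classification: the Demographic Parity Difference is the largest gap in positive-prediction rate between groups, and the Equalized Odds Difference is the larger of the between-group gaps in true-positive and false-positive rates. The paper's exact definitions may differ in detail, and the function names and toy data are illustrative.

```python
import numpy as np

def demographic_parity_difference(y_pred, groups):
    """Max minus min of P(y_hat = 1 | group) across groups."""
    y_pred, groups = np.asarray(y_pred), np.asarray(groups)
    rates = [y_pred[groups == g].mean() for g in np.unique(groups)]
    return float(max(rates) - min(rates))

def equalized_odds_difference(y_true, y_pred, groups):
    """Larger of the between-group gaps in FPR (label 0) and TPR (label 1)."""
    y_true, y_pred, groups = map(np.asarray, (y_true, y_pred, groups))
    gaps = []
    for label in (0, 1):
        mask = y_true == label
        # Assumes every group has at least one example with this label.
        rates = [y_pred[mask & (groups == g)].mean() for g in np.unique(groups)]
        gaps.append(max(rates) - min(rates))
    return float(max(gaps))

# Toy predictions (hypothetical) for two groups of four examples each.
y_true = [1, 1, 0, 0, 1, 1, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0]
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]

dpd = demographic_parity_difference(y_pred, groups)
eod = equalized_odds_difference(y_true, y_pred, groups)
```

For both metrics, lower values indicate smaller disparities between groups, which is the sense in which the figures below report them.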

Paper Structure

This paper contains 56 sections, 4 theorems, 30 equations, 11 figures, 5 tables.

Key Result

Lemma 1

Let $\ell(\theta;x)$ be the training loss. For any non-negative coefficients $\{\lambda_g\}$, task addition gives the first‐order solution of a group-weighted empirical risk minimization (ERM) problem.
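One way to read the first-order claim is the following sketch. It is not the lemma's stated equation: it assumes each subgroup task vector $\tau_g$ is well approximated by a single gradient step of size $\eta$ from the pretrained weights $\theta_0$ on the group loss $\mathcal{L}_g$, and the symbols $\eta$, $\theta_0$, $\theta_g$, $\mathcal{D}_g$, and $\mathcal{L}_g$ are introduced here for illustration.

$$
\begin{aligned}
\mathcal{L}_g(\theta) &= \mathbb{E}_{x \sim \mathcal{D}_g}\,\ell(\theta; x), \qquad \tau_g = \theta_g - \theta_0 \approx -\eta\,\nabla_\theta \mathcal{L}_g(\theta_0), \\
\theta_0 + \sum_g \lambda_g \tau_g &\approx \theta_0 - \eta\,\nabla_\theta \Big( \sum_g \lambda_g \mathcal{L}_g(\theta) \Big)\Big|_{\theta = \theta_0}.
\end{aligned}
$$

Under that assumption, adding the weighted task vectors coincides to first order with one gradient step on the group-weighted objective $\sum_g \lambda_g \mathcal{L}_g$.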

Figures (11)

  • Figure 1: LoRA and FFT vs. Task addition with the optimal coefficient for the training accuracy ($\lambda = 0.8$ for gender setting and $\lambda = 0.5$ for race setting) on group-wise accuracy, demographic parity difference (DPD, lower is fairer), and equalized odds difference (EOD, lower is fairer). Error bars denote the standard error across three seeds. Columns: group-wise accuracy, DPD, EOD. No consistent pattern emerges indicating that task addition systematically degrades subgroup fairness relative to LoRA or FFT. While some subgroups show improvements or comparable results under task addition, others exhibit small declines.
  • Figure 2: Varying the task arithmetic coefficient $\lambda$ and comparing against FFT (blue dashed line) and LoRA (orange dashed line) for macro-averaged accuracy (left), demographic parity difference (DPD, center), and equalized odds difference (EOD, right) on the gender subset. Higher accuracy is better; lower DPD/EOD indicate improved group fairness. For $\lambda \gtrsim 0.3$, task addition maintains competitive accuracy while typically lowering DPD/EOD relative to both baselines.
  • Figure 3: Heatmaps of Accuracy (left), DPD (center), and EOD (right) for Men (top) and Women (bottom) subgroups under the baseline FFT model ($\lambda = 0.0$) and with increasing $\lambda$ values from 0.2 to 1.0 in 0.2 increments. The task vector for Men was added on the gender subset (top), and the task vector for Women was added on the gender subset (bottom). Darker cells indicate higher values on each metric’s scale; for DPD/EOD, lower values are better.
  • Figure 4: Impact of injecting the Men (a) and Women (b) subgroup task vectors into the FFT model on the gender data subset. The plot illustrates how scaling coefficient $\lambda$ reduces DPD and EOD, outperforming the baseline FFT (blue dashed) and LoRA (orange dashed), with negligible impact on macro-averaged accuracy.
  • Figure 5: Boxplots of group-wise accuracy, demographic parity difference (DPD), and equalized odds difference (EOD) for FFT, LoRA, and task addition with coefficient $\lambda = 0.8$, evaluated on the gender subset of the data. Higher accuracy is desirable, whereas lower DPD and EOD values indicate improved fairness. Boxplots show medians, interquartile ranges, and variability (with standard error across three seeds). While accuracy is similar across methods, task addition generally yields lower DPD and EOD medians than FFT and LoRA, suggesting a better balance between performance and fairness, though overlapping distributions imply these differences are not uniformly significant.
  • ...and 6 more figures

Theorems & Definitions (8)

  • Lemma 1: First‐order link
  • proof
  • Proposition 1: DPD bound
  • proof
  • Lemma 2
  • proof
  • Proposition 2: EOD bound
  • proof