On The Fairness Impacts of Hardware Selection in Machine Learning

Sree Harsha Nelaturu, Nishaanth Kanna Ravichandran, Cuong Tran, Sara Hooker, Ferdinando Fioretto

TL;DR

The paper addresses how hardware tooling affects fairness in ML, introducing hardware sensitivity $\Delta(a,m)$ and fairness violation $\xi(D,m)$ to quantify disparities across demographic groups. It develops a theoretical framework where hardware-induced unfairness arises from differences in group gradient flows and group Hessian-based loss landscapes, and validates these insights with extensive experiments across GPUs, datasets, and architectures. A practical mitigation is proposed: augmenting training loss with a term that aligns distance-to-decision-boundary across groups, which substantially reduces fairness violations while preserving overall performance. The work highlights that deployment hardware can alter model equity and provides actionable guidelines to evaluate and mitigate these effects, urging careful cross-hardware reporting and robust training strategies for fair ML in heterogeneous hardware environments.
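The two quantities named above can be made concrete with a small sketch. This excerpt does not give their formulas, so the definitions used here are assumptions: $\Delta(a,m)$ is taken as the largest absolute change in group $a$'s loss when moving from reference hardware $m$ to any other hardware, and $\xi(D,m)$ as the largest gap in $\Delta$ between any two groups. The loss values and the hardware names `hw_b` and `hw_c` are made up for illustration.

```python
from itertools import combinations


def hardware_sensitivity(losses, group, reference):
    """Assumed Delta(a, m): largest absolute change in the group's loss
    when switching from the reference hardware to any other hardware."""
    ref_loss = losses[reference][group]
    return max(
        abs(ref_loss - per_group[group])
        for hardware, per_group in losses.items()
        if hardware != reference
    )


def fairness_violation(losses, reference):
    """Assumed xi(D, m): largest gap in hardware sensitivity between two groups.
    Requires at least two groups."""
    groups = list(losses[reference])
    sens = {a: hardware_sensitivity(losses, a, reference) for a in groups}
    return max(abs(sens[a] - sens[b]) for a, b in combinations(groups, 2))


# Hypothetical per-group losses of the same model trained on three devices.
losses = {
    "T4": {"majority": 0.30, "minority": 0.50},
    "hw_b": {"majority": 0.31, "minority": 0.58},
    "hw_c": {"majority": 0.29, "minority": 0.45},
}

for group in losses["T4"]:
    print(group, round(hardware_sensitivity(losses, group, "T4"), 3))
print("violation", round(fairness_violation(losses, "T4"), 3))
```

With these made-up numbers the minority group's loss moves far more than the majority group's when the hardware changes, which is the kind of disparity the fairness-violation score is meant to surface.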

Abstract

In the machine learning ecosystem, hardware selection is often regarded as a mere utility, overshadowed by the spotlight on algorithms and data. This oversight is particularly problematic in contexts like ML-as-a-service platforms, where users often lack control over the hardware used for model deployment. How does the choice of hardware impact generalization properties? This paper investigates the influence of hardware on the delicate balance between model performance and fairness. We demonstrate that hardware choices can exacerbate existing disparities, attributing these discrepancies to variations in gradient flows and loss surfaces across different demographic groups. Through both theoretical and empirical analysis, the paper not only identifies the underlying factors but also proposes an effective strategy for mitigating hardware-induced performance imbalances.
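The mitigation summarized in the TL;DR, a loss term that aligns distance to the decision boundary across groups, can be illustrated with a minimal sketch. The exact term used in the paper is not stated in this excerpt, so everything below is an assumption: the distance proxy is the gap between the two largest softmax probabilities, the penalty is the variance of per-group mean margins, and the names `margin`, `boundary_alignment_penalty`, `total_loss`, and `weight` are hypothetical.

```python
import math
from collections import defaultdict


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def margin(logits):
    """Assumed proxy for distance to the decision boundary:
    gap between the two largest class probabilities (needs >= 2 classes)."""
    top, second = sorted(softmax(logits), reverse=True)[:2]
    return top - second


def boundary_alignment_penalty(batch_logits, batch_groups):
    """Variance of per-group mean margins; zero when all groups sit
    equally far from the decision boundary on average."""
    per_group = defaultdict(list)
    for logits, group in zip(batch_logits, batch_groups):
        per_group[group].append(margin(logits))
    means = [sum(v) / len(v) for v in per_group.values()]
    overall = sum(means) / len(means)
    return sum((m - overall) ** 2 for m in means) / len(means)


def total_loss(task_loss, batch_logits, batch_groups, weight=1.0):
    """Task loss plus the weighted alignment penalty."""
    return task_loss + weight * boundary_alignment_penalty(batch_logits, batch_groups)


# Hypothetical batch: group "a" is confidently classified, group "b" is not.
logits = [[5.0, 0.0], [4.0, 0.0], [0.2, 0.0], [0.1, 0.0]]
groups = ["a", "a", "b", "b"]
print(total_loss(0.7, logits, groups, weight=0.5))
```

In actual training the same penalty would have to be written in an automatic-differentiation framework so that its gradient reaches the model parameters; the plain-Python version only shows how the quantity is assembled.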

Paper Structure (22 sections, 9 theorems, 24 equations, 16 figures, 1 table)

Key Result

Theorem 4.1

Given reference hardware $m$, the hardware sensitivity $\Delta(a,m)$ of group $a \in \mathcal{A}$ is upper bounded by a quantity that depends on $\rho(m)$, $\bm{g}_{a,m}^\ell$, and $\bm{H}_{a,m}^{\ell}$, where $\rho(m) = \max_{m' \in \mathcal{M}} \| {\bm{\theta}}^*_m - {\bm{\theta}}^*_{m'}\|_2$, $\bm{g}_{a,m}^\ell = \nabla_{{\bm{\theta}}} J({\bm{\theta}}^*_m; D_{a})$, and $\bm{H}_{a,m}^{\ell} = \nabla^2_{{\bm{\theta}}} J({\bm{\theta}}^*_m; \dots)$ ...
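
The displayed inequality of Theorem 4.1 does not appear in this excerpt. As a hedged sketch only, a bound built from these three ingredients usually comes from a second-order Taylor expansion of the group loss around ${\bm{\theta}}^*_m$. The form below is an assumption, consistent with the three components shown in Figure 2 (parameter distance $\rho(m)$, group gradient norm, maximum Hessian eigenvalue $\lambda$); it further assumes that $\Delta(a,m)$ is the largest absolute change in group loss across hardware and that the group Hessian is positive semidefinite near ${\bm{\theta}}^*_m$.

```latex
% Taylor expansion of the group loss, with \delta = \bm{\theta}^*_{m'} - \bm{\theta}^*_m
J(\bm{\theta}^*_{m'}; D_a) - J(\bm{\theta}^*_m; D_a)
  = (\bm{g}_{a,m}^{\ell})^{\top} \delta
  + \tfrac{1}{2}\, \delta^{\top} \bm{H}_{a,m}^{\ell}\, \delta
  + \mathcal{O}\!\left(\|\delta\|_2^{3}\right)

% Cauchy-Schwarz, the eigenvalue bound, and \|\delta\|_2 \le \rho(m) then give
\Delta(a,m) \;\le\;
  \left\| \bm{g}_{a,m}^{\ell} \right\|_2 \rho(m)
  + \tfrac{1}{2}\, \lambda\!\left(\bm{H}_{a,m}^{\ell}\right) \rho(m)^{2}
  + \mathcal{O}\!\left(\rho(m)^{3}\right)
```

Read this way, a group with a larger gradient norm or a sharper loss surface at the reference solution is more exposed to the parameter drift that a hardware change induces.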

Figures (16)

  • Figure 1: A model (ResNet34) with the same parameters (random seeds, epochs, batch-size) on different hardware can have vastly different performance results, especially for minority groups (dark colors). The reference hardware is T4. Left: UTK-Face, Right: CIFAR-10.
  • Figure 2: Illustration of the three main components in Theorem \ref{thm:taylor}. Left: Difference in model parameters $\rho(m) = \max_{m'} \| {\bm{\theta}}^{*}_{m} - {\bm{\theta}}^{*}_{m'}\|_2$ when $m = \text{T4}$. Middle: Gradient norms $\|\bm{g}^\ell_a\|$ on T4 for five races. Right: Maximum Hessian eigenvalues $\lambda(\bm{H}^{\ell}_{a,m})$ on T4 for five races.
  • Figure 3: Illustration of the impact of group size on gradient norm imbalance, as shown in Theorem \ref{thm:3}. Left: Group size used in training for five races. Middle: Gradient norms $\|\bm{g}_a\|$ averaged across three devices for five races and 10 seeds each. Right: Hardware sensitivity; note the higher sensitivity as the group size decreases.
  • Figure 4: Illustration of the impact of group Hessians. Group 'a' has a smaller Hessian than Group 'b', resulting in lower sensitivity of the loss function for Group 'a'.
  • Figure 5: The relationship between Hessian norm and distance to the decision boundary.
  • ...and 11 more figures

Theorems & Definitions (13)

  • Theorem 4.1
  • Theorem 4.2
  • Theorem 4.3
  • Theorem 4.4
  • Proposition 4.5
  • Theorem 1.1
  • proof
  • Theorem 1.2
  • proof
  • Theorem 1.3
  • ...and 3 more