FAIRM: Learning invariant representations for algorithmic fairness and domain generalization with minimax optimality

Sai Li; Linjun Zhang

TL;DR

FAIRM addresses distribution shift by learning representations that are invariant across training environments, with the goal of fair and robust out-of-distribution (OOD) prediction. The paper introduces a full-information invariant oracle and a training-environment-based counterpart, and shows that under a diversity-type condition the latter recovers the full-information benchmark. It provides finite-sample guarantees for empirical FAIRM under weak distributional assumptions, covering both domain generalization and multi-calibration, and develops a computationally efficient algorithm for linear models that attains minimax-optimal domain generalization. In experiments on synthetic data and Color MNIST, FAIRM outperforms the competing methods. The authors also note potential extensions to nonlinear representations and broader invariant-learning paradigms.

Abstract

Machine learning methods often assume that the test data have the same distribution as the training data. However, this assumption may not hold due to multiple levels of heterogeneity in applications, raising issues in algorithmic fairness and domain generalization. In this work, we address the problem of fair and generalizable machine learning by invariant principles. We propose a training environment-based oracle, FAIRM, which has desirable fairness and domain generalization properties under a diversity-type condition. We then provide an empirical FAIRM with finite-sample theoretical guarantees under weak distributional assumptions. We then develop efficient algorithms to realize FAIRM in linear models and demonstrate the nonasymptotic performance with minimax optimality. We evaluate our method in numerical experiments with synthetic data and MNIST data and show that it outperforms its counterparts.

Paper Structure (21 sections, 10 theorems, 35 equations, 4 figures, 1 table, 1 algorithm)

Introduction
Problem description
Related works
Our contributions
Organization and Notation
Full-information invariant oracle and its properties
Full-information FAIRM
OOD performance of full-info FAIRM
Training environment-based invariant framework
A training environment-based invariant oracle
Comparison to the prior arts
Empirical FAIRM
FAIRM in linear models
FAIRM in linear models
Theoretical properties
...and 6 more sections

Key Result

Proposition 1

If it further holds that $\mathbbm{E}[y^e|\Phi(\bm{x}^e)=\bm\phi]=w(\bm\phi)\in\mathcal{F}_{w}$ for all $\Phi\in\mathcal{I}^*_{\Phi}$, then

Figures (4)

Figure 1: Performance of four methods with $\bm \Sigma_z=I_p$ and $\delta=0.6$. "ERM" denotes the single-task Lasso based on the training data. "FAIRM" denotes Algorithm 1. "MM" denotes the Maximin method based on the training data. The "Oracle" method is the single-task Lasso based on $\{X^e_{.,S_v^c},\bm{y}^e\}_{e\in\mathcal{E}_{tr}}$. Each boxplot is based on 200 independent replications.
Figure 2: Performance of four methods with equi-correlated $\bm \Sigma_z$ and $\delta=0.4$. "ERM" denotes the single-task Lasso based on the training data. "FAIRM" denotes Algorithm 1. "MM" denotes the Maximin method based on the training data. The "Oracle" method is the single-task Lasso based on $\{X^e_{.,S_v^c},\bm{y}^e\}_{e\in\mathcal{E}_{tr}}$. Each boxplot is based on 200 independent replications.
Figure 3: Left and middle top: samples with different labels can have different frame colors, showing that the labels and frame colors are spuriously correlated. Right top: The oracle invariant set excluding frame features (black) and the estimated nonzero correlations within the oracle set (white). Bottom: The estimated nonzero correlations given by FAIRM (left), ERM (middle), and Maximin (right).
Figure 4: Left and middle top: samples with different labels can have different frame colors, showing that the labels and frame colors are spuriously correlated. Right top: The oracle invariant set excluding frame features (black) and the estimated nonzero correlations within the oracle set (white). Bottom: The estimated nonzero correlations given by FAIRM (left), ERM (middle), and Maximin (right).
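
The captions for Figures 1 and 2 describe two reference points: "ERM", a single-task Lasso fit on the pooled training data, and "Oracle", the same Lasso restricted to the invariant columns $S_v^c$. The sketch below illustrates only that contrast; it is not FAIRM (Algorithm 1) and does not reproduce the paper's simulation design. The data-generating mechanism, the helper names `lasso_ista`, `make_env`, and `run_demo`, the parameter `gamma`, and all constants are assumptions made for illustration, and the Lasso is solved with a plain proximal-gradient loop rather than any particular library.

```python
import numpy as np


def lasso_ista(X, y, lam, n_iter=500):
    """Lasso via proximal gradient: min_b (1/2n)||y - Xb||^2 + lam * ||b||_1."""
    n, p = X.shape
    step = n / np.linalg.norm(X, 2) ** 2  # 1 / Lipschitz constant of the gradient
    b = np.zeros(p)
    for _ in range(n_iter):
        z = b - step * (X.T @ (X @ b - y) / n)
        b = np.sign(z) * np.maximum(np.abs(z) - step * lam, 0.0)  # soft-threshold
    return b


def make_env(rng, n, beta_inv, gamma):
    """Hypothetical environment (an assumption, not the paper's design):
    y depends only on the invariant columns; two extra columns are noisy
    copies of y scaled by an environment-specific gamma."""
    X_inv = rng.normal(size=(n, beta_inv.size))
    y = X_inv @ beta_inv + rng.normal(size=n)
    X_spur = gamma * y[:, None] + rng.normal(size=(n, 2))
    return np.hstack([X_inv, X_spur]), y


def run_demo(seed=0, lam=0.05):
    rng = np.random.default_rng(seed)
    beta_inv = np.array([1.0, -1.0, 0.5])
    inv_cols = np.arange(beta_inv.size)  # plays the role of S_v^c, known only to the oracle

    # Two training environments, pooled as in single-task fitting.
    train = [make_env(rng, 500, beta_inv, g) for g in (1.0, 0.8)]
    X_tr = np.vstack([X for X, _ in train])
    y_tr = np.concatenate([y for _, y in train])

    # Test environment in which the spurious relationship flips sign.
    X_te, y_te = make_env(rng, 2000, beta_inv, -1.0)

    b_erm = lasso_ista(X_tr, y_tr, lam)                  # "ERM": all columns
    b_oracle = lasso_ista(X_tr[:, inv_cols], y_tr, lam)  # "Oracle": invariant columns only

    mse_erm = np.mean((y_te - X_te @ b_erm) ** 2)
    mse_oracle = np.mean((y_te - X_te[:, inv_cols] @ b_oracle) ** 2)
    return mse_erm, mse_oracle


if __name__ == "__main__":
    mse_erm, mse_oracle = run_demo()
    print(f"test MSE  ERM: {mse_erm:.2f}  Oracle: {mse_oracle:.2f}")
```

In this toy setup the pooled fit leans on the spurious columns, so its error grows once their relationship with the label changes, while the fit restricted to the invariant columns is unaffected. An invariance-based method has to approximate that restriction without being told which columns are invariant.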

Theorems & Definitions (15)

Definition 1: Multi-calibration (adapted from Hébert-Johnson et al., 2018)
Remark 1: Feasibility of full-info FAIRM
Example 1
Proposition 1: OOD performance of full-info FAIRM
Proposition 2: FAIRM recovers the full-information benchmark
Remark 2
Proposition 3: Risks for domain generalization
Proposition 4: Multi-calibration bias
Theorem 1: Domain generalization of empirical FAIRM
Theorem 2: Multi-calibration of empirical FAIRM
...and 5 more
