Set to Be Fair: Demographic Parity Constraints for Set-Valued Classification

Eyal Cohen, Christophe Denis, Mohamed Hebiri

TL;DR

This work addresses set-valued multiclass prediction under Demographic Parity (DP) and an expected-size constraint by deriving a closed-form DP-fair optimal classifier and two practical algorithms. The first is an oracle-based plug-in approach that minimizes risk subject to DP and size via a Lagrangian thresholding rule, with distribution-free convergence guarantees. The second is a computationally efficient two-step correction that enforces DP on a size-constrained predictor, also with DP and size guarantees and favorable scalability. Empirical results on synthetic and real data show that the DP-fair predictors achieve low unfairness with modest risk increases, while the two-step method substantially reduces computation time, particularly as the number of classes grows. Overall, the framework offers principled, interpretable, and scalable fair set-valued predictions that leverage unlabeled data for constraint satisfaction.
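The expected-size constraint mentioned above can be illustrated with a minimal sketch. It assumes the set-valued predictor keeps every class whose estimated probability exceeds a single global threshold, and that the threshold is calibrated on unlabeled data so that the average set size is close to a target `beta`. The helper names `size_threshold` and `predict_sets` are hypothetical, and the sketch covers only the size constraint, not the DP correction or the authors' Lagrangian rule.

```python
import numpy as np

def size_threshold(probs_unlabeled: np.ndarray, beta: float) -> float:
    """Return a global threshold t such that the sets {k : p_k(x) >= t}
    have average size close to beta on the unlabeled sample.

    probs_unlabeled: array of shape (n, K) of estimated class probabilities.
    """
    n, _ = probs_unlabeled.shape
    # The average set size equals (number of scores >= t) / n, so keeping
    # the m = beta * n largest scores overall gives an average size of beta.
    scores = np.sort(probs_unlabeled.ravel())[::-1]
    m = int(round(beta * n))
    m = min(max(m, 1), scores.size)  # keep m within a valid index range
    return float(scores[m - 1])

def predict_sets(probs: np.ndarray, t: float) -> np.ndarray:
    """Boolean mask of shape (n, K): entry (i, k) is True if class k is in the set for point i."""
    return probs >= t
```

Because only the estimated probabilities are needed to calibrate the threshold, labels are not required at this stage, which is consistent with the summary's remark that unlabeled data can be used for constraint satisfaction. With tied scores the realized average size may slightly exceed `beta`.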

Abstract

Set-valued classification is used in multiclass settings where confusion between classes can occur and lead to misleading predictions. However, its application may amplify discriminatory bias, which motivates the development of set-valued approaches under fairness constraints. In this paper, we address the problem of set-valued classification under demographic parity and expected size constraints. We propose two complementary strategies: an oracle-based method that minimizes classification risk while satisfying both constraints, and a computationally efficient proxy that prioritizes constraint satisfaction. For both strategies, we derive closed-form expressions for the (optimal) fair set-valued classifiers and use these to build plug-in, data-driven procedures for empirical predictions. We establish distribution-free convergence rates for violations of the size and fairness constraints for both methods, and under mild assumptions we also provide excess-risk bounds for the oracle-based approach. Empirical results demonstrate the effectiveness of both strategies and highlight the efficiency of our proxy method.
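To make the fairness constraint concrete, the following sketch measures how far a set-valued predictor is from demographic parity. It assumes DP for sets means that, for every class $k$, the probability that $k$ belongs to the predicted set is the same in every sensitive group as in the whole population; the paper's Definition 2.1 may be stated differently. The function name `dp_unfairness` and the max-gap summary are illustrative choices, not taken from the paper.

```python
import numpy as np

def dp_unfairness(sets: np.ndarray, groups: np.ndarray) -> float:
    """Largest gap, over classes k and groups s, between the empirical
    frequencies of "k is in the set" within group s and in the full sample.

    sets:   boolean array of shape (n, K), True where class k is in the set.
    groups: array of shape (n,) with the sensitive attribute of each point.
    """
    sets = np.asarray(sets, dtype=float)
    groups = np.asarray(groups)
    overall = sets.mean(axis=0)  # inclusion frequency of each class overall
    gaps = [
        np.abs(sets[groups == s].mean(axis=0) - overall).max()
        for s in np.unique(groups)
    ]
    return float(max(gaps))
```

A value of 0 means that every class is included at the same empirical rate in every group; larger values indicate a larger parity violation under the assumed definition.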

Paper Structure

This paper contains 24 sections, 14 theorems, 101 equations, 8 figures, 1 algorithm.

Key Result

Theorem 2.2

Suppose Assumption assum:continuous holds. Then the $\beta$-specific oracle $\Gamma^*_\beta$ has a closed-form expression in terms of $\gamma^*_{k,s} = \alpha^*_{k,s} - \pi_s \sum_{s' \in \mathcal{S}} \alpha^*_{k,s'}$, where $\lambda^*$ and $\alpha^* = \left(\alpha^*_{k,s}\right)_{k \in [K], s\in \mathcal{S}}$ are the Lagrange multipliers, whose characterization involves the positive part $(\cdot)_+$.

Figures (8)

  • Figure 1: Results on synthetic data with estimated class-conditional probabilities (20 estimators).
  • Figure 2: Same comparison with estimated probabilities (20 estimators).
  • Figure 3: Runtime comparison for increasing $K$.
  • Figure 4: Stability comparison between the optimizer and the two-step method.
  • Figure 5: Results on the DRUG dataset (20 estimators, gradient boosting).
  • ...and 3 more figures

Theorems & Definitions (25)

  • Definition 2.1: DP-constraint
  • Theorem 2.2
  • Proposition 2.3
  • Corollary 2.4
  • Proposition 2.5
  • Theorem 3.1: Fairness and Expected size controls
  • Theorem 3.2: excess-risk control
  • Remark 4.1
  • Theorem 4.2
  • Lemma B.1
  • ...and 15 more