Enhancing Group Fairness in Online Settings Using Oblique Decision Forests

Somnath Basu Roy Chowdhury, Nicholas Monath, Ahmad Beirami, Rahul Kidambi, Avinava Dubey, Amr Ahmed, Snigdha Chaturvedi

TL;DR

The paper targets online group fairness, addressing the difficulty of enforcing demographic parity when data arrives one sample at a time. It introduces Aranyani, an ensemble of soft-routed oblique decision trees that isolate node-level parameters and use aggregate statistics to estimate fairness gradients online without storing past data. The authors provide a gradient-based training framework with a Huber-regularized node-level constraint, plus theoretical guarantees on DP bounds, gradient estimation, and convergence. Experiments across tabular, vision, and language benchmarks show that Aranyani achieves better accuracy-fairness trade-offs than baselines, along with efficiency gains, highlighting its practical potential for online, fairness-aware decision systems.
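The TL;DR mentions a Huber-regularized node-level constraint without stating its form. The following is a minimal sketch of the standard Huber function and its derivative, assuming it is applied to a scalar gap `r` (hypothetically, the difference between a node's mean output on the two protected groups). The names `huber`, `huber_grad`, and `delta` are illustrative and are not taken from the paper.

```python
def huber(r, delta=1.0):
    """Standard Huber function: quadratic near zero, linear in the tails."""
    a = abs(r)
    if a <= delta:
        return 0.5 * r * r
    return delta * (a - 0.5 * delta)


def huber_grad(r, delta=1.0):
    """Derivative of the Huber function with respect to r (clipped to [-delta, delta])."""
    if abs(r) <= delta:
        return r
    return delta if r > 0 else -delta


# Hypothetical usage: penalize the gap between group-wise mean node outputs.
gap = 0.3 - 0.1
penalty = huber(gap, delta=0.1)
```

Because the derivative is bounded by `delta`, a penalty of this shape keeps the fairness term from producing arbitrarily large gradient steps when the gap is large.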

Abstract

Fairness, especially group fairness, is an important consideration in the context of machine learning systems. The most commonly adopted group fairness-enhancing techniques are in-processing methods that rely on a mixture of a fairness objective (e.g., demographic parity) and a task-specific objective (e.g., cross-entropy) during the training process. However, when data arrives in an online fashion -- one instance at a time -- optimizing such fairness objectives poses several challenges. In particular, group fairness objectives are defined using expectations of predictions across different demographic groups. In the online setting, where the algorithm has access to a single instance at a time, estimating the group fairness objective requires additional storage and significantly more computation (e.g., forward/backward passes) than the task-specific objective at every time step. In this paper, we propose Aranyani, an ensemble of oblique decision trees, to make fair decisions in online settings. The hierarchical tree structure of Aranyani enables parameter isolation and allows us to efficiently compute the fairness gradients using aggregate statistics of previous decisions, eliminating the need for additional storage and forward/backward passes. We also present an efficient framework to train Aranyani and theoretically analyze several of its properties. We conduct empirical evaluations on 5 publicly available benchmarks (including vision and language datasets) to show that Aranyani achieves a better accuracy-fairness trade-off compared to baseline approaches.
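To illustrate how aggregate statistics can stand in for stored data, here is a minimal sketch that tracks demographic parity online using only a running count and mean per group. It assumes a binary protected attribute and scalar predictions. The class name `RunningDemographicParity` is hypothetical, and the exact statistics Aranyani maintains (which the abstract ties to the tree structure) are not specified in this summary.

```python
class RunningDemographicParity:
    """Tracks |E[prediction | a=0] - E[prediction | a=1]| without storing past instances."""

    def __init__(self):
        self.count = {0: 0, 1: 0}
        self.mean = {0: 0.0, 1: 0.0}

    def update(self, prediction, group):
        # Incremental mean update: constant memory and constant time per instance.
        self.count[group] += 1
        self.mean[group] += (prediction - self.mean[group]) / self.count[group]

    def value(self):
        # Absolute gap between the two group-wise running means.
        return abs(self.mean[0] - self.mean[1])


tracker = RunningDemographicParity()
for prediction, group in [(1.0, 0), (0.0, 0), (1.0, 1), (1.0, 1)]:
    tracker.update(prediction, group)
```

Each update touches two numbers per group, so memory does not grow with the length of the stream.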


Paper Structure

This paper contains 30 sections, 9 theorems, 46 equations, 12 figures, 2 tables, 1 algorithm.

Key Result

Lemma 1

Let $f(\mathbf{x})$ be a soft-routed oblique decision tree of height $h$ with $\|\theta_l\|=1$, and assume an equal number of input instances $\mathbf{x}$ for each group of a binary protected attribute $a \in \{0, 1\}$. If all the node-level decisions satisfy the following condition: [...] then the overall demographic parity of $f(\mathbf{x})$ is bounded as $\mathrm{DP} \leq h2^{h}\epsilon$, for [...]
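
As a worked example of the stated bound, taken at face value and with illustrative values of $h$, $\epsilon$, and a target level $\tau$ that do not come from the paper:

$$h = 3,\ \epsilon = 0.01: \quad \mathrm{DP} \leq h2^{h}\epsilon = 3 \cdot 2^{3} \cdot 0.01 = 0.24.$$

Read in the other direction, guaranteeing $\mathrm{DP} \leq \tau$ through this bound requires $\epsilon \leq \tau / (h2^{h})$. For $\tau = 0.1$ and $h = 3$ this gives $\epsilon \leq 0.1 / 24 \approx 0.0042$, so the node-level tolerance must shrink exponentially as the tree gets taller.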

Figures (12)

  • Figure 1: Schematic diagram of the functioning of Oblique Decision Forests. (Left): We illustrate the computation of a soft-routed oblique tree output $f^t(\mathbf{x})$ using individual tree node outputs. We observe that the final tree decision is composed of individual node outputs. (Right): We showcase how decisions from multiple oblique trees are combined to form $f(\mathbf{x})$.
  • Figure 2: We report the group fairness (demographic parity) vs. task performance (accuracy) trade-off plots for different systems in (left) Adult, (center) Census, and (right) COMPAS datasets. Ideally, a fair online system should achieve low demographic parity along with high accuracy scores. Considering the inverted $x$-axis, the performance of a fair system should lie in the top right quadrant of each plot. We report Aranyani's performance for different $\lambda$'s and observe that it achieves better accuracy-fairness trade-off compared to baseline systems.
  • Figure 3: We report the group fairness vs. accuracy trade-off plots for different systems in (left) CivilComments and (right) CelebA datasets. We observe that Aranyani achieves significantly better accuracy-fairness trade-off than baseline systems.
  • Figure 4: (left) We compare Aranyani with the reservoir variant that stores all input instances, (center) we investigate the gradient convergence, and (right) the impact of tree height on performance.
  • Figure 5: Ablations with different $\lambda$. We observe that increasing $\lambda$ results in lower accuracy and improved DP scores consistently throughout the online learning process.
  • ...and 7 more figures
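
The computation described in the Figure 1 caption can be sketched as follows. This is a hedged illustration, not the authors' implementation: it assumes sigmoid routing at each node, a leaf-reach probability equal to the product of routing probabilities along the path, scalar leaf values, and simple averaging across trees. All function names are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def node_outputs(x, weights, biases):
    """Soft decision of each internal node: probability of routing x to the right child."""
    return [sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
            for w, b in zip(weights, biases)]


def leaf_probabilities(gates, height):
    """Probability of reaching each leaf; internal nodes are stored in heap order."""
    probs = [1.0]  # probability of reaching the root
    for depth in range(height):
        first = 2 ** depth - 1  # heap index of the first node at this depth
        nxt = []
        for k, p in enumerate(probs):
            g = gates[first + k]
            nxt.extend([p * (1.0 - g), p * g])  # left child, then right child
        probs = nxt
    return probs


def tree_output(x, weights, biases, leaf_values, height):
    """Tree decision as a leaf-value average weighted by leaf-reach probabilities."""
    probs = leaf_probabilities(node_outputs(x, weights, biases), height)
    return sum(p * v for p, v in zip(probs, leaf_values))


def forest_output(x, trees, height):
    """Combine trees by averaging (an assumption about the combination rule)."""
    return sum(tree_output(x, *t, height) for t in trees) / len(trees)


# A height-2 tree has 3 internal nodes and 4 leaves; zero weights give 0.5 routing everywhere.
tree = ([[0.0, 0.0]] * 3, [0.0] * 3, [1.0, 2.0, 3.0, 4.0])
y = forest_output([0.3, -0.7], [tree, tree], height=2)
```

Because every node output enters the final decision only through these products, a constraint placed on individual node outputs affects the tree decision in a structured way, which is the property the node-level analysis relies on.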

Theorems & Definitions (19)

  • Definition 1: Oblique binary decision tree (Karthikeyan et al., 2022)
  • Definition 2: Soft-Routed Oblique binary decision tree
  • Lemma 1: Demographic Parity Bound
  • Lemma 2: Rademacher Complexity
  • Lemma 3: Fairness Gradient Estimation Error
  • Theorem 1: Gradient Norm Convergence
  • Proposition 1
  • Proof
  • Proof of Lemma 1 (Demographic Parity Bound)
  • Proof
  • ...and 9 more