Local Statistical Parity for the Estimation of Fair Decision Trees

Andrea Quintanilla, Johan Van Horebeek

TL;DR

This work introduces Local Statistical Parity, a node-level fairness criterion that implies global Statistical Parity for decision trees, and shows how to enforce it within recursive tree estimation. It develops the Constrained Logistic Regression Tree (C-LRT), a CART-like algorithm that selects node splits via Constrained Logistic Regression with a covariance constraint $|\widehat{Cov}(sd_{\theta}(X), A, D)| \le c$, where the parameter $c$ tunes the trade-off between accuracy and fairness. A key theoretical result shows that local independence of the node tests from the protected attribute $A$ is sufficient for global fairness. Empirical results on four Algorithmic Fairness (AF) datasets show controllable fairness improvements with a clear accuracy-fairness trade-off, along with some degenerate cases (constant trees) under strong constraints. The practical contribution is a convex, interpretable framework for fair decision trees that integrates fairness into the estimation process without abandoning the recursive tree structure.
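To make the covariance constraint concrete, here is a minimal sketch of how $|\widehat{Cov}(sd_{\theta}(X), A, D)| \le c$ could be checked on the data $D$ reaching a node. It is an illustration under stated assumptions, not the authors' implementation: it takes $sd_{\theta}(x)$ to be the unnormalized linear score $\theta_0 + \theta^\top x$ and $A$ to be a numeric (e.g. 0/1) protected attribute, and the function names are hypothetical.

```python
import numpy as np

def linear_score(theta, X):
    # Assumed stand-in for sd_theta(X): intercept theta[0] plus X @ theta[1:].
    return theta[0] + X @ theta[1:]

def empirical_covariance(scores, a):
    # Empirical (biased) covariance between node-test scores and the
    # protected attribute, computed over the samples at the node.
    return float(np.mean((a - a.mean()) * scores))

def satisfies_constraint(theta, X, a, c):
    # True when |Cov(sd_theta(X), A)| <= c on the given samples.
    return abs(empirical_covariance(linear_score(theta, X), a)) <= c

# Toy node data: one feature that is strongly aligned with A.
X = np.array([[0.0], [1.0], [2.0], [3.0]])
a = np.array([0.0, 0.0, 1.0, 1.0])
theta = np.array([0.0, 1.0])

cov = empirical_covariance(linear_score(theta, X), a)
loose = satisfies_constraint(theta, X, a, c=1.0)
tight = satisfies_constraint(theta, X, a, c=0.1)
```

In this toy example the split direction is correlated with the protected attribute, so a loose bound accepts it and a tight bound rejects it. In a constrained fit, the same quantity would enter the logistic regression problem as a constraint on $\theta$ rather than as an after-the-fact check.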

Abstract

Given the high computational complexity of decision tree estimation, classical methods construct a tree recursively, adding one node at a time. To make it easier to promote fairness in this setting, we propose a fairness criterion that is local to the tree nodes. We prove how it is related to the Statistical Parity criterion, popular in the Algorithmic Fairness literature, and show how to incorporate it into standard recursive tree estimation algorithms. We present a tree estimation algorithm called Constrained Logistic Regression Tree (C-LRT), a modification of the standard CART algorithm that uses locally linear classifiers and imposes restrictions as in Constrained Logistic Regression. Finally, we evaluate the performance of trees estimated with C-LRT on datasets commonly used in the Algorithmic Fairness literature, using various classification and fairness metrics. The results confirm that C-LRT makes it possible to control and balance accuracy and fairness.

Paper Structure

This paper contains 11 sections, 3 theorems, 25 equations, 1 figure, 1 table, 1 algorithm.

Key Result

Lemma 4

Given a tree $T$, if for every terminal node $t$, $T_t$ satisfies the Local Statistical Parity criterion, i.e., $A \perp dom_{T_t}(X)$, then $A\perp T(X)$.
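The conclusion $A \perp T(X)$ is the Statistical Parity condition on the tree's predictions. As a hedged illustration, the sketch below measures how far a set of predictions is from that condition in the simplest setting, assuming binary predictions and a binary protected attribute with both groups present; in that case independence is equivalent to equal positive-prediction rates across the two groups. The function name is hypothetical and the check is not taken from the paper.

```python
import numpy as np

def statistical_parity_gap(y_pred, a):
    # Absolute difference in positive-prediction rates between the two
    # groups defined by a binary protected attribute (assumed coded 0/1).
    y_pred = np.asarray(y_pred, dtype=float)
    a = np.asarray(a)
    rate_a1 = y_pred[a == 1].mean()
    rate_a0 = y_pred[a == 0].mean()
    return float(abs(rate_a1 - rate_a0))

# Predictions with equal positive rates in both groups: gap of zero.
fair_gap = statistical_parity_gap([1, 0, 1, 0], [0, 0, 1, 1])

# Predictions that coincide with the protected attribute: maximal gap.
unfair_gap = statistical_parity_gap([1, 1, 0, 0], [1, 1, 0, 0])
```

A gap of zero on a sample corresponds to empirical Statistical Parity; larger values indicate that the predictions depend more strongly on the protected attribute.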

Figures (1)

  • Figure 1: Prediction metrics (left column) and fairness metrics (right column) for LRT and C-LRT. The x-axis is associated with the value of the parameter $c$ of C-LRT; for LRT, there is no $c$, represented as "inf". Each color corresponds to a metric, and the y-axis is the average of the metric over 30 experiments. The blue numbers on the x-axis indicate the count of constant decision trees associated with the value of $c$. Error bars represent the confidence interval of the average.

Theorems & Definitions (9)

  • Definition 1: Statistical Parity
  • Definition 2
  • Definition 3: Local Statistical Parity
  • Lemma 4
  • Definition 5
  • Lemma 6
  • Theorem 7
  • proof
  • proof