Fairness-Aware Streaming Feature Selection with Causal Graphs

Leizhen Zhang; Lusi Li; Di Wu; Sheng Chen; Yi He

TL;DR

The paper tackles fairness in streaming feature selection by modeling non-associational bias through two egocentric causal graphs centered on a protected attribute $S$ and the label $Y$. It introduces Streaming Feature Selection with Causal Fairness (SFCF), which incrementally constructs Markov blankets to identify strong, redundant, and irrelevant features and replaces inadmissible features with admissible ones (AD1/AD2) to balance empirical risk and sparsity under equalized odds constraints. The method jointly optimizes accuracy and fairness in an online setting, achieving superior fairness (lower EO) and sparsity with competitive accuracy across five benchmarks, while significantly reducing the number of selected features. The approach demonstrates scalable, real-time bias mitigation in streaming contexts and provides public code to support reproducible research. Overall, SFCF advances fair and efficient online feature selection by leveraging dual causal structures and conditional independence to adaptively manage bias in evolving feature streams.
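The screening idea summarized above can be illustrated with a deliberately simplified sketch. It is not the SFCF algorithm: it swaps the paper's Markov-blanket construction and conditional independence tests for a marginal Pearson correlation with a fixed cutoff, and it omits the AD1/AD2 replacement step. The names `screen_stream` and `threshold`, and the three decision labels, are hypothetical.

```python
import math
from typing import Dict, List, Tuple


def pearson(x: List[float], y: List[float]) -> float:
    """Plain Pearson correlation; returns 0.0 when either input is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def screen_stream(
    stream: List[Tuple[str, List[float]]],
    s: List[float],
    y: List[float],
    threshold: float = 0.3,  # assumed cutoff, not taken from the paper
) -> Dict[str, str]:
    """Label each arriving feature, one at a time, in arrival order.

    Assumption: marginal correlation stands in for the conditional
    independence tests that the paper applies on its causal graphs.
    """
    decisions: Dict[str, str] = {}
    for name, values in stream:
        tied_to_s = abs(pearson(values, s)) >= threshold
        tied_to_y = abs(pearson(values, y)) >= threshold
        if tied_to_y:
            # Predictive features stay in this sketch, even if also tied to S.
            decisions[name] = "keep"
        elif tied_to_s:
            # Related to the protected attribute but not to the label.
            decisions[name] = "drop_biased"
        else:
            decisions[name] = "drop_irrelevant"
    return decisions


# Toy data: S and Y are uncorrelated by construction.
s = [0, 0, 0, 0, 1, 1, 1, 1]
y = [0, 1, 0, 1, 0, 1, 0, 1]
stream = [
    ("proxy", [0, 0, 0, 0, 1, 1, 1, 1]),   # copies S
    ("signal", [0, 1, 0, 1, 0, 1, 0, 1]),  # copies Y
    ("noise", [0, 0, 1, 1, 0, 0, 1, 1]),   # unrelated to both
]
decisions = screen_stream(stream, s, y)
```

Features tied to both $S$ and $Y$ are simply kept here; the paper handles that case through its admissibility analysis, which this sketch does not attempt to reproduce.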

Abstract

The crux of fairness-aware streaming feature selection lies in optimizing the tradeoff between the accuracy and the fairness of the models trained on the selected feature subset. The technical challenge of our setting is twofold: 1) streaming feature inputs, such that an informative feature may become obsolete or redundant for prediction if its information is already covered by similar features that arrived before it, and 2) non-associational feature correlation, such that bias may leak from seemingly admissible, non-protected features. To overcome these challenges, we propose Streaming Feature Selection with Causal Fairness (SFCF), which builds two causal graphs egocentric to the prediction label and the protected feature, respectively, to model the complex correlation structure among streaming features, labels, and protected information. As such, bias can be eradicated from predictive modeling by removing features that are causally correlated with the protected feature yet independent of the labels. We theorize that features originally redundant for prediction can later become admissible when learning accuracy is compromised by the large number of removed features (non-protected features that can nonetheless be used to reconstruct bias information). We benchmark SFCF on five datasets widely used in streaming feature research, and the results substantiate its superiority over six rival models in terms of the efficiency and sparsity of feature selection and the equalized odds of the resultant predictive models.
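Because the abstract evaluates models by equalized odds, a small sketch of how such a gap can be computed may help. It uses one common formulation: the larger of the true-positive-rate and false-positive-rate differences between two groups of a binary protected attribute. The exact EO formula used in the paper is not reproduced in this section, so treat the aggregation (maximum rather than sum) and the function names as assumptions.

```python
from typing import List, Tuple


def group_rates(
    y_true: List[int], y_pred: List[int], s: List[int], group: int
) -> Tuple[float, float]:
    """Return (TPR, FPR) restricted to samples whose protected value is `group`."""
    tp = fn = fp = tn = 0
    for yt, yp, g in zip(y_true, y_pred, s):
        if g != group:
            continue
        if yt == 1 and yp == 1:
            tp += 1
        elif yt == 1:
            fn += 1
        elif yp == 1:
            fp += 1
        else:
            tn += 1
    # Empty denominators fall back to 0.0 (an assumption of this sketch).
    tpr = tp / (tp + fn) if (tp + fn) else 0.0
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    return tpr, fpr


def equalized_odds_gap(y_true: List[int], y_pred: List[int], s: List[int]) -> float:
    """Max of |TPR gap| and |FPR gap| between groups 0 and 1; lower is fairer."""
    tpr0, fpr0 = group_rates(y_true, y_pred, s, 0)
    tpr1, fpr1 = group_rates(y_true, y_pred, s, 1)
    return max(abs(tpr0 - tpr1), abs(fpr0 - fpr1))


# Toy example: predictions are perfect for group 0 and half wrong for group 1.
y_true = [1, 1, 0, 0, 1, 1, 0, 0]
s      = [0, 0, 0, 0, 1, 1, 1, 1]
y_pred = [1, 1, 0, 0, 1, 0, 1, 0]
gap = equalized_odds_gap(y_true, y_pred, s)
perfect_gap = equalized_odds_gap(y_true, y_true, s)
```

In the toy data, group 0 has TPR 1.0 and FPR 0.0 while group 1 has TPR 0.5 and FPR 0.5, so `gap` is 0.5; a predictor that matches the labels exactly yields a gap of 0.0.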

Paper Structure (17 sections, 10 equations, 2 figures, 3 tables)

Introduction
Related Work
Fairness-Aware Machine Learning
Online Streaming Feature Selection
Preliminaries
Problem Statement
Causal Graph
Bayesian Causal Relationship
Technical Challenges and Our Thoughts
The SFCF Approach
Causal Graph Construction with Markov Blanket
Optimizing the Accuracy-Fairness Tradeoff
Time Complexity Analysis
Experiments
Experiment Setup
...and 2 more sections

Figures (2)

Figure 1: Causal graphs $G_S$ and $G_Y$. 1) Each small circle represents a feature or label with its name. 2) Varying shades of red areas represent $S$, $MB(S)$, $Redundant(S)$; different shades of yellow areas represent $Y$, $MB(Y)$, $Redundant(Y)$; white area represents $Irrelevant(S)$ and $Irrelevant(Y)$. 3) The dashed line represents the potential relationship.
Figure 2: Experimental results, including ACC and EO, across 5 datasets using LR classifier. 1) The higher the ACC indicator, the better the model performs. Conversely, the smaller the EO indicator, the fairer the model is considered. Taking these two aspects into account, a model that is closer to the bottom right corner demonstrates better overall performance. 2) Our model, represented in blue, includes SFCF-RI, SFCF-AD1, and SFCF-AD2. 3) The numerical text surrounding the legend represents the average proportion of selected features.

Theorems & Definitions (7)

Definition 1: Null-Conditional Independence
Definition 2: Conditional Independence [koller1996toward]
Definition 3: Strong Relevance [kohavi1997wrappers]
Definition 4: Redundance [yu2004efficient]
Definition 5: Irrelevance [wu2010online]
Definition 6: D-Separated
Definition 7: Inadmissible Feature
