Fairness without Sensitive Attributes via Knowledge Sharing

Hongliang Ni, Lei Han, Tong Chen, Shazia Sadiq, Gianluca Demartini

TL;DR

A confidence-based hierarchical classifier structure called "Reckoner" is proposed for reliable fair model learning under the assumption of missing sensitive attributes.

Abstract

While model fairness improvement has been explored previously, existing methods invariably rely on adjusting explicit sensitive attribute values to improve model fairness in downstream tasks. However, we observe a trend in which sensitive demographic information is becoming inaccessible as public concerns around data privacy grow. In this paper, we propose a confidence-based hierarchical classifier structure called "Reckoner" for reliable fair model learning under the assumption of missing sensitive attributes. We first present results showing that, if the dataset contains biased labels or other hidden biases, classifiers significantly increase the bias gap across different demographic groups in the subset with higher prediction confidence. Inspired by these findings, we devise a dual-model system in which a version of the model initialised with a high-confidence data subset learns from a version of the model initialised with a low-confidence data subset, enabling it to avoid biased predictions. Our experimental results show that Reckoner consistently outperforms state-of-the-art baselines on the COMPAS and New Adult datasets in terms of both accuracy and fairness metrics.
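
The confidence-based split behind the high- and low-confidence subsets can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes a binary task in which confidence is the predicted probability of the chosen class, and the function names and the 0.6 threshold are hypothetical.

```python
def confidence(p_positive):
    """Confidence of a binary prediction: probability of the predicted class."""
    return max(p_positive, 1.0 - p_positive)


def split_by_confidence(probs, threshold=0.6):
    """Split instance indices into high- and low-confidence subsets.

    probs: predicted probabilities of the positive class, one per instance.
    threshold: hypothetical cut-off; the abstract does not state a value.
    """
    high, low = [], []
    for i, p in enumerate(probs):
        if confidence(p) >= threshold:
            high.append(i)
        else:
            low.append(i)
    return high, low


# Toy probabilities, e.g. as produced by a logistic regression classifier.
probs = [0.95, 0.55, 0.10, 0.48, 0.70]
high_idx, low_idx = split_by_confidence(probs, threshold=0.6)
print(high_idx, low_idx)
```

Note that an instance predicted negative with probability 0.10 counts as high-confidence here, because the classifier is 90% sure of the negative class.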

Paper Structure

This paper contains 14 sections, 5 equations, 4 figures, 3 tables.

Figures (4)

  • Figure 1: Observed gaps in measures derived from the confusion matrix of the trained logistic regression classifier. (a) and (c) True Negative Rate (TNR) and True Positive Rate (TPR) gaps between two demographic groups across different confidence levels in the COMPAS dataset and the New Adult dataset, respectively. (b) and (d) False Negative Rate (FNR) and False Positive Rate (FPR) gaps between two demographic groups across different confidence levels in the COMPAS dataset and the New Adult dataset, respectively.
  • Figure 2: (a) An example of the COMPAS dataset. In our experiment, the attribute 'Race' in the red box is not used. (b) An example of the New Adult dataset. In our experiment, the attribute 'Race' in the red box is not used.
  • Figure 3: (a) and (b) Distributions of the attributes 'Age' and 'Previous Misconduct' across different subsets of the testing set.
  • Figure 4: Overview of Reckoner, which consists of two stages. In the Identification stage, we first train a logistic regression classifier on the raw data and then split the data based on confidence scores. In the Refinement stage, we introduce learnable noise into the original dataset and employ two classifiers, one for low-confidence instances and another for high-confidence ones. The Low-Conf classifier is trained on pseudo-labels produced by the High-Conf classifier for a limited number of training iterations and is restored whenever new data arrives. The knowledge acquired during this process is then shared with the High-Conf classifier, which incorporates ground-truth data to refine its model weights.
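
The per-group rate gaps reported in Figure 1 can be computed along the following lines. This is a hedged sketch rather than the paper's evaluation code: the helper names are hypothetical, the gap is assumed to be the absolute difference between two groups, and the group labels are used only for evaluation, not for training.

```python
def safe_div(num, den):
    """Divide, returning NaN when the denominator is zero."""
    return num / den if den else float("nan")


def rates(y_true, y_pred):
    """Confusion-matrix rates for binary labels and predictions (0 or 1)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return {
        "TPR": safe_div(tp, tp + fn),
        "TNR": safe_div(tn, tn + fp),
        "FPR": safe_div(fp, tn + fp),
        "FNR": safe_div(fn, tp + fn),
    }


def group_gaps(y_true, y_pred, group):
    """Absolute rate differences between group 0 and group 1."""
    per_group = {}
    for g in (0, 1):
        idx = [i for i, v in enumerate(group) if v == g]
        per_group[g] = rates([y_true[i] for i in idx], [y_pred[i] for i in idx])
    return {k: abs(per_group[0][k] - per_group[1][k]) for k in per_group[0]}


# Toy data: eight instances, four per demographic group.
y_true = [1, 1, 0, 0, 1, 1, 0, 0]
y_pred = [1, 1, 0, 1, 1, 0, 0, 0]
group = [0, 0, 0, 0, 1, 1, 1, 1]
gaps = group_gaps(y_true, y_pred, group)
print(gaps)
```

Applying `group_gaps` separately to the instances in each confidence bucket would give one gap value per confidence level, which is the kind of comparison Figure 1 plots.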