
ProbLog4Fairness: A Neurosymbolic Approach to Modeling and Mitigating Bias

Rik Adriaensen, Lucas Van Praet, Jessa Bekker, Robin Manhaeve, Pieter Delobelle, Maarten Buyl

TL;DR

The authors formalize bias assumptions as programs in ProbLog, a probabilistic logic programming language that can describe probabilistic causal relationships through logic. They conclude that ProbLog4Fairness outperforms baselines because it can flexibly model the relevant bias assumptions, whereas other methods typically assume a fixed bias type or notion of fairness.

Abstract

Operationalizing definitions of fairness is difficult in practice, as multiple definitions can be incompatible while each is arguably desirable. Instead, it may be easier to directly describe algorithmic bias through ad-hoc assumptions specific to a particular real-world task, e.g., based on background information on systemic biases in its context. Such assumptions can, in turn, be used to mitigate this bias during training. Yet, a framework for incorporating such assumptions that is simultaneously principled, flexible, and interpretable is currently lacking. Our approach is to formalize bias assumptions as programs in ProbLog, a probabilistic logic programming language that allows for the description of probabilistic causal relationships through logic. Neurosymbolic extensions of ProbLog then allow for easy integration of these assumptions into a neural network's training process. We propose a set of templates to express different types of bias and show the versatility of our approach on synthetic tabular datasets with known biases. Using estimates of the bias distortions present, we also succeed in mitigating algorithmic bias in real-world tabular and image data. We conclude that ProbLog4Fairness outperforms baselines due to its ability to flexibly model the relevant bias assumptions, where other methods typically uphold a fixed bias type or notion of fairness.
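The core idea of modeling a bias assumption and training through it can be illustrated without ProbLog itself. The Python sketch below encodes one hypothetical label-bias assumption as plain probability arithmetic: the network predicts the latent, unbiased label Y, and for group A = 1 a positive Y is recorded as negative with probability beta. The bias mechanism, the value of beta, and all function names are assumptions made for illustration, not the paper's templates. Fitting the observed labels through the marginal P(Y_obs = 1) recovers the unbiased positive rate rather than the biased one.

```python
import math


def p_observed_positive(p_true: float, a: int, beta: float) -> float:
    """P(Y_obs = 1 | x, A = a) under a HYPOTHETICAL label-bias assumption.

    The latent, unbiased label Y ~ Bernoulli(p_true) is what the model predicts.
    For the disadvantaged group (a == 1), a positive Y is observed as negative
    with probability beta; labels for a == 0 are assumed unbiased.
    Marginalizing over Y: P(Y_obs = 1) = P(Y = 1) * (1 - beta * [a == 1]).
    """
    flip = beta if a == 1 else 0.0
    return p_true * (1.0 - flip)


def nll_on_biased_label(p_true: float, a: int, y_obs: int, beta: float) -> float:
    """Negative log-likelihood of one observed (possibly biased) label."""
    p = p_observed_positive(p_true, a, beta)
    p = min(max(p, 1e-12), 1.0 - 1e-12)  # clamp to avoid log(0)
    return -math.log(p) if y_obs == 1 else -math.log(1.0 - p)


def fit_group_rate(y_obs: list[int], a: int, beta: float) -> float:
    """Grid-search the latent positive rate that best explains y_obs."""
    grid = [i / 1000 for i in range(1, 1000)]
    return min(grid, key=lambda p: sum(nll_on_biased_label(p, a, y, beta) for y in y_obs))


if __name__ == "__main__":
    # Toy data: 45 of 100 observed labels are positive; beta is an assumed estimate.
    observed = [1] * 45 + [0] * 55
    print(fit_group_rate(observed, a=1, beta=0.25))  # close to 0.45 / 0.75 = 0.6
    print(fit_group_rate(observed, a=0, beta=0.25))  # no correction: close to 0.45
```

In a neurosymbolic setting, the same marginalization would be expressed in the probabilistic logic program and the network's parameters updated by gradient descent; the grid search here only keeps the example dependency-free.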

Paper Structure

This paper contains 38 sections, 8 equations, 8 figures, 2 tables.

Figures (8)

  • Figure 1: The Bayesian networks for label, measurement, and historical bias. Dashed nodes are unobserved.
  • Figure 2: ProbLog4Fairness successfully models different types of bias, approaching the upper baseline. We measure accuracy and statistical disparity for an increasing probability of label (left), measurement (middle), and historical (right) bias during training while evaluating on unbiased data at test time.
  • Figure 3: Our method is able to remove only the problematic bias when $A \not\perp Y$, approaching the upper baseline. We measure accuracy and statistical disparity on unbiased data while training with an increasing probability of label bias under $A \not\perp Y$. Color coding is consistent with the legend in Figure 2.
  • Figure 4: Our method achieves the highest accuracy and the expected statistical disparity when the correct bias probability is used to estimate the parameters in the program. We train with a fixed bias probability of 0.3, with $A \not\perp Y$, and evaluate on unbiased data, but vary the bias probability $\hat{\beta}$ used to set the program's parameters. The dashed lines indicate the upper baseline.
  • Figure 5: Our approach achieves a higher F1 score on the unbiased labels than the mitigating baselines and approaches the expected statistical disparity. Sensible simplifying assumptions seem to benefit performance. The gray vertical line indicates the statistical disparity in the unbiased labels. The ellipses show a 95% confidence region based on the standard error.
  • ...and 3 more figures
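The figures report statistical disparity alongside accuracy. The following is a minimal sketch of that metric, assuming the common definition as the difference in positive-prediction rates between the two groups of a binary sensitive attribute A. The paper's exact sign convention (or use of an absolute value) may differ, and the function name is hypothetical.

```python
def statistical_disparity(y_pred: list[int], groups: list[int]) -> float:
    """Return P(Y_hat = 1 | A = 0) - P(Y_hat = 1 | A = 1).

    y_pred holds binary predictions (0/1); groups holds the binary
    sensitive attribute A (0/1) for the same samples.
    """
    if len(y_pred) != len(groups):
        raise ValueError("y_pred and groups must have the same length")
    rates = []
    for g in (0, 1):
        preds = [p for p, a in zip(y_pred, groups) if a == g]
        if not preds:
            raise ValueError(f"no samples with A = {g}")
        rates.append(sum(preds) / len(preds))  # positive-prediction rate for group g
    return rates[0] - rates[1]


if __name__ == "__main__":
    y_hat = [1, 1, 0, 1, 1, 0, 0, 0]
    a = [0, 0, 0, 0, 1, 1, 1, 1]
    print(statistical_disparity(y_hat, a))  # 0.75 - 0.25 = 0.5
```

A value of 0 means both groups receive positive predictions at the same rate; when $A \not\perp Y$, the target is the disparity present in the unbiased labels rather than 0.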

Theorems & Definitions (4)

  • Example 1
  • Example 2
  • Example 3
  • Example 4