A Proof System with Causal Labels (Part II): checking Counterfactual Fairness
Leonardo Ceragioli, Giuseppe Primiero
TL;DR
This paper addresses verifying counterfactual fairness for probabilistic classifiers by embedding causal interventions into the TNDPQ proof system. It formalizes data points and causal relations, introducing predicates for immediate and mediate causes and an intervention operator I(a:alpha), and extends sequents to hypersequents that express factual and counterfactual judgments. A set of rules, including C-Weakening and various v- and I-Cut variants, allows deriving counterfactual sequents and deciding CF by comparing the resulting probabilities p and q. The work demonstrates how to check whether a classifier's output remains invariant under interventions on protected attributes within the atomic fragment, providing a principled basis for assessing fairness in opaque ML systems. It offers a formal framework that can inform verification tools and audits of probabilistic classifiers against counterfactual fairness criteria.
Abstract
In this article we propose an extension to the typed natural deduction calculus TNDPQ to model verification of counterfactual fairness in probabilistic classifiers. This is obtained formulating specific structural conditions for causal labels and checking that evaluation is robust under their variation.
