Quantitative Verification of Fairness in Tree Ensembles
Zhenjiang Zhao, Takahisa Toda, Takashi Kitamura
TL;DR
This work addresses the quantitative verification of fairness and robustness in tree ensembles. It introduces BoxQTE, an SMT-based framework that exploits the box-structured input partitioning inherent to trees to compute any-time upper and lower bounds on fairness measures. The work formalizes confidence-based fairness and robustness (epsilon-kappa) for tree ensembles, defines path-tuple boxes, and shows how to encode the problem so that counterexample regions can be discovered without exhaustive search. BoxQTE is more efficient and more accurate than CegarQuant, helped by enhancements such as task priority, decomposition, and box blocking. In fairness testing, it identifies many more discriminatory instances in substantially less time. The approach holds promise for model debugging and bias mitigation in tabular-data applications. Extensions to multi-class ensembles and integration with surrogate-based testing are envisaged as future work.
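To make the box-structured partitioning concrete, the following is a minimal sketch and not the BoxQTE encoding itself. It assumes that a path-tuple box is the intersection of one root-to-leaf box per tree, on which the summed ensemble score is constant; that reading is an assumption, as are the tuple-based tree format and the names `intersect`, `leaf_boxes`, and `path_tuple_boxes`. It enumerates boxes by brute force instead of using an SMT solver.

```python
from itertools import product

INF = float("inf")

def intersect(box_a, box_b):
    """Intersect two boxes (feature -> half-open interval [lo, hi)); None if empty."""
    out = dict(box_a)
    for feat, (lo, hi) in box_b.items():
        cur_lo, cur_hi = out.get(feat, (-INF, INF))
        new_lo, new_hi = max(cur_lo, lo), min(cur_hi, hi)
        if new_lo >= new_hi:
            return None
        out[feat] = (new_lo, new_hi)
    return out

def leaf_boxes(tree, box=None):
    """Yield (box, leaf_value) for every feasible root-to-leaf path.

    Hypothetical tree format: a float leaf, or a tuple
    (feature, threshold, left, right) that goes left when x[feature] < threshold.
    """
    box = {} if box is None else box
    if not isinstance(tree, tuple):
        yield box, tree
        return
    feat, thr, left, right = tree
    left_box = intersect(box, {feat: (-INF, thr)})
    right_box = intersect(box, {feat: (thr, INF)})
    if left_box is not None:
        yield from leaf_boxes(left, left_box)
    if right_box is not None:
        yield from leaf_boxes(right, right_box)

def path_tuple_boxes(trees):
    """Yield (box, ensemble_score) for each non-empty combination of one path per tree."""
    for combo in product(*(list(leaf_boxes(t)) for t in trees)):
        box, score = {}, 0.0
        for leaf_box, value in combo:
            box = intersect(box, leaf_box)
            if box is None:
                break  # the chosen paths contradict each other
            score += value
        if box is not None:
            yield box, score

# Toy ensemble of two trees over features 0 and 1.
trees = [
    (0, 5.0, -1.0, 1.0),
    (1, 2.0, (0, 3.0, -0.5, 0.5), 0.25),
]
boxes = list(path_tuple_boxes(trees))
```

Of the six path combinations in this toy ensemble, five yield non-empty boxes; the pairing of `x[0] >= 5` with `x[0] < 3` is infeasible and is dropped. Brute-force enumeration grows exponentially with the number of trees, which is why a solver-based search is attractive in practice.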
Abstract
This work focuses on quantitative verification of fairness in tree ensembles. Unlike traditional verification approaches, which merely return a single counterexample when fairness is violated, quantitative verification estimates the ratio of all counterexamples and characterizes the regions where they occur. This information is important for diagnosing and mitigating bias. To date, quantitative verification has been explored almost exclusively for deep neural networks (DNNs). Representative methods, such as DeepGemini and FairQuant, all build on the core idea of Counterexample-Guided Abstraction Refinement, a generic framework that could be adapted to other model classes. We extended this framework into a model-agnostic form, but found two limitations: (i) it can provide only lower bounds, and (ii) its performance scales poorly. Exploiting the discrete structure of tree ensembles, we propose an efficient quantification technique that delivers any-time upper and lower bounds. Experiments on five widely used datasets demonstrate its effectiveness and efficiency. When applied to fairness testing, our quantification method significantly outperforms state-of-the-art testing techniques.
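The idea of any-time upper and lower bounds on the counterexample ratio can be illustrated with a minimal sketch. It is not the authors' algorithm. It assumes the input space is already split into disjoint regions, each either entirely fair or entirely violating, with a known share of the input space. The `predict` model, the `income` feature, and all function names are hypothetical.

```python
def anytime_bounds(regions, is_counterexample):
    """Yield (lower, upper) bounds on the counterexample ratio after each region.

    regions: iterable of (region, mass), where masses of disjoint regions sum to 1.
    Lower bound: mass confirmed as violating. Upper bound: 1 minus mass confirmed fair.
    """
    violating, fair = 0.0, 0.0
    for region, mass in regions:
        if is_counterexample(region):
            violating += mass
        else:
            fair += mass
        yield violating, 1.0 - fair

def predict(sensitive, income):
    # Hypothetical biased toy model standing in for a tree ensemble:
    # the acceptance threshold depends on the sensitive attribute.
    threshold = 40 if sensitive == 0 else 60
    return income >= threshold

# Income intervals [lo, hi) over the range [0, 80); mass is the relative width.
intervals = [(0, 40), (40, 60), (60, 80)]
regions = [((lo, hi), (hi - lo) / 80) for lo, hi in intervals]

def is_counterexample(region):
    lo, _ = region
    # The toy model is constant on each interval, so one representative point suffices.
    return predict(0, lo) != predict(1, lo)

bounds = list(anytime_bounds(regions, is_counterexample))
# bounds == [(0.0, 0.5), (0.25, 0.5), (0.25, 0.25)]
```

The bounds tighten as more regions are classified and can be read off at any point. Once every region has been processed they coincide, here at 0.25, the share of inputs for which flipping the sensitive attribute changes the prediction.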
