Inference-Time Rule Eraser: Fair Recognition via Distilling and Removing Biased Rules

Yi Zhang, Dongyuan Lu, Jitao Sang

TL;DR

This work tackles bias in deployed AI systems where access to model parameters is restricted. By framing debiasing in a Bayesian context, it introduces the Inference-Time Rule Eraser (Eraser), which removes biased decision rules from outputs without retraining, via a two-stage Distill→Remove process that uses a patch model to capture and erase $p(y|b)$. The method achieves notable fairness gains (lower Equalodds) while preserving or improving accuracy across vision and structured datasets, including multi-bias scenarios, and demonstrates robustness to calibration data size and backbone architecture. Practically, Eraser offers a scalable, data-efficient solution for debiasing black-box deployed models with minimal operational overhead. The approach has broad implications for fair AI deployment, especially in high-stakes settings, by enabling post-hoc rule editing purely at inference time.

Abstract

Machine learning models often make predictions based on biased features such as gender, race, and other social attributes, posing significant fairness risks, especially in societal applications such as hiring, banking, and criminal justice. Traditional approaches to addressing this issue involve retraining or fine-tuning neural networks with fairness-aware optimization objectives. However, these methods can be impractical due to significant computational resources, complex industrial tests, and the associated CO2 footprint. Additionally, regular users often cannot fine-tune models because they lack access to model parameters. In this paper, we introduce the Inference-Time Rule Eraser (Eraser), a novel method designed to address fairness concerns by removing biased decision-making rules from deployed models during inference without altering model weights. We begin by establishing a theoretical foundation for modifying model outputs to eliminate biased rules through Bayesian analysis. Next, we present a specific implementation of Eraser that involves two stages: (1) distilling the biased rules from the deployed model into an additional patch model, and (2) removing these biased rules from the output of the deployed model during inference. Extensive experiments validate the effectiveness of our approach, showcasing its superior performance in addressing fairness concerns in AI systems.

Paper Structure (19 sections, 1 theorem, 21 equations, 10 figures, 5 tables)

Key Result

Theorem 1

(Inference-Time Rule Eraser) Assume $\hat{\phi}$ to be the conditional probability of a fair model that contains no biased rule, with the form $\hat{\phi}_j=\hat{p}(y=j|\mathbf{x})=\frac{p(\mathbf{x}|y=j,b)}{p(\mathbf{x}|b)} \frac{1}{k}$, and $\phi$ to be the conditional probability of the biased (de
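
The form of $\hat{\phi}_j$ given above can be related to a log-space subtraction through Bayes' rule. The following is a sketch, not the paper's proof. It assumes, which the text here does not state, that the biased model's output is $\phi_j = p(y=j|\mathbf{x},b)$ and that $k$ is the number of classes:

$$
\phi_j = p(y=j|\mathbf{x},b) = \frac{p(\mathbf{x}|y=j,b)\,p(y=j|b)}{p(\mathbf{x}|b)} = k\,\hat{\phi}_j\,p(y=j|b)
$$

$$
\log \hat{\phi}_j = \log \phi_j - \log p(y=j|b) - \log k
$$

Under these assumptions, the fair output differs from the biased output by the bias term $\log p(y=j|b)$ and a constant $\log k$ that is the same for every class and therefore does not affect the predicted label.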

Figures (10)

  • Figure 1: Illustration of the proposed Eraser. In the Distill stage, biased rules are distilled and imparted to the patch model. In the Remove stage, the bias response extracted by the patch model is removed from the model output.
  • Figure 2: The architecture of our proposed Inference-Time Rule Eraser (Eraser). During preparation, the Eraser uses a causality-based distillation strategy to distill biased rules from the deployed black-box model and stores them in the additional patch model. During inference, the Eraser subtracts the patch model's output from the black-box model's original output in log space to produce a fair prediction.
  • Figure 3: The causal graph of the inference process of the biased model. (a) The output $Y$ of the biased model is directly affected by the target feature $X^Y$ and the bias feature $X^B$ in the input $X$. (b) With the conditioning $x^y = \emptyset$, the output $Y$ is only affected by the bias feature $X^B$.
  • Figure 4: Rule distillation via simulating sample-editing. (a) Using a single sample as the contrastive sample of sample $\mathbf{x}$. (b) Employing multiple samples simultaneously as contrastive samples of sample $\mathbf{x}$.
  • Figure 5: Patterns exhibited by the majority (95%) of samples across ten superclasses in the proposed ImageNet-B(ias) dataset.
  • ...and 5 more figures
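
The Remove stage described in the Figure 2 caption (subtracting the patch model's output from the black-box model's output in log space) can be sketched as follows. This is a minimal illustration, not the authors' implementation. The function name `erase_bias`, the argument names, the `eps` smoothing constant, and the final softmax renormalisation are all assumptions made for this example. It also assumes that both models return class probabilities.

```python
import numpy as np

def log_softmax(scores):
    """Numerically stable log-softmax over the last axis."""
    shifted = scores - scores.max(axis=-1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))

def erase_bias(deployed_probs, patch_probs, eps=1e-12):
    """Hypothetical Remove step.

    deployed_probs: (n, k) class probabilities from the black-box model.
    patch_probs:    (n, k) bias response estimated by the patch model.
    Returns (n, k) probabilities after subtracting the patch output in
    log space. Renormalising with a softmax is an assumption of this sketch.
    """
    log_fair = np.log(deployed_probs + eps) - np.log(patch_probs + eps)
    return np.exp(log_softmax(log_fair))

# Case 1: the deployed output is fully explained by the bias response,
# so nothing but a uniform prediction remains after removal.
deployed = np.array([[0.9, 0.1]])
patch = np.array([[0.9, 0.1]])
fair_all_bias = erase_bias(deployed, patch)

# Case 2: the patch model reports no bias (uniform response),
# so the deployed prediction is left unchanged.
deployed_clean = np.array([[0.6, 0.4]])
patch_uniform = np.array([[0.5, 0.5]])
fair_no_bias = erase_bias(deployed_clean, patch_uniform)
```

The two cases show the limiting behaviour of this sketch: a prediction driven entirely by the bias response collapses to a uniform distribution, while a uniform bias response leaves the deployed prediction untouched. Only the model's outputs are used, which matches the black-box setting the captions describe.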

Theorems & Definitions (1)

  • Theorem 1