Imbalanced Classification through the Lens of Spurious Correlations
Jakob Hackstein, Sidney Bender
TL;DR
The paper tackles imbalanced classification by reframing imbalance as a data condition that fosters spurious correlations. It introduces Counterfactual Knowledge Distillation (CFKD), a two-stage method that first detects reliance on spurious cues via counterfactual explanations and then eliminates it by fine-tuning on counterfactual data annotated by a domain expert. The approach achieves competitive classification performance on three datasets and provides a mechanism to expose and mitigate Clever Hans effects under imbalance. This work advances reliable, causally informed classification in safety-critical contexts where minority classes are underspecified.
Abstract
Class imbalance poses a fundamental challenge in machine learning, frequently leading to unreliable classification performance. While prior methods focus on data- or loss-reweighting schemes, we view imbalance as a data condition that amplifies Clever Hans (CH) effects through the underspecification of minority classes. We propose a counterfactual-explanation-based approach that leverages Explainable AI to jointly identify and eliminate the CH effects that emerge under imbalance. Our method achieves competitive classification performance on three datasets and demonstrates how CH effects emerge under imbalance, a perspective largely overlooked by existing approaches.
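As an illustration only, the two-stage idea summarized above (detect spurious reliance with counterfactuals, then fine-tune on expert-annotated counterfactuals) can be sketched on toy data. Everything in this sketch is an assumption made for illustration and not the authors' implementation: the two-feature dataset, the logistic-regression student, the closed-form counterfactual for a linear model, and the `teacher` function standing in for the domain expert are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def teacher(X):
    # Hypothetical stand-in for the domain expert: only feature 0 is causal.
    return (X[:, 0] > 0).astype(float)

def make_imbalanced(n_major=500, n_minor=10):
    # Feature 0 determines the label; feature 1 is a spurious cue that
    # separates the classes with a larger margin in this training sample.
    y = np.concatenate([np.zeros(n_major), np.ones(n_minor)])
    sign = 2 * y - 1
    x0 = sign * rng.uniform(0.2, 2.0, y.size)
    x1 = 2.0 * sign + rng.normal(0.0, 0.3, y.size)
    return np.column_stack([x0, x1]), y

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit(X, y, w=None, b=0.0, lr=0.1, steps=2000):
    # Plain gradient descent on the logistic loss (the "student" model).
    w = np.zeros(X.shape[1]) if w is None else w.copy()
    for _ in range(steps):
        g = sigmoid(X @ w + b) - y
        w -= lr * X.T @ g / len(y)
        b -= lr * g.mean()
    return w, b

def counterfactual(X, w, b, margin=0.5):
    # For a linear model, the smallest change that flips the decision moves
    # each point along w until its logit equals +/- margin on the other side.
    z = X @ w + b
    target = -np.sign(z) * margin
    target[z == 0] = margin
    step = (target - z) / (w @ w)
    return X + step[:, None] * w

# Stage 1: train the student and probe it with counterfactuals.
X, y = make_imbalanced()
w, b = fit(X, y)
X_cf = counterfactual(X, w, b)
y_cf = teacher(X_cf)                      # expert annotates the counterfactuals
student_cf = (X_cf @ w + b > 0).astype(float)
false_cf = student_cf != y_cf             # student flipped, expert disagrees

# Stage 2: fine-tune the student on original plus annotated counterfactual data.
w_ft, b_ft = fit(np.vstack([X, X_cf]), np.concatenate([y, y_cf]), w, b)

print("share of counterfactuals the expert rejects:", false_cf.mean())
print("weights before fine-tuning:", w, "after:", w_ft)
```

A counterfactual that flips the student's prediction while the expert's label stays unchanged indicates that the student crossed its decision boundary by altering a non-causal cue. Fine-tuning on such points is intended to reduce the weight placed on the spurious feature, although this toy setup does not guarantee that outcome.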
