Re-evaluating Group Robustness via Adaptive Class-Specific Scaling
Seonguk Seo, Bohyung Han
TL;DR
This work tackles the persistent trade-off between robust (group-wise) accuracy and average accuracy in group robustness methods. It introduces a training-free class-specific scaling strategy, applied as a post-processing step, to control this trade-off, and extends it with instance-wise scaling that leverages feature clusters for per-example adjustments. A robust coverage metric is proposed to quantify the trade-off along the Pareto frontier as a single scalar, enabling a unified evaluation across methods. Empirical results on computer vision and NLP benchmarks show that class-specific scaling and its instance-wise variant (RS/IRS) can match or outperform several debiasing approaches with negligible training overhead, highlighting the potential of post-processing for group robustness. The framework provides practical guidance for selecting desirable operating points and offers insight into the behavior of existing debiasing techniques beyond robust accuracy alone.
Abstract
Group distributionally robust optimization, which aims to improve robust accuracies (worst-group and unbiased accuracies), is a prominent approach for mitigating spurious correlations and addressing dataset bias. Although existing approaches have reported improvements in robust accuracies, these gains often come at the cost of average accuracy due to inherent trade-offs. To control this trade-off flexibly and efficiently, we propose a simple class-specific scaling strategy that is directly applicable to existing debiasing algorithms with no additional training. We further develop an instance-wise adaptive scaling technique to alleviate this trade-off, which can even improve both robust and average accuracies. Our approach reveals that a naïve ERM baseline matches or even outperforms recent debiasing methods simply by adopting the class-specific scaling technique. Additionally, we introduce a novel unified metric that quantifies the trade-off between the two accuracies as a scalar value, allowing for a comprehensive evaluation of existing algorithms. By tackling the inherent trade-off and offering a performance landscape, our approach provides valuable insights into robust techniques beyond robust accuracy alone. We validate the effectiveness of our framework through experiments on datasets in the computer vision and natural language processing domains.
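
To make the idea of post-hoc class-specific scaling concrete, the following is a minimal sketch of one way it could work, not the authors' implementation. It assumes that a trained classifier's softmax outputs are multiplied by a per-class factor before the argmax, and that sweeping the factor on held-out data traces out different (average accuracy, worst-group accuracy) operating points. The function names, the binary-class setup, and the multiplicative form of the scaling are all assumptions made for illustration; the abstract does not specify them.

```python
import numpy as np

def scaled_predictions(probs, scale):
    """Multiply each class column by its (assumed) scaling factor, then take the argmax."""
    return np.argmax(probs * scale, axis=1)

def average_and_worst_group_accuracy(preds, labels, groups):
    """Return overall accuracy and the minimum per-group accuracy."""
    correct = preds == labels
    group_accs = [correct[groups == g].mean() for g in np.unique(groups)]
    return correct.mean(), min(group_accs)

def sweep_binary_scaling(probs, labels, groups, factors):
    """For each factor applied to class 1 (class 0 fixed at 1.0), record
    (factor, average accuracy, worst-group accuracy). No retraining is involved."""
    results = []
    for s in factors:
        preds = scaled_predictions(probs, np.array([1.0, s]))
        avg, worst = average_and_worst_group_accuracy(preds, labels, groups)
        results.append((float(s), float(avg), float(worst)))
    return results
```

Given the resulting list of operating points, one could pick the factor that best suits a target (for example, the highest worst-group accuracy on a validation set) or plot the two accuracies against each other to inspect the trade-off curve that a scalar summary metric would condense.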
