Regulating Model Reliance on Non-Robust Features by Smoothing Input Marginal Density

Peiyu Yang; Naveed Akhtar; Mubarak Shah; Ajmal Mian

TL;DR

The paper addresses the problem of model reliance on non-robust features by proposing a regularization that smooths the input marginal density. It derives a gradient-based penalty on ∇_x log p_{\theta}(x) via the log-partition function Z_{f(x)} and introduces a numerically stable, efficient implementation using softmax and Taylor-based approximations. Through extensive experiments on BlockMNIST, CelebA-Hair, and Waterbirds, the authors show reduced feature leakage, improved worst-group performance, and robustness to pixel, gradient, and density perturbations, outperforming several baselines. The work also demonstrates improvements in OOD detection and interpretability while acknowledging limitations such as potential trade-offs with strong regularization and the continuing relevance of adversarial training in some settings. Overall, the approach provides a principled way to regulate non-robust feature reliance with practical gains across diverse robustness benchmarks.
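To make the penalized quantity concrete, the following is a worked sketch under one common assumption that this summary does not spell out: the classifier logits $f_y(x)$ are read as unnormalized log joint densities (an energy-based interpretation). The symbol $Z_\theta$ for the input-independent normalizer is notation introduced here for illustration, and the term $\log \sum_y \exp(f_y(x))$ is presumably what the TL;DR calls the log-partition function $Z_{f(x)}$.

```latex
% Assumption: logits f_y(x) are unnormalized log joint densities.
% Z_\theta does not depend on x, so it vanishes under \nabla_x.
\begin{align}
p_\theta(x, y) &= \frac{\exp\big(f_y(x)\big)}{Z_\theta},
\qquad
p_\theta(x) = \sum_y p_\theta(x, y)
            = \frac{\sum_y \exp\big(f_y(x)\big)}{Z_\theta}, \\
\nabla_x \log p_\theta(x)
  &= \nabla_x \log \sum_y \exp\big(f_y(x)\big)
   = \sum_y \operatorname{softmax}\big(f(x)\big)_y \, \nabla_x f_y(x), \\
\nabla_x \log p_\theta(y \mid x)
  &= \nabla_x f_y(x) - \nabla_x \log p_\theta(x).
\end{align}
```

Under this reading, the marginal-density gradient is a softmax-weighted average of the per-class logit gradients, and the last line shows that the gradient of the conditional log-density is a different quantity, which is in line with the abstract's distinction between marginal density smoothing and input gradient regularization.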

Abstract

Trustworthy machine learning necessitates meticulous regulation of model reliance on non-robust features. We propose a framework to delineate and regulate such features by attributing model predictions to the input. Within our approach, robust feature attributions exhibit a certain consistency, while non-robust feature attributions are susceptible to fluctuations. This behavior allows identification of correlation between model reliance on non-robust features and smoothness of marginal density of the input samples. Hence, we uniquely regularize the gradients of the marginal density w.r.t. the input features for robustness. We also devise an efficient implementation of our regularization to address the potential numerical instability of the underlying optimization process. Moreover, we analytically reveal that, as opposed to our marginal density smoothing, the prevalent input gradient regularization smoothens conditional or joint density of the input, which can cause limited robustness. Our experiments validate the effectiveness of the proposed method, providing clear evidence of its capability to address the feature leakage problem and mitigate spurious correlations. Extensive results further establish that our technique enables the model to exhibit robustness against perturbations in pixel values, input gradients, and density.
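The idea of penalizing the gradient of the marginal log-density can be illustrated with a small numerical sketch. It is not the authors' implementation: it assumes an energy-based reading in which log p(x) equals the log-sum-exp of the logits up to a constant, and it uses a hypothetical linear classifier f(x) = Wx + b so that the gradient has the closed form Wᵀ softmax(f(x)). All names (`marginal_grad`, `marginal_penalty`, `W`, `b`, `x`) are made up for this example.

```python
import math

def logits(W, b, x):
    """Logits of a toy linear classifier f(x) = W x + b (hypothetical model)."""
    return [sum(w * v for w, v in zip(row, x)) + bias for row, bias in zip(W, b)]

def softmax(z):
    """Softmax with the usual max-shift for numerical stability."""
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

def logsumexp(z):
    """Stable log-sum-exp; equals log p(x) up to an additive constant (assumed)."""
    m = max(z)
    return m + math.log(sum(math.exp(v - m) for v in z))

def marginal_grad(W, b, x):
    """Closed-form gradient of logsumexp(W x + b) w.r.t. x: W^T softmax(f(x))."""
    p = softmax(logits(W, b, x))
    return [sum(p[k] * W[k][j] for k in range(len(W))) for j in range(len(x))]

def marginal_penalty(W, b, x):
    """Squared L2 norm of the marginal log-density gradient (the penalty term)."""
    return sum(g * g for g in marginal_grad(W, b, x))

def finite_diff_grad(W, b, x, eps=1e-6):
    """Central finite differences, used only to check the closed form."""
    grads = []
    for j in range(len(x)):
        xp, xm = list(x), list(x)
        xp[j] += eps
        xm[j] -= eps
        diff = logsumexp(logits(W, b, xp)) - logsumexp(logits(W, b, xm))
        grads.append(diff / (2 * eps))
    return grads

# Toy values chosen arbitrarily for illustration.
W = [[1.0, -2.0, 0.5], [0.3, 0.8, -1.2]]
b = [0.1, -0.4]
x = [0.7, -0.2, 1.5]

analytic = marginal_grad(W, b, x)
numeric = finite_diff_grad(W, b, x)
penalty = marginal_penalty(W, b, x)
```

In a training loop, a term like `penalty` would be scaled by a regularization coefficient and added to the classification loss; for a deep network the gradient would come from automatic differentiation rather than a closed form.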

Paper Structure (25 sections, 17 equations, 15 figures, 7 tables)

Introduction
Related Work
Feature Robustness by Attributions
Smoothing Marginal Density of Input
Stable and Efficient Implementation for Regularization
Limited Robustness in Input Gradient Regularization
Experiments
Efficacy against Feature Leakage and Adversarial Attacks
Efficacy for Spurious Correlation
Efficacy against Pixels, Gradients and Density Perturbations
Conclusion
Proof
Norm and Implementation Comparison
Adversarial Robustness Comparison
Computational Overhead Analysis
...and 10 more sections

Figures (15)

Figure 1: Attribution maps (Shrikumar et al., 2017) and insertion game scores (Petsiuk et al., 2018) for samples from (a) BlockMNIST and (b) CelebA-Hair datasets. Compared to input gradient regularization, our regularization leads to lower feature leakage while also achieving higher AUC for the insertion game.
Figure 2: The comparison of numerical stability, training efficiency and training time across different ResNet-34 models.
Figure 3: BlockMNIST samples and feature leakage problem. (a) BlockMNIST randomly appends a null block at the top or bottom of MNIST samples. (b)-(c) Attribution maps are calculated by IG on the standard and adversarially trained models.
Figure 4: Performance comparison between our method and InputGrad regularization under varying regularization coefficient for (a) Feature leakage, (b)-(c) Adversarial Accuracy under $L_{\infty}$ and $L_{2}$ PGD-20 attacks, and (d) Accuracy.
Figure 5: Spurious correlation on ResNet-34 trained on CelebA-Hair. The model fails to classify the male celebrity with blond hair due to a spurious correlation learned between females and blond hair.
...and 10 more figures

Theorems & Definitions (5)

Definition
Remark 1.
Remark 2.
Remark 3.
Proof of Equation 7
