Shortcut Invariance: Targeted Jacobian Regularization in Disentangled Latent Space

Shivam Pal, Sakshi Varshney, Piyush Rai

TL;DR

ERM models often rely on shortcuts that degrade OOD generalization. SITAR learns a robust function by identifying shortcut axes in a disentangled latent space via $v_j = |\operatorname{Corr}(\mu^{(j)}, \mathcal{Y})|$ and applying targeted anisotropic noise that perturbs each latent as $\bar{\bm{z}} = \bm{z} + \alpha \, (\bm{v} \odot \bm{e})$, combined with a consistency loss that flattens the classifier along these axes. A small-noise expansion shows this objective is equivalent to a unified Jacobian regularizer that penalizes the classifier’s sensitivity along shortcut directions, encouraging reliance on core semantic signals. Empirically, SITAR achieves state-of-the-art worst-group (OOD) accuracy on ColorMNIST, CelebA, Waterbirds, and Camelyon17-WILDS, while maintaining competitive in-distribution performance and avoiding brittle latent-space partitioning. The approach is simple and scalable, though it currently relies on a disentangled $\beta$-VAE; extending it to pre-trained encoders could widen its applicability to real-world systems.
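The two steps named above, scoring each latent dimension by its absolute correlation with the label and then adding noise scaled by those scores, can be sketched in a few lines of NumPy. This is a minimal illustration rather than the authors' code: the function names `shortcut_scores` and `perturb`, the synthetic data, the use of a numeric binary label, and the choice $\bm{e} \sim \mathcal{N}(0, I)$ are all assumptions made for the example.

```python
import numpy as np

def shortcut_scores(mu, y, eps=1e-8):
    """v_j = |Pearson correlation| between latent mean dimension j and the label.

    mu: (n, d) array of latent means; y: (n,) numeric labels (assumed binary here).
    """
    mu_c = mu - mu.mean(axis=0, keepdims=True)
    y_c = y - y.mean()
    cov = (mu_c * y_c[:, None]).mean(axis=0)
    denom = mu_c.std(axis=0) * y_c.std() + eps  # eps guards constant dimensions
    return np.abs(cov / denom)

def perturb(z, v, alpha, rng):
    """z_bar = z + alpha * (v ⊙ e), with e drawn from N(0, I) (an assumption)."""
    e = rng.standard_normal(z.shape)
    return z + alpha * (v * e)

# Synthetic latents: dimension 0 tracks the label, the rest are pure noise.
rng = np.random.default_rng(0)
n, d = 2000, 4
y = rng.integers(0, 2, size=n).astype(float)
mu = rng.standard_normal((n, d))
mu[:, 0] = y + 0.1 * rng.standard_normal(n)

v = shortcut_scores(mu, y)           # large for dimension 0, small elsewhere
z_bar = perturb(mu, v, alpha=1.0, rng=rng)
```

Because the noise is multiplied elementwise by $\bm{v}$, dimensions that barely correlate with the label are left almost untouched, which is what makes the perturbation anisotropic. In training, the scores would be treated as constants (no gradient flows through them).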

Abstract

Deep neural networks are prone to learning shortcuts: spurious, easily learned correlations in the training data that cause severe failures in out-of-distribution (OOD) generalization. A dominant line of work seeks robustness by learning a robust representation, often explicitly partitioning the latent space into core and spurious components; this approach can be complex, brittle, and difficult to scale. We take a different approach: instead of a robust representation, we learn a robust function. We present a simple and effective training method that renders the classifier functionally invariant to shortcut signals. Our method operates within a disentangled latent space, which is essential because it isolates spurious and core features into distinct dimensions. This separation enables the identification of candidate shortcut features by their strong correlation with the label, used as a proxy for semantic simplicity. The classifier is then desensitized to these features by injecting targeted, anisotropic latent noise during training. We analyze this as targeted Jacobian regularization, which forces the classifier to ignore spurious features and rely on more complex, core semantic signals. The result is state-of-the-art OOD performance on established shortcut learning benchmarks.
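The link between targeted noise and a Jacobian penalty can be illustrated on the consistency term alone. The following is a formal first-order sketch, not a restatement of the paper's Theorem 1. It assumes $\bm{e} \sim \mathcal{N}(0, I)$ and writes $J_f(\bm{z}) = \partial f_\theta / \partial \bm{z}$, a symbol introduced here for illustration.

```latex
\begin{aligned}
f_\theta(\bar{\bm{z}}) - f_\theta(\bm{z})
  &= \alpha\, J_f(\bm{z})\,(\bm{v} \odot \bm{e}) + o(\alpha), \\
\mathbb{E}_{\bm{e}}\,\bigl\lVert f_\theta(\bar{\bm{z}}) - f_\theta(\bm{z}) \bigr\rVert_2^2
  &= \alpha^2 \operatorname{tr}\!\bigl(J_f(\bm{z}) \operatorname{diag}(\bm{v})^2 J_f(\bm{z})^\top\bigr) + o(\alpha^2) \\
  &= \alpha^2 \sum_j v_j^2 \,\bigl\lVert J_f(\bm{z})_{:,j} \bigr\rVert_2^2 + o(\alpha^2).
\end{aligned}
```

Each column of the Jacobian is weighted by $v_j^2$, so dimensions with $v_j \approx 0$ contribute almost nothing and the penalty concentrates on the label-correlated axes.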

Paper Structure

This paper contains 36 sections, 1 theorem, 17 equations, 8 figures, 7 tables, 1 algorithm.

Key Result

Theorem 1

Assume $f_\theta$ is twice continuously differentiable in a neighborhood of $\bm{z}$. Let $H_\ell \!=\! \nabla^2\!\ell_{\text{CE}}(f,y)$, which is positive semidefinite. Then for sufficiently small $\alpha$,

Figures (8)

  • Figure 1: Overview of SITAR. A $\beta$-VAE encoder $\mathcal{E}_\phi$ maps input images $\mathcal{X}$ to Gaussian latents $\bm{z} \sim \mathcal{N}(\bm{\mu},\bm{\sigma})$, which are then passed to a decoder $\mathcal{D}_\psi$ for reconstruction and to a classifier $f_\theta$ for prediction. Using labels $\mathcal{Y}$ and latent means $\bm{\mu}$, SITAR computes per-dimension shortcut scores $v_j = \lvert \operatorname{corr}(\mu_j,\mathcal{Y}) \rvert$ (with gradients stopped), forming a weight vector $\bm{v}$. Independent Gaussian noise $\bm{\epsilon} \sim \mathcal{N}(0,\alpha I)$ is scaled elementwise by $\bm{v}$ and added to $\bm{z}$ to obtain perturbed latents $\bar{\bm{z}}$. The encoder $\mathcal{E}_\phi$, decoder $\mathcal{D}_\psi$, and classifier $f_\theta$ are trained jointly using the sum of four losses: reconstruction, $\beta$-weighted KL divergence, cross-entropy on $(\bar{\bm{z}},\mathcal{Y})$, and an $\ell_2$ consistency loss $\lVert f_\theta(\bm{z}) - f_\theta(\bar{\bm{z}})\rVert_2^2$, encouraging shortcut-invariant decision functions in the disentangled latent space.
  • Figure 2: Shortcut proxy $\bm v$ on ColorMNIST (target: digit, shortcut: color). Top-left shows the original image. Each row displays a latent traversal obtained by varying a single latent dimension while keeping all others fixed. The last column shows the bar plot of the absolute correlation between each latent coordinate $\bm z_i$ and the shortcut label. Latent dimension $\bm z_5$ has the highest correlation, and traversing along this dimension changes the digit color while largely preserving its shape, confirming that $\bm z_5$ is the shortcut dimension.
  • Figure 3: Shortcut proxy $\bm{v}$ on CelebA (target: blond hair, shortcut: gender). From left to right: the first column shows the original images, the next columns show latent traversals obtained by varying a single latent dimension while keeping all others fixed, and the last column shows the absolute correlation values $|\mathrm{corr}(\bm \mu^{(j)}, y)|$ for each latent dimension. Latent dimension $z_8$ has a significantly higher correlation than the others, and traversing along this dimension primarily changes the shortcut attribute (apparent gender) while keeping the target attribute almost fixed: images on the left appear more male-like, while images on the right appear more female-like.
  • Figure 4: Ablation on the disentanglement factor $\beta$ (fixed $\alpha=1.0$). OOD accuracy versus $\beta$. Low $\beta$ leaves latents entangled and reduces OOD to ERM-like $\sim10\%$. With $\beta\!\ge\!1$, the shortcut axis is isolated and OOD rises toward the oracle.
  • Figure 5: Noise magnitude controls invariance (fixed $\beta=2$). OOD accuracy versus $\alpha$. Any $\alpha\!>\!0$ improves OOD from ERM's $\sim10\%$ toward $\sim65$–$70\%$, consistent with the targeted Jacobian penalty.
  • ...and 3 more figures
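The four-term objective described in the Figure 1 caption can be written out as a forward computation. This is a hedged NumPy sketch, not the authors' implementation: the squared-error reconstruction term, the log-variance parameterization of the encoder output, batch-mean reduction, applying the consistency loss to raw classifier outputs, and every function and variable name here are assumptions chosen for illustration.

```python
import numpy as np

def kl_standard_normal(mu, logvar):
    """KL( N(mu, diag(exp(logvar))) || N(0, I) ), averaged over the batch."""
    return 0.5 * np.mean(np.sum(np.exp(logvar) + mu**2 - 1.0 - logvar, axis=1))

def cross_entropy(logits, y):
    """Mean softmax cross-entropy for integer class labels y."""
    shifted = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -np.mean(log_probs[np.arange(len(y)), y])

def sitar_loss(x, x_recon, mu, logvar, z, z_bar, y, classifier, beta):
    """Sum of reconstruction, beta-weighted KL, CE on z_bar, and l2 consistency."""
    recon = np.mean(np.sum((x - x_recon) ** 2, axis=1))   # assumed squared error
    kl = kl_standard_normal(mu, logvar)
    ce = cross_entropy(classifier(z_bar), y)              # CE uses perturbed latents
    cons = np.mean(np.sum((classifier(z) - classifier(z_bar)) ** 2, axis=1))
    return recon + beta * kl + ce + cons

# Toy usage with a hypothetical linear classifier and a stand-in "decoder" output.
rng = np.random.default_rng(0)
n, p, d, k = 8, 6, 4, 3
x = rng.standard_normal((n, p))
x_recon = x + 0.1 * rng.standard_normal((n, p))
mu = rng.standard_normal((n, d))
logvar = np.zeros((n, d))
z = mu + np.exp(0.5 * logvar) * rng.standard_normal((n, d))
v = np.array([1.0, 0.0, 0.0, 0.0])                        # pretend dim 0 is the shortcut
z_bar = z + 0.5 * (v * rng.standard_normal((n, d)))
y = rng.integers(0, k, size=n)
W = rng.standard_normal((d, k))
total = sitar_loss(x, x_recon, mu, logvar, z, z_bar, y, lambda t: t @ W, beta=2.0)
```

In an actual training loop these terms would be computed in an autodiff framework so that the encoder, decoder, and classifier receive gradients jointly, with the shortcut scores held fixed.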

Theorems & Definitions (2)

  • Theorem 1: Unified Jacobian Regularizer
  • proof
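The Jacobian view of the consistency loss can be checked numerically on a toy model. The sketch below compares a Monte Carlo estimate of $\mathbb{E}\lVert f(\bar{\bm{z}}) - f(\bm{z})\rVert_2^2$ against the first-order prediction $\alpha^2 \sum_j v_j^2 \lVert J_{:,j} \rVert_2^2$ for small $\alpha$. Everything here is an assumption made for illustration: the two-layer tanh classifier, its random weights, the hand-picked $\bm{v}$, and $\bm{e} \sim \mathcal{N}(0, I)$. It is not the paper's proof and covers only the consistency term.

```python
import numpy as np

def classifier(z, W1, W2):
    """Toy classifier f(z) = tanh(z W1) W2; accepts (d,) or (n, d) inputs."""
    return np.tanh(z @ W1) @ W2

def jacobian(z, W1, W2):
    """Analytic df/dz at a single point z, shape (num_outputs, latent_dim)."""
    h = np.tanh(z @ W1)
    return ((W1 * (1.0 - h**2)) @ W2).T

rng = np.random.default_rng(0)
d, hidden, k = 5, 8, 3
W1 = 0.5 * rng.standard_normal((d, hidden))
W2 = 0.5 * rng.standard_normal((hidden, k))
z = rng.standard_normal(d)
v = np.array([0.9, 0.05, 0.0, 0.1, 0.0])   # hypothetical shortcut scores
alpha = 1e-2

# Monte Carlo estimate of the expected consistency loss at z.
e = rng.standard_normal((200_000, d))
z_bar = z + alpha * (v * e)
diff = classifier(z_bar, W1, W2) - classifier(z, W1, W2)
mc_estimate = np.mean(np.sum(diff**2, axis=1))

# First-order prediction: each Jacobian column weighted by v_j^2.
J = jacobian(z, W1, W2)
predicted = alpha**2 * np.sum(v**2 * np.sum(J**2, axis=0))

print(f"monte carlo: {mc_estimate:.3e}  predicted: {predicted:.3e}")
```

Setting an entry of `v` to zero removes that column of the Jacobian from the penalty entirely, which is the sense in which the regularization is targeted.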