
Beyond the Class Subspace: Teacher-Guided Training for Reliable Out-of-Distribution Detection in Single-Domain Models

Hong Yang, Devroop Kar, Qi Yu, Travis Desell, Alex Ororbia

Abstract

Out-of-distribution (OOD) detection methods perform well on multi-domain benchmarks, yet many practical systems are trained on single-domain data. We show that this regime induces a geometric failure mode, Domain-Sensitivity Collapse (DSC): supervised training compresses features into a low-rank class subspace and suppresses directions that carry domain-shift signal. We provide theory showing that, under DSC, distance- and logit-based OOD scores lose sensitivity to domain shift. We then introduce Teacher-Guided Training (TGT), which distills class-suppressed residual structure from a frozen multi-domain teacher (DINOv2) into the student during training. The teacher and auxiliary head are discarded after training, so TGT adds no inference overhead. Across eight single-domain benchmarks, TGT yields large far-OOD FPR@95 reductions for distance-based scorers: MDS improves by 11.61 pp, ViM by 10.78 pp, and kNN by 12.87 pp (ResNet-50 average), while maintaining or slightly improving in-domain OOD detection and classification accuracy.
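The abstract describes TGT only at a high level. The sketch below shows one possible reading of "distilling class-suppressed residual structure": project frozen-teacher features onto the orthogonal complement of the span of the centered class means, let a linear auxiliary head on the student regress that residual, and add the squared error to cross-entropy with weight `lam` (presumably playing the role of the guidance strength $\lambda$ swept in Figure 2). The projection, the linear head `aux_weight`, the squared-error distance, and every function name here are hypothetical assumptions, not the authors' implementation. A real version would run inside a deep-learning framework so that gradients reach the student; this NumPy version only computes the forward loss.

```python
import numpy as np


def class_subspace_basis(feats, labels, tol=1e-10):
    """Orthonormal basis (d, r) for the span of centered class means (hypothetical choice)."""
    mu = feats.mean(axis=0)
    means = np.stack([feats[labels == c].mean(axis=0) - mu for c in np.unique(labels)])
    # SVD keeps only directions with non-negligible singular values, so the basis
    # has the true rank of the class-mean span (at most C - 1 after centering).
    _, s, vt = np.linalg.svd(means, full_matrices=False)
    keep = s > tol * s.max()
    return vt[keep].T


def class_suppressed_residual(teacher_feats, basis):
    """Remove the class-subspace component: r = (I - Q Q^T) t for each row t."""
    return teacher_feats - teacher_feats @ basis @ basis.T


def cross_entropy(logits, labels):
    """Mean cross-entropy computed with a numerically stable log-softmax."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()


def tgt_loss(student_logits, student_feats, teacher_feats, labels, aux_weight, basis, lam):
    """Cross-entropy plus lam times a residual-distillation term (hypothetical form)."""
    target = class_suppressed_residual(teacher_feats, basis)
    pred = student_feats @ aux_weight  # hypothetical linear auxiliary head
    distill = np.mean(np.sum((pred - target) ** 2, axis=1))
    return cross_entropy(student_logits, labels) + lam * distill
```

Because the teacher is frozen, `basis` can be computed once from teacher features on the training set and reused for every batch. Only the student backbone and classifier are needed at test time, which matches the abstract's statement that the teacher and auxiliary head are discarded.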

Paper Structure (73 sections, 4 theorems, 10 equations, 3 figures, 50 tables)

Key Result

theorem 1

Let $\lambda_1 \ge \cdots \ge \lambda_d$ be the eigenvalues of $\mathrm{Cov}(z_{\mathrm{ID}})$ with eigenvectors $v_1,\ldots,v_d$. Suppose the ID-vs-OOD separation concentrates in a set of directions $\{v_j : j \in \mathcal{J}\}$ with $\lambda_j / \lambda_1 \le \rho$ for all $j \in \mathcal{J}$ and where $L_k \le 1$ is the Lipschitz constant of the $k$-NN distance statistic with respect to featur
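Theorem 1 is stated in terms of the $k$-NN distance statistic and its Lipschitz constant $L_k \le 1$. A minimal sketch of two distance-based scorers named in the abstract follows: the raw Euclidean distance to the $k$-th nearest training feature, and a Mahalanobis (MDS-style) score with class means and one shared covariance. Both return larger values for more OOD-like inputs. Function names are illustrative, and published kNN detectors usually L2-normalize features first, which this sketch omits. For fixed training features, the $k$-th nearest-neighbor distance is 1-Lipschitz in the query, so if DSC shrinks how far a domain-shifted input moves in feature space, the score cannot move by more than that shrunken displacement.

```python
import numpy as np


def knn_distance(train_feats, query_feats, k):
    """Euclidean distance from each query row to its k-th nearest training row."""
    diffs = query_feats[:, None, :] - train_feats[None, :, :]
    dists = np.linalg.norm(diffs, axis=2)  # shape (num_query, num_train)
    # np.partition puts the k-th smallest distance (index k - 1) in place.
    return np.partition(dists, k - 1, axis=1)[:, k - 1]


def mahalanobis_score(train_feats, train_labels, query_feats):
    """Minimum squared Mahalanobis distance to any class mean, shared covariance."""
    classes = np.unique(train_labels)
    means = np.stack([train_feats[train_labels == c].mean(axis=0) for c in classes])
    # Subtract each sample's own class mean, then pool into one covariance.
    centered = train_feats - means[np.searchsorted(classes, train_labels)]
    cov = centered.T @ centered / len(train_feats)
    prec = np.linalg.pinv(cov)  # pseudo-inverse tolerates a low-rank covariance
    diffs = query_feats[:, None, :] - means[None, :, :]  # (num_query, C, d)
    d2 = np.einsum("qcd,de,qce->qc", diffs, prec, diffs)
    return d2.min(axis=1)
```

The pseudo-inverse is a deliberate choice: under DSC the feature covariance can be close to low rank, and a plain inverse would then be unstable.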

Figures (3)

  • Figure 1: Per-dataset gains from Teacher-Guided Training on ResNet-50 relative to the CE baseline. Blue bars show the effective-rank increase (TGT$-$CE), and hatched orange bars show the FPR@95 reduction (CE$-$TGT; larger is better). The datasets shown are EuroSAT, Colon, Fashion, Tissue, and Rock (Rock is shown in a separate $y$-region).
  • Figure 2: EuroSAT out-of-domain (FarOOD) FPR@95 by method across teacher-guidance strengths $\lambda$, averaged over 5 random splits. The effect of $\lambda$ is method-dependent: for example, ReAct improves at higher $\lambda$, while SCALE worsens. Many methods also perform poorly at $\lambda=0.5$.
  • Figure S1: EuroSAT out-of-domain (FarOOD) FPR@95 by method across teacher-guidance strengths $\lambda$, averaged over 5 splits. The effect of $\lambda$ is method-dependent: for example, ReAct improves at higher $\lambda$, while SCALE worsens. Many methods also perform poorly at $\lambda=0.5$.
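The figures report two quantities: effective rank of the learned features and FPR@95. The summary does not say which effective-rank definition the paper uses, so the sketch below assumes the common entropy-based one, $\exp(H(p))$ with $p$ the normalized singular values of the centered feature matrix. FPR@95 is the fraction of OOD samples accepted at the threshold that keeps 95% of ID samples; the sketch assumes that higher scores mean "more in-distribution". Function names are illustrative.

```python
import numpy as np


def effective_rank(feats):
    """Entropy-based effective rank exp(H(p)), p = normalized singular values of centered feats."""
    centered = feats - feats.mean(axis=0)
    s = np.linalg.svd(centered, compute_uv=False)
    p = s / s.sum()
    p = p[p > 0]  # treat 0 * log 0 as 0
    return float(np.exp(-np.sum(p * np.log(p))))


def fpr_at_95_tpr(id_scores, ood_scores):
    """Fraction of OOD samples scoring at or above the 5th percentile of ID scores.

    Convention assumed here: higher score = more in-distribution, so this
    threshold keeps 95% of ID samples (TPR = 95%).
    """
    threshold = np.percentile(id_scores, 5)
    return float(np.mean(np.asarray(ood_scores) >= threshold))
```

Under these assumptions, a Figure 1 bar pair corresponds to `effective_rank(tgt_feats) - effective_rank(ce_feats)` and `fpr_ce - fpr_tgt`. A collapsed feature matrix, such as one whose rows all lie on a single line, has effective rank close to 1.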

Theorems & Definitions (7)

  • theorem 1: Distance failure under variance–discriminability mismatch
  • lemma 1
  • proposition 1: MSP/Energy insensitivity
  • remark 1: Interpretation and tightness
  • theorem 2: Distance failure
  • proof
  • proof
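Proposition 1 (MSP/Energy insensitivity) appears here only by title. Below is a minimal sketch of the two standard logit-based scores it names: maximum softmax probability, and the energy score written as negative free energy $T \log\sum_c e^{f_c/T}$ so that higher means more in-distribution. Both scores depend on the features only through the logits $Wz + b$. Any feature displacement in the null space of the classifier weights $W$ therefore leaves them unchanged. This is one elementary mechanism consistent with the proposition's title, not its statement or proof.

```python
import numpy as np


def msp_score(logits):
    """Maximum softmax probability; higher = more in-distribution."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    probs = np.exp(shifted)
    probs /= probs.sum(axis=1, keepdims=True)
    return probs.max(axis=1)


def energy_score(logits, temperature=1.0):
    """Negative free energy T * logsumexp(logits / T); higher = more in-distribution."""
    z = logits / temperature
    m = z.max(axis=1)
    # Subtracting the row max before exponentiating avoids overflow.
    return temperature * (m + np.log(np.exp(z - m[:, None]).sum(axis=1)))
```

For example, if `u` lies in the null space of `W` (one such direction is the last row of `np.linalg.svd(W)[2]` when `W` has more columns than rows), then logits computed from `feat` and from `feat + u` are identical, and so are both scores.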