PViT: Prior-augmented Vision Transformer for Out-of-distribution Detection
Tianhao Zhang, Zhixiang Chen, Lyudmila S. Mihaylova
TL;DR
PViT is a Prior-augmented Vision Transformer that uses priors from a pretrained model to improve OOD detection for Vision Transformers. By adding a prior token and formulating a Prior Guide Energy score, the model aligns its predictions with the priors on in-distribution data while increasing their divergence on out-of-distribution samples. Extensive experiments on ImageNet-1K with seven OOD benchmarks show substantial gains in FPR95 and AUROC without synthetic outlier generation, and ablations validate the effectiveness of the prior-guidance term and the prior token. The approach provides a scalable, data-efficient way to give ViTs a useful inductive bias for safety-critical vision tasks, and it ports to large vision models with minimal architectural changes.
Abstract
Vision Transformers (ViTs) have achieved remarkable success across various vision tasks, yet their robustness against data distribution shifts and their inherent inductive biases remain underexplored. To enhance the robustness of ViT models for image Out-of-Distribution (OOD) detection, we introduce a novel and generic framework named Prior-augmented Vision Transformer (PViT). Taking as input the prior class logits from a pretrained model, we train PViT to predict the class logits. During inference, PViT identifies OOD samples by quantifying the divergence between the predicted class logits and the prior logits obtained from the pretrained model. Unlike existing state-of-the-art (SOTA) OOD detection methods, PViT shapes the decision boundary between ID and OOD data using the proposed prior-guided confidence, without requiring additional data modeling, generation methods, or structural modifications. Extensive experiments on the large-scale ImageNet benchmark, evaluated on over seven OOD datasets, demonstrate that PViT significantly outperforms existing SOTA OOD detection methods in terms of FPR95 and AUROC. The codebase is publicly available at https://github.com/RanchoGoose/PViT.
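The inference rule described above (score a sample by how far the predicted class logits diverge from the prior logits) can be illustrated with a minimal sketch. This is not the paper's Prior Guide Energy score, whose exact form is not given in this abstract. As a stand-in, it assumes a KL divergence between the two softmax distributions; the function names, the toy logits, and the threshold are all hypothetical.

```python
import numpy as np

def log_softmax(logits):
    # Numerically stable log-softmax over the class axis.
    shifted = logits - logits.max(axis=-1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))

def prior_divergence_score(pred_logits, prior_logits):
    # ASSUMPTION: KL(prior || prediction) is used as the divergence.
    # The paper's actual score may differ. Higher means more OOD-like.
    log_p = log_softmax(prior_logits)
    log_q = log_softmax(pred_logits)
    return (np.exp(log_p) * (log_p - log_q)).sum(axis=-1)

def flag_ood(pred_logits, prior_logits, threshold):
    # Flag samples whose divergence exceeds a (hypothetical) threshold.
    return prior_divergence_score(pred_logits, prior_logits) > threshold

# Toy logits for two samples over three classes (illustrative values only).
prior = np.array([[5.0, 0.0, 0.0],
                  [5.0, 0.0, 0.0]])
pred = np.array([[5.0, 0.0, 0.0],   # agrees with the prior
                 [0.0, 0.0, 5.0]])  # disagrees with the prior

scores = prior_divergence_score(pred, prior)
flags = flag_ood(pred, prior, threshold=1.0)
print(scores, flags)
```

In this toy run the first sample, whose prediction matches its prior, gets a score of zero and is kept as in-distribution, while the second is flagged. In practice the threshold would typically be chosen on held-out ID data, for example at the operating point used for FPR95.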
