Fair Context Learning for Evidence-Balanced Test-Time Adaptation in Vision-Language Models
Sanggeon Yun, Ryozo Masukawa, SungHeon Jeong, Wenjun Huang, Hanning Chen, Mohsen Imani
TL;DR
This work tackles the robustness gaps of vision–language models under distribution shift by moving beyond entropy-based test-time adaptation. It introduces Fair Context Learning (FCL), a two-stage framework that first explores plausible class candidates via low-entropy augmented views and then calibrates the text contexts so that the candidates are equally sensitive to shared visual evidence, identified through common-evidence maps. The calibration objective combines a Jensen–Shannon divergence term with a semantic-alignment regularizer, enabling non-entropy-based adaptation that mitigates partial feature obsession. Experiments on natural-shift and fine-grained datasets show competitive gains together with improved fairness, efficiency, and generalization, supported by extensive ablations and qualitative analyses.
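The calibration objective can be illustrated with a small sketch. This summary does not define which distributions the Jensen–Shannon term compares or how the semantic-alignment regularizer is computed, so everything below is an assumption rather than FCL's implementation. Each candidate class is given a hypothetical non-negative sensitivity vector over common-evidence regions. The fairness term is taken to be the uniform-weight generalized Jensen–Shannon divergence among the normalized vectors. The alignment regularizer is assumed to be the mean cosine distance between adapted and original text embeddings, scaled by a hypothetical weight `lam`. The names `fairness_loss`, `sensitivities`, and `lam` are illustrative only.

```python
import math
from typing import List, Sequence

def entropy(p: Sequence[float]) -> float:
    """Shannon entropy (natural log) of a discrete distribution."""
    return -sum(x * math.log(x) for x in p if x > 0)

def normalize(v: Sequence[float]) -> List[float]:
    """Scale a non-negative vector so it sums to 1."""
    total = sum(v)
    return [x / total for x in v]

def js_divergence(dists: Sequence[Sequence[float]]) -> float:
    """Uniform-weight generalized Jensen-Shannon divergence:
    H(mean_i P_i) - mean_i H(P_i). For two inputs this is the usual JS."""
    k = len(dists)
    mixture = [sum(col) / k for col in zip(*dists)]
    return entropy(mixture) - sum(entropy(p) for p in dists) / k

def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def fairness_loss(sensitivities, adapted_text, original_text, lam=1.0):
    """Hypothetical FCL-style calibration loss (an assumption, not the paper's code).

    sensitivities: per candidate class, a non-negative vector giving its
        assumed response to each common-evidence region.
    adapted_text / original_text: per candidate class, the text embedding
        after and before context calibration (non-zero vectors).
    lam: hypothetical weight of the semantic-alignment regularizer.
    """
    # Fairness term (assumed form): candidates should spread their sensitivity
    # over the shared evidence in the same proportions; zero when profiles match.
    js = js_divergence([normalize(s) for s in sensitivities])
    # Alignment term (assumed form): keep calibrated text embeddings close
    # to the original ones, measured as mean cosine distance.
    align = sum(1.0 - cosine(a, o) for a, o in zip(adapted_text, original_text))
    align /= len(adapted_text)
    return js + lam * align
```

When every candidate responds to the shared evidence in the same proportions, the divergence term vanishes and only drift of the text embeddings is penalized; a class that concentrates its sensitivity on a few regions raises the divergence, which is one way to read "mitigating partial feature obsession" without minimizing entropy.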
Abstract
Vision-Language Models (VLMs) such as CLIP enable strong zero-shot recognition but suffer substantial degradation under distribution shifts. Test-Time Adaptation (TTA) aims to improve robustness using only unlabeled test samples, yet most prompt-based TTA methods rely on entropy minimization, an approach that can amplify spurious correlations and induce overconfident errors when classes share visual features. We propose Fair Context Learning (FCL), an episodic TTA framework that avoids entropy minimization by explicitly addressing shared-evidence bias. Motivated by our additive evidence decomposition assumption, FCL decouples adaptation into (i) augmentation-based exploration to identify plausible class candidates, and (ii) fairness-driven calibration that adapts text contexts to equalize sensitivity to common visual evidence. This fairness constraint mitigates partial feature obsession and enables effective calibration of text embeddings without relying on entropy reduction. Through extensive evaluation, we empirically validate our theoretical motivation and show that FCL achieves adaptation performance competitive with state-of-the-art TTA methods across diverse domain-shift and fine-grained benchmarks.
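Stage (i) can be sketched in the style of the confidence selection used by earlier prompt-based TTA methods such as TPT: compute class probabilities for each augmented view, keep the lowest-entropy fraction of views, average their probabilities, and take the top-k classes as candidates. This is an assumption about how FCL's exploration works; the abstract does not give the exact rule. The function name `explore_candidates` and the parameters `keep_ratio` and `top_k` are hypothetical, and the per-view logits are passed in directly instead of being produced by a CLIP model.

```python
import math
from typing import List, Sequence, Tuple

def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def entropy(p: Sequence[float]) -> float:
    """Shannon entropy (natural log) of a discrete distribution."""
    return -sum(x * math.log(x) for x in p if x > 0)

def explore_candidates(
    view_logits: Sequence[Sequence[float]],
    keep_ratio: float = 0.1,
    top_k: int = 3,
) -> Tuple[List[int], List[float]]:
    """Hypothetical stage-(i) sketch: pick class candidates from confident views.

    view_logits: one row of class logits per augmented view of a test image
        (in practice these would come from a VLM such as CLIP; here they are inputs).
    """
    probs = [softmax(row) for row in view_logits]
    # Keep the lowest-entropy (most confident) fraction of views, at least one.
    n_keep = max(1, int(keep_ratio * len(probs)))
    confident = sorted(probs, key=entropy)[:n_keep]
    # Average the class probabilities of the retained views.
    num_classes = len(confident[0])
    avg = [sum(p[c] for p in confident) / n_keep for c in range(num_classes)]
    # Candidate set: the top-k classes under the averaged distribution.
    ranked = sorted(range(num_classes), key=lambda c: avg[c], reverse=True)
    return ranked[:top_k], avg
```

Here entropy only filters views; nothing is optimized to reduce it, which is consistent with the abstract's point that adaptation itself avoids entropy minimization. The returned candidates would then be the classes whose text contexts are calibrated in stage (ii).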
