Out-of-Distribution Learning with Human Feedback
Haoyue Bai, Xuefeng Du, Katie Rainey, Shibin Parameswaran, Yixuan Li
TL;DR
Out-of-distribution (OOD) learning with human feedback targets robustness to both covariate and semantic shifts by combining unlabeled wild data with a small number of human labels. The authors propose a gradient-based sampling score to pick $k$ informative wild samples, which are then labeled by a human as covariate OOD or semantic OOD. The labeled samples are used to train a robust multi-class classifier $f_{\mathbf{w}}$ and an OOD detector $D_{\boldsymbol{\theta}}$ under the objective $R_{\mathcal{S}^{\text{in}},\mathcal{S}^{\text{c}}_{\text{selected}}}(f_{\mathbf{w}}) + \alpha R_{\mathcal{S}^{\text{in}},\mathcal{S}^{\text{s}}_{\text{selected}}}(g_{\boldsymbol{\theta}})$, in which $\alpha$ weights the detection risk, written over $g_{\boldsymbol{\theta}}$. The framework is supported by a generalization bound based on a gradient-based distribution discrepancy, which links the labeling budget and the gradient mismatch to OOD performance. Empirically, on CIFAR-10 and related OOD benchmarks, the method improves over SCONE, with a $5.82$ percentage-point gain in OOD accuracy on the covariate-shifted CIFAR-10-C and a $32.24$-point reduction in FPR95 on Texture, while maintaining strong in-distribution (ID) performance. These results indicate that a small amount of human feedback is enough to make effective use of wild unlabeled data for simultaneous OOD generalization and detection in realistic deployment settings.
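To make the selection step concrete, here is a minimal sketch of picking $k$ wild samples by a gradient-based score. The summary above does not spell out the score, so the one used here is an illustrative assumption rather than the authors' definition: the distance between a sample's loss gradient (with the model's own prediction as pseudo-label) and the mean gradient over ID data, for a linear softmax classifier. The names `grad_vector` and `select_top_k` are hypothetical.

```python
import math

def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def grad_vector(W, x):
    """Flattened gradient of the cross-entropy loss w.r.t. W for one sample.

    W is a (features x classes) weight matrix given as nested lists.
    Assumption: the model's own prediction serves as the pseudo-label,
    since wild samples are unlabeled at selection time.
    """
    n_classes = len(W[0])
    logits = [sum(x[i] * W[i][c] for i in range(len(x))) for c in range(n_classes)]
    p = softmax(logits)
    y_hat = max(range(n_classes), key=lambda c: p[c])
    resid = [p[c] - (1.0 if c == y_hat else 0.0) for c in range(n_classes)]
    # dL/dW is the outer product of x and (p - y), flattened row by row.
    return [xi * r for xi in x for r in resid]

def select_top_k(W, X_in, X_wild, k):
    """Return indices of the k wild samples with the largest score, plus all scores."""
    grads_in = [grad_vector(W, x) for x in X_in]
    # Reference gradient: mean over the ID samples (illustrative choice).
    ref = [sum(col) / len(grads_in) for col in zip(*grads_in)]
    scores = [math.dist(grad_vector(W, x), ref) for x in X_wild]
    order = sorted(range(len(X_wild)), key=lambda i: scores[i], reverse=True)
    return order[:k], scores
```

The returned indices are the samples that would be sent to a human annotator for a covariate-OOD or semantic-OOD label. Only the $k$ selected samples cost labeling effort, which is where the labeling budget enters.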
Abstract
Out-of-distribution (OOD) learning often relies heavily on statistical approaches or predefined assumptions about OOD data distributions, hindering its efficacy in addressing multifaceted challenges of OOD generalization and OOD detection in real-world deployment environments. This paper presents a novel framework for OOD learning with human feedback, which can provide invaluable insights into the nature of OOD shifts and guide effective model adaptation. Our framework capitalizes on the freely available unlabeled data in the wild that captures the environmental test-time OOD distributions under both covariate and semantic shifts. To harness such data, our key idea is to selectively provide human feedback and label a small number of informative samples from the wild data distribution, which are then used to train a multi-class classifier and an OOD detector. By exploiting human feedback, we enhance the robustness and reliability of machine learning models, equipping them with the capability to handle OOD scenarios with greater precision. We provide theoretical insights on the generalization error bounds to justify our algorithm. Extensive experiments show the superiority of our method, outperforming the current state of the art by a significant margin.
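The training step described above (a multi-class classifier and an OOD detector, both trained on the human-labeled samples) can be sketched as a two-term loss in the shape of the objective $R(f_{\mathbf{w}}) + \alpha R(g_{\boldsymbol{\theta}})$ from the TL;DR. The specific losses here are assumptions made for illustration: cross-entropy for the classifier on ID plus selected covariate-OOD samples, and a logistic loss for the detector that separates ID from selected semantic-OOD samples. The function names are hypothetical.

```python
import math

def cross_entropy(logits, label):
    """Multi-class cross-entropy for one sample, computed stably."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(v - m) for v in logits))
    return log_z - logits[label]

def binary_logistic(score, is_id):
    """Logistic loss on a detector score; positive scores are meant to indicate ID."""
    z = -score if is_id else score
    # Stable form of log(1 + exp(z)).
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))

def joint_objective(cls_logits, cls_labels, det_scores_id, det_scores_sem, alpha):
    """Classification risk plus alpha times detection risk.

    cls_logits / cls_labels: classifier outputs and class labels for ID samples
        together with the selected covariate-OOD samples.
    det_scores_id / det_scores_sem: detector scores on ID samples and on the
        selected semantic-OOD samples.
    """
    r_cls = sum(cross_entropy(l, y) for l, y in zip(cls_logits, cls_labels)) / len(cls_labels)
    det_losses = [binary_logistic(s, True) for s in det_scores_id]
    det_losses += [binary_logistic(s, False) for s in det_scores_sem]
    r_det = sum(det_losses) / len(det_losses)
    return r_cls + alpha * r_det
```

In this sketch, covariate-OOD samples keep a class label and so feed the classification term, while semantic-OOD samples belong to no known class and feed only the detection term. Setting `alpha` to zero reduces the loss to plain classification.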
