Out-of-Distribution Learning with Human Feedback

Haoyue Bai; Xuefeng Du; Katie Rainey; Shibin Parameswaran; Yixuan Li

TL;DR

This paper addresses robustness to both covariate and semantic shifts by combining unlabeled wild data with a small number of selectively requested human labels. The authors propose a gradient-based sampling score to pick $k$ informative wild samples, which are then labeled as covariate OOD or semantic OOD and used to train a robust multi-class classifier $f_{\mathbf{w}}$ and an OOD detector $D_{\boldsymbol{\theta}}$ under the objective $R_{\mathcal{S}^{\text{in}},\mathcal{S}^{\text{c}}_{\text{selected}}}(f_{\mathbf{w}}) + \alpha R_{\mathcal{S}^{\text{in}},\mathcal{S}^{\text{s}}_{\text{selected}}}(g_{\boldsymbol{\theta}})$. The framework is supported by a generalization bound built on a gradient-based distribution discrepancy, which links the labeling budget and the gradient mismatch to OOD performance. Empirically, on CIFAR-10 and related OOD benchmarks, the method improves over SCONE, including a $5.82$ percentage-point gain in OOD accuracy on the covariate-shifted CIFAR-10-C and a $32.24$-point reduction in FPR95 on Textures, while maintaining strong ID performance. These results indicate that a small amount of human feedback is enough to make wild unlabeled data useful for simultaneous OOD generalization and detection in realistic deployment settings.
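
As a rough illustration of the gradient-based sampling score, the sketch below follows the description in the Figure 1 caption: per-sample gradients are centered on a reference gradient $\bar{\nabla}$, stacked into a matrix $\mathbf{G}$, and projected onto the top singular vector $\mathbf{v}$. The function name `gradient_scores`, the use of the squared projection as the score, and the assumption that gradients arrive as pre-flattened NumPy rows are illustrative choices, not details confirmed by this summary.

```python
import numpy as np


def gradient_scores(wild_grads, reference_grad):
    """Score wild samples by their projection onto the top singular vector.

    wild_grads:     (N, p) array, one flattened loss gradient per wild sample.
    reference_grad: (p,) array, e.g. the average gradient on labeled ID data
                    (assumption: this plays the role of the reference gradient).
    """
    # Center every wild gradient on the reference gradient (the "origin").
    G = wild_grads - reference_grad
    # Rows of vt are right singular vectors, sorted by decreasing singular value.
    _, _, vt = np.linalg.svd(G, full_matrices=False)
    v = vt[0]
    # Squared projection length (assumption); the sign ambiguity of v cancels.
    return (G @ v) ** 2


# Toy data: seven gradients near the origin and three far away along one axis.
wild = np.zeros((10, 3))
wild[:7, 1] = 0.1 * np.arange(7)
wild[7:, 0] = [5.0, 6.0, 7.0]
scores = gradient_scores(wild, reference_grad=np.zeros(3))
top3 = np.argsort(-scores)[:3]  # indices of the three highest-scoring samples
```

In this toy example the three distant gradients receive the largest scores, so they would be the first candidates sent for human labeling under a fixed budget $k$.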

Abstract

Out-of-distribution (OOD) learning often relies heavily on statistical approaches or predefined assumptions about OOD data distributions, which hinders its efficacy in addressing the multifaceted challenges of OOD generalization and OOD detection in real-world deployment environments. This paper presents a novel framework for OOD learning with human feedback, which can provide invaluable insights into the nature of OOD shifts and guide effective model adaptation. Our framework capitalizes on the freely available unlabeled data in the wild, which captures the environmental test-time OOD distributions under both covariate and semantic shifts. To harness such data, our key idea is to selectively provide human feedback and label a small number of informative samples from the wild data distribution, which are then used to train a multi-class classifier and an OOD detector. By exploiting human feedback, we enhance the robustness and reliability of machine learning models, equipping them to handle OOD scenarios with greater precision. We provide theoretical insights on the generalization error bounds to justify our algorithm. Extensive experiments show the superiority of our method, which outperforms the current state-of-the-art by a significant margin.

Paper Structure (48 sections, 8 theorems, 58 equations, 3 figures, 11 tables, 1 algorithm)

Introduction
Problem Setup
Labeled in-distribution data.
Unlabeled wild data.
Learning goal.
Proposed Framework
Sample Selection for Human Feedback
Learning Objective Leveraging Human Feedback
Theoretical insights.
Practical implications.
Experiments
Settings
Datasets and benchmarks.
Experimental details.
Evaluation metrics.
...and 33 more sections

Key Result

Theorem 1

(Informal). Let $\mathcal{W}$ be a hypothesis space with VC-dimension $d$. Let $\mathcal{S}^{\text{in}}$ and $\mathcal{S}_{\text{selected}}^{\text{c}}$ denote the labeled ID data and the covariate OOD data selected by active learning, with sizes $n$ and $m_{\text{c}}$, respectively. ... where $\zeta=\sqrt{(\frac{1}{n} + \frac{1}{m_{\text{c}}})(\frac{d\log{(2n+2m_{\text{c}})}-\log(\del$...

Figures (3)

Figure 1: Illustration of the gradient vectors and their projections (the blue points denote $\mathbb{P}_{\text{in}}$, the green points represent $\mathbb{P}_{\text{out}}^{\text{covariate}}$, and the gray points indicate $\mathbb{P}_{\text{out}}^{\text{semantic}}$): (a) Visualization of the gradient projected onto the top singular vector of matrix $\mathbf{G}$ for unlabeled data. The gradients of the set $\mathbb{P}_{\text{in}}$ (inliers in the wild) are proximate to the origin (reference gradient $\bar{\nabla}$), in contrast to the gradients of the set $\mathbb{P}_{\text{out}}^{\text{semantic}}$, which are more distant. (b) The angle $\theta$ between the gradient of the set $\mathbb{P}_{\text{out}}^{\text{semantic}}$ and the singular vector $\mathbf{v}$. As $\mathbf{v}$ is identified to maximize the distance of the projected points (denoted by ✖) from the origin, considering the sum over all the gradients in $\mathbb{P}_{\text{wild}}$, $\mathbf{v}$ indicates the direction of OOD data in the wild with a small angle $\theta$.
Figure 2: Illustration of three selection criteria: (1) top-$k$ sampling, (2) near-boundary sampling, and (3) mixed sampling. The horizontal axis is the sampling score defined in Equation \ref{eq:score}, and the vertical axis is the frequency. Note that we color the three different sub-distributions (ID, covariate OOD, semantic OOD) separately for clarity, but in practice, the membership is not revealed due to the unlabeled nature of wild data.
Figure 3: (a)-(b) Score distributions for ERM vs. our method. Different colors represent the different types of test data: CIFAR-10 as $\mathbb{P}_{\text{in}}$ (blue), CIFAR-10-C as $\mathbb{P}_\text{out}^\text{covariate}$ (green), and Textures as $\mathbb{P}_\text{out}^\text{semantic}$ (gray). (c)-(d) t-SNE visualization of the image embeddings using ERM vs. our method.
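
The three selection criteria named in the Figure 2 caption can be sketched as simple index-selection rules over a vector of sampling scores. The function names, the explicit `threshold` argument standing in for the ID/OOD boundary, and the even split of the budget in the mixed rule are all assumptions made for illustration; this summary does not specify how the boundary is estimated or how the mixed budget is divided.

```python
import numpy as np


def top_k_sampling(scores, k):
    """Pick the k samples with the largest scores."""
    return np.argsort(-scores, kind="stable")[:k]


def near_boundary_sampling(scores, k, threshold):
    """Pick the k samples whose scores lie closest to the boundary.

    `threshold` is a hypothetical stand-in for the ID/OOD score boundary.
    """
    return np.argsort(np.abs(scores - threshold), kind="stable")[:k]


def mixed_sampling(scores, k, threshold):
    """Spend half the budget on top-k, the rest on near-boundary samples.

    The even split is an assumption; duplicates are skipped so the budget
    is not wasted on a sample that both rules would pick.
    """
    chosen = [int(i) for i in top_k_sampling(scores, k // 2)]
    taken = set(chosen)
    for idx in np.argsort(np.abs(scores - threshold), kind="stable"):
        if len(chosen) >= k:
            break
        if int(idx) not in taken:
            chosen.append(int(idx))
            taken.add(int(idx))
    return np.array(chosen, dtype=int)


# Toy scores for six wild samples, with an assumed boundary at 0.5.
scores = np.array([0.1, 0.9, 0.5, 0.45, 0.8, 0.2])
top = top_k_sampling(scores, 2)
near = near_boundary_sampling(scores, 2, threshold=0.5)
mixed = mixed_sampling(scores, 4, threshold=0.5)
```

Top-$k$ favors the samples that look most clearly out-of-distribution, near-boundary favors the most ambiguous ones, and the mixed rule spends part of the labeling budget on each.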

Theorems & Definitions (18)

Definition 1
Theorem 1
Definition 2: $\beta$-smooth
Definition 3: $\mathcal{W}\triangle\mathcal{W}$-distance (Ben-David et al., 2010)
Definition 4: Gradient-based Distribution Discrepancy
Remark 1
Theorem 2
Proposition 3
proof
Proposition 4
...and 8 more
