HYPO: Hyperspherical Out-of-Distribution Generalization

Haoyue Bai; Yifei Ming; Julian Katz-Samuels; Yixuan Li

TL;DR

HYPO (HYPerspherical OOD generalization) is a framework that provably learns domain-invariant representations in a hyperspherical space and outperforms competitive baselines on challenging OOD generalization benchmarks.

Abstract

Out-of-distribution (OOD) generalization is critical for machine learning models deployed in the real world. However, achieving this can be fundamentally challenging, as it requires the ability to learn invariant features across different domains or environments. In this paper, we propose a novel framework HYPO (HYPerspherical OOD generalization) that provably learns domain-invariant representations in a hyperspherical space. In particular, our hyperspherical learning algorithm is guided by intra-class variation and inter-class separation principles -- ensuring that features from the same class (across different training domains) are closely aligned with their class prototypes, while different class prototypes are maximally separated. We further provide theoretical justifications on how our prototypical learning objective improves the OOD generalization bound. Through extensive experiments on challenging OOD benchmarks, we demonstrate that our approach outperforms competitive baselines and achieves superior performance. Code is available at https://github.com/deeplearning-wisc/hypo.
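The two principles named in the abstract can be illustrated with a minimal NumPy sketch. It is not taken from the paper or its repository: the function names, the temperature value, and the exact form of both terms are assumptions chosen for illustration. The alignment term pulls each unit-norm feature toward its class prototype, and the separation term penalizes high cosine similarity between different prototypes.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Project rows onto the unit hypersphere."""
    return x / np.maximum(np.linalg.norm(x, axis=-1, keepdims=True), eps)


def logsumexp(a, axis):
    """Numerically stable log-sum-exp along one axis."""
    m = np.max(a, axis=axis, keepdims=True)
    return np.squeeze(m, axis=axis) + np.log(np.sum(np.exp(a - m), axis=axis))


def hyperspherical_loss(features, labels, prototypes, temperature=0.1):
    """Return (alignment, separation) terms; an illustrative form only.

    features:   (N, d) embeddings pooled over all training domains
    labels:     (N,)   integer class labels
    prototypes: (C, d) one vector per class (assumes C >= 2)
    """
    z = l2_normalize(features)
    mu = l2_normalize(prototypes)
    num_classes = mu.shape[0]

    # Alignment: softmax cross-entropy over cosine similarity to prototypes,
    # so each sample is drawn toward the prototype of its own class.
    logits = z @ mu.T / temperature
    log_prob = logits - logsumexp(logits, axis=1)[:, None]
    alignment = -np.mean(log_prob[np.arange(len(labels)), labels])

    # Separation: log of the mean exponentiated similarity between each
    # prototype and all other prototypes (smaller means further apart).
    sim = mu @ mu.T / temperature
    off_diag = sim[~np.eye(num_classes, dtype=bool)].reshape(
        num_classes, num_classes - 1
    )
    separation = np.mean(logsumexp(off_diag, axis=1) - np.log(num_classes - 1))
    return alignment, separation


# Toy usage with synthetic data: three orthogonal prototypes in 3-D.
protos = np.eye(3)
feats = np.eye(3)
align_ok, sep = hyperspherical_loss(feats, np.array([0, 1, 2]), protos)
align_bad, _ = hyperspherical_loss(feats, np.array([1, 2, 0]), protos)
print(align_ok < align_bad, sep)
```

In this toy setup the alignment term is small when features sit on their own prototypes and large when the labels are permuted, while the separation term grows as prototypes move closer together.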

Paper Structure (47 sections, 10 theorems, 54 equations, 7 figures, 15 tables, 1 algorithm)

Introduction
Empirical contribution.
Theoretical insight.
Problem Setup
Motivation of Algorithm Design
Method
Hyperspherical Learning for OOD Generalization
Loss function.
Class prediction.
Geometrical Interpretation of Loss and Embedding
Experiments
Experimental Setup
Datasets.
Evaluation metrics.
Experimental details.
...and 32 more sections

Key Result

Theorem 3.1

Suppose the loss function $\ell(\cdot, \cdot)$ takes values in $[0, B]$. For a learnable OOD generalization problem with sufficient inter-class separation, the OOD generalization error $\text{err}(f)$ can be upper bounded by for some $\alpha>0$, and $\mathcal{V}^{\textnormal{sup}}\left(h, \mathcal{E}_{\textnormal{avail}}\right) \triangleq \sup _{\beta \in \mathcal{S}^{d-1}} \mathcal{V}\left(\be

Figures (7)

Figure 1: Illustration of hyperspherical embeddings. Images are from PACS (Li et al., 2017).
Figure 2: Our method HYPO significantly improves OOD generalization performance over ERM on various OOD datasets, with CIFAR-10 as the in-distribution (ID) dataset. Full results can be seen in the appendix.
Figure 3: Illustration of hard negative pairs which share the same domain (art painting) but have different class labels.
Figure 4: UMAP (McInnes et al., 2018) visualization of the features when the model is trained with CE vs. HYPO for PACS. The red, orange, and green points are from the in-distribution domains art painting (A), photo (P), and sketch (S). The violet points are from the unseen OOD domain cartoon (C).
Figure 5: Intra-class variation for ERM (left) vs. HYPO (right) on PACS. For each class $y$, we measure the Sinkhorn Divergence between the embeddings of each pair of domains. Our method results in significantly lower intra-class variation across different pairs of training domains compared to ERM.
...and 2 more figures
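The intra-class variation measurement described in the Figure 5 caption compares, for one class, the embeddings from two domains using the Sinkhorn divergence. The sketch below shows one way such a quantity could be computed with NumPy. It is an illustration under stated assumptions, not the authors' evaluation code: the squared Euclidean cost, the regularization strength `eps`, the iteration count, and all function names are hypothetical choices.

```python
import numpy as np


def _logsumexp(a, axis):
    """Numerically stable log-sum-exp along one axis."""
    m = np.max(a, axis=axis, keepdims=True)
    return np.squeeze(m, axis=axis) + np.log(np.sum(np.exp(a - m), axis=axis))


def entropic_ot(x, y, eps=0.1, n_iters=200):
    """Dual value of entropy-regularized OT between two uniform point clouds."""
    n, m = len(x), len(y)
    log_a = np.full(n, -np.log(n))  # uniform weights on x
    log_b = np.full(m, -np.log(m))  # uniform weights on y
    # Assumed ground cost: squared Euclidean distance between embeddings.
    cost = np.sum((x[:, None, :] - y[None, :, :]) ** 2, axis=-1)
    f = np.zeros(n)
    g = np.zeros(m)
    for _ in range(n_iters):
        # Log-domain Sinkhorn updates of the two dual potentials.
        f = -eps * _logsumexp(log_b[None, :] + (g[None, :] - cost) / eps, axis=1)
        g = -eps * _logsumexp(log_a[:, None] + (f[:, None] - cost) / eps, axis=0)
    return np.mean(f) + np.mean(g)


def sinkhorn_divergence(x, y, eps=0.1, n_iters=200):
    """Debiased divergence: OT(x, y) - OT(x, x) / 2 - OT(y, y) / 2."""
    return (
        entropic_ot(x, y, eps, n_iters)
        - 0.5 * entropic_ot(x, x, eps, n_iters)
        - 0.5 * entropic_ot(y, y, eps, n_iters)
    )


# Synthetic stand-ins for same-class embeddings from two training domains.
rng = np.random.default_rng(0)
domain_a = rng.normal(size=(20, 3)) * 0.1 + np.array([1.0, 0.0, 0.0])
domain_b = rng.normal(size=(20, 3)) * 0.1 + np.array([0.0, 1.0, 0.0])
domain_a /= np.linalg.norm(domain_a, axis=1, keepdims=True)
domain_b /= np.linalg.norm(domain_b, axis=1, keepdims=True)
print(sinkhorn_divergence(domain_a, domain_b))
```

A lower value for a given class and domain pair indicates that the two domains' embeddings for that class occupy nearly the same region of the sphere, which is the sense in which the caption reports lower intra-class variation.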

Theorems & Definitions (19)

Definition 2.1: OOD Generalization
Definition 3.1: Intra-class variation
Definition 3.2: Inter-class separation
Definition 3.3
Theorem 3.1: OOD error upper bound, informal (Ye et al., 2021)
Theorem 6.1: Variation upper bound using HYPO
Lemma C.1
Remark 1
Lemma C.2
Lemma C.3
...and 9 more
