HYPO: Hyperspherical Out-of-Distribution Generalization

Haoyue Bai, Yifei Ming, Julian Katz-Samuels, Yixuan Li

TL;DR

HYPO (HYPerspherical OOD generalization) is a framework that provably learns domain-invariant representations in a hyperspherical space and outperforms competitive baselines on challenging OOD generalization benchmarks.

Abstract

Out-of-distribution (OOD) generalization is critical for machine learning models deployed in the real world. However, achieving this can be fundamentally challenging, as it requires the ability to learn invariant features across different domains or environments. In this paper, we propose a novel framework HYPO (HYPerspherical OOD generalization) that provably learns domain-invariant representations in a hyperspherical space. In particular, our hyperspherical learning algorithm is guided by intra-class variation and inter-class separation principles -- ensuring that features from the same class (across different training domains) are closely aligned with their class prototypes, while different class prototypes are maximally separated. We further provide theoretical justifications on how our prototypical learning objective improves the OOD generalization bound. Through extensive experiments on challenging OOD benchmarks, we demonstrate that our approach outperforms competitive baselines and achieves superior performance. Code is available at https://github.com/deeplearning-wisc/hypo.
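
The two principles named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation (see the linked repository for that). It assumes unit-norm embeddings and prototypes, a softmax-over-prototypes term for alignment, and a log-mean-exp of pairwise prototype similarities for separation. The function names and the `temperature` parameter are hypothetical choices made for illustration.

```python
import math


def normalize(v):
    """Project a vector onto the unit hypersphere."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def log_sum_exp(values):
    """Numerically stable log(sum(exp(values)))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def alignment_loss(embeddings, labels, prototypes, temperature=0.1):
    """Low when each embedding is closest to its own class prototype.

    Assumed form: cross-entropy over cosine similarities to all prototypes.
    """
    total = 0.0
    for z, y in zip(embeddings, labels):
        logits = [dot(z, mu) / temperature for mu in prototypes]
        total += log_sum_exp(logits) - logits[y]
    return total / len(embeddings)


def separation_loss(prototypes, temperature=0.1):
    """Low when prototypes of different classes point in different directions.

    Assumed form: log of the mean exponentiated similarity to the other prototypes.
    """
    c = len(prototypes)
    total = 0.0
    for i in range(c):
        sims = [dot(prototypes[i], prototypes[j]) / temperature
                for j in range(c) if j != i]
        total += log_sum_exp(sims) - math.log(c - 1)
    return total / c


# Toy usage: two classes with antipodal prototypes on the unit circle.
prototypes = [normalize([1.0, 0.0]), normalize([-1.0, 0.0])]
embeddings = [normalize([0.9, 0.1]), normalize([-0.8, 0.2])]
labels = [0, 1]
loss = alignment_loss(embeddings, labels, prototypes) + separation_loss(prototypes)
```

In this sketch, embeddings from every training domain share one prototype per class, so minimizing the alignment term pulls same-class features together regardless of domain, while the separation term pushes the prototypes apart.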

Paper Structure (47 sections, 10 theorems, 54 equations, 7 figures, 15 tables, 1 algorithm)

Key Result

Theorem 3.1

Suppose the loss function $\ell(\cdot, \cdot)$ is bounded by $[0, {B}]$. For a learnable OOD generalization problem with sufficient inter-class separation, the OOD generalization error $\text{err}(f)$ can be upper bounded by for some $\alpha>0$, and $\mathcal{V}^{\textnormal{sup}}\left(h, \mathcal{E}_{\textnormal{avail }}\right) \triangleq \sup _{\beta \in \mathcal{S}^{d-1}} \mathcal{V}\left(\be

Figures (7)

  • Figure 1: Illustration of hyperspherical embeddings. Images are from PACS (Li et al., 2017).
  • Figure 2: HYPO significantly improves OOD generalization performance over ERM on various OOD datasets, with CIFAR-10 as the in-distribution (ID) dataset. Full results are given in the appendix.
  • Figure 3: Illustration of hard negative pairs which share the same domain (art painting) but have different class labels.
  • Figure 4: UMAP (McInnes et al., 2018) visualization of the features when the model is trained with CE vs. HYPO on PACS. The red, orange, and green points are from the in-distribution domains: art painting (A), photo (P), and sketch (S). The violet points are from the unseen OOD domain, cartoon (C).
  • Figure 5: Intra-class variation for ERM (left) vs. HYPO (right) on PACS. For each class $y$, we measure the Sinkhorn Divergence between the embeddings of each pair of domains. Our method results in significantly lower intra-class variation across different pairs of training domains compared to ERM.
  • ...and 2 more figures
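
The measurement described in the Figure 5 caption, a Sinkhorn divergence between the embeddings of two domains for a fixed class, can be sketched as follows. This is only an illustration under stated assumptions: uniform weights on the points, squared Euclidean cost, a debiased form built from the transport cost of the entropy-regularized plan, and hypothetical values for `epsilon` and `n_iters`. The exact configuration behind the figure is not given here.

```python
import math


def sq_dist(x, y):
    return sum((a - b) ** 2 for a, b in zip(x, y))


def entropic_ot_cost(xs, ys, epsilon=0.5, n_iters=200):
    """Transport cost <P, C> of the entropy-regularized plan P between
    two uniformly weighted point clouds, found by Sinkhorn iterations."""
    n, m = len(xs), len(ys)
    cost = [[sq_dist(x, y) for y in ys] for x in xs]
    kernel = [[math.exp(-c / epsilon) for c in row] for row in cost]
    u = [1.0] * n
    v = [1.0] * m
    for _ in range(n_iters):
        # Alternately rescale rows and columns to match the uniform marginals.
        u = [(1.0 / n) / sum(kernel[i][j] * v[j] for j in range(m))
             for i in range(n)]
        v = [(1.0 / m) / sum(kernel[i][j] * u[i] for i in range(n))
             for j in range(m)]
    return sum(u[i] * kernel[i][j] * v[j] * cost[i][j]
               for i in range(n) for j in range(m))


def sinkhorn_divergence(xs, ys, epsilon=0.5, n_iters=200):
    """Debiased divergence: subtract the self-transport terms so that
    the divergence of a point cloud with itself is zero."""
    return (entropic_ot_cost(xs, ys, epsilon, n_iters)
            - 0.5 * entropic_ot_cost(xs, xs, epsilon, n_iters)
            - 0.5 * entropic_ot_cost(ys, ys, epsilon, n_iters))


# Toy usage: same-class embeddings from two hypothetical domains.
domain_a = [[1.0, 0.0], [0.0, 1.0]]
domain_b = [[-1.0, 0.0], [0.0, -1.0]]
variation = sinkhorn_divergence(domain_a, domain_b)
```

Computing this value for every pair of training domains and every class would give a grid like the one the caption describes, where smaller entries indicate lower intra-class variation.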

Theorems & Definitions (19)

  • Definition 2.1: OOD Generalization
  • Definition 3.1: Intra-class variation
  • Definition 3.2: Inter-class separation
  • Definition 3.3
  • Theorem 3.1: OOD error upper bound (informal; Ye et al., 2021)
  • Theorem 6.1: Variation upper bound using HYPO
  • Lemma C.1
  • Remark 1
  • Lemma C.2
  • Lemma C.3
  • ...and 9 more