Domain Generalization via Discrete Codebook Learning

Shaocong Long, Qianyu Zhou, Xikun Jiang, Chenhao Ying, Lizhuang Ma, Yuan Luo

TL;DR

This work tackles domain generalization by shifting from pixel-level continuous representations to semantic-level discrete representations. It establishes a theoretical basis showing discretization can reduce domain gaps and presents Discrete Domain Generalization (DDG), which quantizes encoder features into a learnable codebook and trains with a teacher–student framework and EMA updates. The method combines classification, consistency, and codebook-related losses, and experiments across PACS, TerraIncognita, and VLCS demonstrate consistent improvements over SOTA with strong generalization and stability. Overall, DDG offers a principled, efficient route to robust DG by prioritizing semantic information through discrete representations.

Abstract

Domain generalization (DG) strives to address distribution shifts across diverse environments to enhance a model's generalizability. Current DG approaches are confined to acquiring robust representations with continuous features, specifically training at the pixel level. However, this DG paradigm may struggle to mitigate distribution gaps when dealing with a large space of continuous features, rendering it susceptible to pixel details that exhibit spurious correlations or noise. In this paper, we first theoretically demonstrate that the domain gaps in continuous representation learning can be reduced by the discretization process. Based on this finding, we introduce a novel learning paradigm for DG, termed Discrete Domain Generalization (DDG). DDG uses a codebook to quantize the feature map into discrete codewords, aligning semantically equivalent information in a shared discrete representation space that prioritizes semantic-level information over pixel-level intricacies. By learning at the semantic level, DDG reduces the number of latent features, making better use of the representation space and alleviating the risks associated with the wide-ranging space of continuous features. Extensive experiments across widely used DG benchmarks demonstrate DDG's superior performance compared to state-of-the-art approaches, underscoring its potential to reduce distribution gaps and enhance the model's generalizability.
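The quantization step described in the abstract can be illustrated with a minimal sketch. It assumes Euclidean nearest-neighbor lookup over flattened feature-map patches; the function name `quantize`, the toy codebook, and the toy features are hypothetical and not taken from the paper.

```python
import numpy as np

def quantize(features, codebook):
    """Replace each feature vector with its nearest codeword.

    features: (N, D) array of flattened feature-map patches (assumed layout).
    codebook: (K, D) array of codewords.
    Returns (indices, quantized) with shapes (N,) and (N, D).
    """
    # Squared Euclidean distance between every feature and every codeword: (N, K).
    diffs = features[:, None, :] - codebook[None, :, :]
    dists = np.sum(diffs ** 2, axis=-1)
    # Index of the closest codeword for each feature vector.
    indices = np.argmin(dists, axis=1)
    return indices, codebook[indices]

# Toy example (hypothetical values): three codewords in a 2-D feature space.
codebook = np.array([[0.0, 0.0], [1.0, 1.0], [4.0, 0.0]])
features = np.array([[0.1, -0.2], [0.9, 1.2], [3.5, 0.3], [1.1, 0.8]])
indices, quantized = quantize(features, codebook)
print(indices)  # → [0 1 2 1]
```

Features 2 and 4 differ slightly but map to the same codeword, which is the sense in which nearby continuous features collapse onto one shared discrete representation.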

Paper Structure

This paper contains 13 sections, 1 theorem, 7 equations, 5 figures, 8 tables.

Key Result

Theorem 1

Let $F$ denote a family of functions $f: X \rightarrow \mathbb{R}$. For two domains characterized by continuous representation distributions $P$ and $Q$ over $X$ respectively, denote the type-1 Wasserstein distance (Sriperumbudur et al., 2012) as $\mathcal{W}(P, Q) = \sup_{f \in F} \int |P(x)f(x) - Q(

Figures (5)

  • Figure 1: (a) Existing DG methods rely on continuous representation learning, struggling with domain gaps due to large feature spaces, pixel perturbations, and interpretation. (b) We introduce a discrete representation codebook to map features into discrete codewords, prioritizing semantic information over imperceptible pixel details, aiding distribution alignment across domains. (c) The discretization of continuous features in our DDG. The numbers in image patches denote codeword indices. Key semantic patches in images from diverse domains ('Cartoon' and 'Sketch') are replaced with the same codeword (e.g., codeword 201 for long neck, codeword 255 for long legs). (d) Compared to state-of-the-art DG methods, our DDG significantly improves the model's generalizability.
  • Figure 2: Framework of our Discrete Domain Generalization (DDG). The approach uses a discrete representation codebook across domains to discretize feature maps into codewords, with predictions made by the classifier based on quantized features. The discrete codewords are chosen to replace latent variables based on their proximity. The Exponential Moving Average (EMA) of original representations is employed to optimize the codebook for heightened robustness.
  • Figure 3: Visualizations illustrating the semantics of learned codewords in our DDG. The labels within the patches indicate the index of the codeword within the codebook. In our proposed DDG, patches in the feature maps are substituted with the corresponding codewords according to their respective indices.
  • Figure 4: Visualization with t-SNE embeddings drawing features from the source and target domains before and after employing our DDG.
  • Figure 5: Visualization with t-SNE embeddings depicting features from different classes before and after the application of the proposed DDG.
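The Figure 2 caption states that an Exponential Moving Average (EMA) of the original representations is used to optimize the codebook, but the exact update rule is not given here. The sketch below assumes the common VQ-VAE-style EMA update (running counts and running sums per codeword, with Laplace smoothing); the function name, the `decay` and `eps` parameters, and the toy values are assumptions for illustration only.

```python
import numpy as np

def ema_codebook_update(codebook, cluster_size, embed_sum, features, indices,
                        decay=0.99, eps=1e-5):
    """One EMA step for a codebook (assumed VQ-VAE-style rule).

    codebook:     (K, D) current codewords (used only for its shape).
    cluster_size: (K,)   running count of features assigned to each codeword.
    embed_sum:    (K, D) running sum of features assigned to each codeword.
    features:     (N, D) continuous encoder features in this batch.
    indices:      (N,)   nearest-codeword index for each feature.
    """
    num_codes = codebook.shape[0]
    one_hot = np.eye(num_codes)[indices]      # (N, K) assignment matrix
    batch_counts = one_hot.sum(axis=0)        # features per codeword in this batch
    batch_sums = one_hot.T @ features         # summed features per codeword

    # Blend the batch statistics into the running statistics.
    cluster_size = decay * cluster_size + (1.0 - decay) * batch_counts
    embed_sum = decay * embed_sum + (1.0 - decay) * batch_sums

    # Laplace smoothing keeps rarely used codewords from dividing by ~0.
    total = cluster_size.sum()
    smoothed = (cluster_size + eps) / (total + num_codes * eps) * total
    new_codebook = embed_sum / smoothed[:, None]
    return new_codebook, cluster_size, embed_sum

# Toy example (hypothetical values): both batch features map to codeword 0.
codebook = np.array([[0.0, 0.0], [1.0, 1.0]])
cluster_size = np.ones(2)
embed_sum = codebook.copy()
features = np.array([[0.2, 0.0], [0.4, 0.0]])
indices = np.array([0, 0])
new_codebook, cluster_size, embed_sum = ema_codebook_update(
    codebook, cluster_size, embed_sum, features, indices, decay=0.5)
```

In this toy run, codeword 0 drifts toward the features assigned to it (to roughly (0.2, 0)), while the unused codeword 1 stays close to (1, 1). A small `decay` is used only to make the movement visible in one step.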

Theorems & Definitions (2)

  • Theorem 1
  • Proof 1