Domain Agnostic Conditional Invariant Predictions for Domain Generalization

Zongbin Wang; Bin Pan; Zhenwei Shi

TL;DR

This paper tackles domain generalization without domain labels by introducing Discriminant Risk Minimization (DRM), which associates stability of the model’s prediction distribution with invariant feature use. The authors prove an upper bound showing that lowering source Discriminant Risk can reduce target risk, and instantiate DRM with a Bayesian last layer and a Categorical Discriminant Risk (CDR) penalty to encourage invariance. The resulting algorithm combines variational inference, a reparameterization trick, and a sliding Discriminant matrix to approximate distributional differences across source subsets, optimizing $\mathbb{E}_{q(b)}(R_{D_s}(Y,b\cdot\phi(X)))$ with a JS-based consistency term. Empirically, DRM achieves strong, robust performance on PACS, VLCS, and Office-Home without requiring domain labels, often outperforming domain-label baselines and displaying reduced variance, with implications for both generalization and fairness in real-world deployments.

Abstract

Domain generalization aims to develop a model that performs well on unseen target domains by learning from multiple source domains. However, recently proposed domain generalization models usually rely on domain labels, which may not be available in many real-world scenarios. To address this challenge, we propose a Discriminant Risk Minimization (DRM) theory and a corresponding algorithm that captures invariant features without domain labels. In DRM theory, we prove that reducing the discrepancy between the prediction distribution on the overall source domain and that on any subset of it contributes to obtaining invariant features. To apply the DRM theory, we develop an algorithm composed of Bayesian inference and a new penalty termed Categorical Discriminant Risk (CDR). In Bayesian inference, we transform the output of the model into a probability distribution to align with our theoretical assumptions. We adopt a sliding update approach to approximate the overall prediction distribution of the model, which enables us to obtain the CDR penalty. We also show the effectiveness of these components in finding invariant features. We evaluate our algorithm against various domain generalization methods on multiple real-world datasets, providing empirical support for our theory.
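
The sliding update and the CDR penalty described above can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that the Discriminant matrix stores one running mean prediction distribution per true class, that the sliding update is an exponential moving average with a hypothetical `momentum` parameter, and that the penalty is the Jensen-Shannon divergence between each class's batch-mean prediction and its running row. The names `js_divergence`, `update_matrix`, and `cdr_penalty` are illustrative only.

```python
import math


def js_divergence(p, q, eps=1e-12):
    """Jensen-Shannon divergence (natural log) between two discrete distributions."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]

    def kl(a, b):
        # Terms with a_i == 0 contribute nothing to the KL divergence.
        return sum(ai * math.log((ai + eps) / (bi + eps))
                   for ai, bi in zip(a, b) if ai > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def class_means(probs, labels, n_classes):
    """Mean predicted distribution per true class; None for classes absent from the batch."""
    means = []
    for c in range(n_classes):
        rows = [p for p, y in zip(probs, labels) if y == c]
        means.append([sum(col) / len(rows) for col in zip(*rows)] if rows else None)
    return means


def update_matrix(matrix, probs, labels, momentum=0.9):
    """Assumed sliding update: exponential moving average of each class row."""
    means = class_means(probs, labels, len(matrix))
    return [row[:] if mean is None
            else [momentum * r + (1 - momentum) * v for r, v in zip(row, mean)]
            for row, mean in zip(matrix, means)]


def cdr_penalty(matrix, probs, labels):
    """Assumed penalty: average JS divergence between batch class means and running rows."""
    means = class_means(probs, labels, len(matrix))
    terms = [js_divergence(mean, row)
             for row, mean in zip(matrix, means) if mean is not None]
    return sum(terms) / len(terms) if terms else 0.0
```

In a training loop, one would compute `cdr_penalty` on each batch, add it to the likelihood loss with some weight, and then call `update_matrix` so the running rows track the overall prediction distribution.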

Paper Structure (17 sections, 6 theorems, 45 equations, 2 figures, 3 tables, 1 algorithm)

Introduction
Related Work
Discriminant Risk Minimization
Problem setting
Discriminant Risk for domain generalization
Empirical experiment
Algorithms for Discriminant Risk Minimization
Variational Inference
Reparameterization
Categorical Discriminant Risk
Experiments
Real-world datasets
Baseline
Experiment Setting
Results
...and 2 more sections

Key Result

Lemma 1

For all $D_i, D_j \subset D_s$, and writing $P_{D_s^i}^f(\hat{y})$ for $P_{D_s^i}(\hat{y}\mid q,\phi)$ with $f=q\cdot\phi$, we have:

Figures (2)

Figure 1: We performed the aforementioned experiments on ResNet-18 using default parameters. The left side of the figure shows the confusion matrices for each source domain after 5000 batches on VLCS. The number of images per category is similar across domains. Even after prolonged training, the confusion matrices differ considerably between source domains. The right side shows the confusion matrices for the same model after 300 training batches on PACS. Here too, the prediction distribution varies substantially among domains, because the model depends on spurious features in the early training stage.
Figure 2: Our model consists of two distinct parts for updating, which are separated by a blue dashed line. The left-hand side is the prediction process, where VI approximates the posterior distribution $p(b|\mathcal{D}_{s})$ using $q(b)$ to update the Bayesian layer, as shown by the purple dashed backpropagation arrow. The right-hand side is the Discriminant matrix update process, where the Discriminant matrix is used to compare with the model prediction to obtain the CDR. Finally, the feature extractor is updated together with the CDR and likelihood loss.
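
The prediction side of Figure 2 can be sketched as follows. This is an illustration under stated assumptions rather than the paper's code: it assumes $q(b)$ is a factorized Gaussian over the last-layer weights with hypothetical parameters `mu` and `log_sigma`, draws $b$ with the standard reparameterization $b=\mu+\sigma\epsilon$, and averages softmax outputs over a few samples to approximate the predictive distribution. The function names and the `n_samples` parameter are not taken from the source.

```python
import math
import random


def sample_last_layer(mu, log_sigma, rng):
    """Reparameterized draw b = mu + sigma * eps, with eps ~ N(0, 1) elementwise."""
    return [[m + math.exp(s) * rng.gauss(0.0, 1.0)
             for m, s in zip(mu_row, ls_row)]
            for mu_row, ls_row in zip(mu, log_sigma)]


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    top = max(logits)
    exps = [math.exp(v - top) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict(features, mu, log_sigma, rng, n_samples=8):
    """Monte Carlo estimate of E_{q(b)}[softmax(b . phi(x))] for one feature vector."""
    avg = [0.0] * len(mu)  # one entry per class; mu is n_classes x feature_dim
    for _ in range(n_samples):
        b = sample_last_layer(mu, log_sigma, rng)
        logits = [sum(w * f for w, f in zip(row, features)) for row in b]
        probs = softmax(logits)
        avg = [a + p / n_samples for a, p in zip(avg, probs)]
    return avg
```

Because the noise enters only through `eps`, a framework with automatic differentiation could propagate gradients to `mu` and `log_sigma`; this plain-Python version only shows the sampling path.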

Theorems & Definitions (6)

Lemma 1
Lemma 2
Theorem 1
Theorem 2
Theorem 3
Theorem 4
