Enhancing Evolving Domain Generalization through Dynamic Latent Representations

Binghui Xie, Yongqiang Chen, Jiaqi Wang, Kaiwen Zhou, Bo Han, Wei Meng, James Cheng

TL;DR

This work tackles evolving domain generalization (EDG) by arguing that both invariant and dynamic latent features are necessary to generalize to time-evolving domains. The authors introduce Mutual Information-Based Sequential Autoencoders (MISTS), which jointly learn $z_c$ (invariant) and $z_t$ (dynamic) representations under MI constraints and couple them with a domain-adaptive classifier $w_t$ that evolves with the domain index $t$. The method is grounded in a probabilistic generative model and an ELBO-like objective that includes mutual-information penalties to enforce disentanglement, with theoretical guarantees linking the objective to the data log-likelihood. Empirical results on synthetic and real-world EDG benchmarks show MISTS achieving superior average accuracy and demonstrate the importance of modeling both feature types, supported by comprehensive ablations and qualitative analyses.

Abstract

Domain generalization is a critical challenge for machine learning systems. Prior domain generalization methods focus on extracting domain-invariant features across several stationary domains to enable generalization to new domains. However, in non-stationary tasks where new domains evolve along an underlying continuous structure, such as time, merely extracting the invariant features is insufficient for generalization to the evolving new domains. Moreover, it is non-trivial to learn both evolving and invariant features within a single model because the two kinds of features conflict. To bridge this gap, we build causal models to characterize the distribution shifts concerning the two patterns, and propose to learn both dynamic and invariant features via a new framework called Mutual Information-Based Sequential Autoencoders (MISTS). MISTS imposes information-theoretic constraints on sequential autoencoders to disentangle the dynamic and invariant features, and leverages a domain-adaptive classifier to make predictions based on both evolving and invariant information. Our experimental results on both synthetic and real-world datasets demonstrate that MISTS succeeds in capturing both evolving and invariant information and achieves promising results in evolving domain generalization tasks.

Paper Structure

This paper contains 30 sections, 4 theorems, 44 equations, 4 figures, 3 tables, 1 algorithm.

Key Result

Theorem 3.1

(Informal) In the linear setting of the problem formulation (Eq. eq:problem in the paper), for any domain $t$, there exists a classifier $w_t$ acting on $z_c$ and $z_t$ that achieves a lower risk than the optimal classifier $w_*$ acting on $[z_c,0]$.
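
The intuition behind this statement can be checked numerically on a toy problem. The sketch below is not the paper's setting or proof: the data-generating process, the squared-error risk, the coefficient drift `0.5 * t`, and all function names are assumptions made for illustration. It builds domains in which the label depends on an invariant feature `z_c` with a fixed coefficient and on a dynamic feature `z_t` with a coefficient that drifts with `t`, then compares a predictor on `[z_c, z_t]` with the best least-squares predictor on `[z_c, 0]`.

```python
import random


def make_domain(t, n=2000, seed=0):
    """Toy domain t: y = a * z_c + b_t * z_t, where b_t drifts with t (assumed form)."""
    rng = random.Random(seed + t)
    a, b_t = 1.0, 0.5 * t  # a is shared by all domains, b_t is domain-specific
    samples = []
    for _ in range(n):
        z_c, z_t = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        samples.append(((z_c, z_t), a * z_c + b_t * z_t))
    return samples, (a, b_t)


def squared_risk(w, samples):
    """Empirical squared-error risk of the linear predictor w."""
    total = 0.0
    for x, y in samples:
        pred = sum(wi * xi for wi, xi in zip(w, x))
        total += (pred - y) ** 2
    return total / len(samples)


def best_invariant_only(samples):
    """Closed-form least squares restricted to z_c (the z_t weight is forced to 0)."""
    num = sum(x[0] * y for x, y in samples)
    den = sum(x[0] ** 2 for x, _ in samples)
    return (num / den, 0.0)


def compare(t):
    """Return (risk of w_t on [z_c, z_t], risk of the best predictor on [z_c, 0])."""
    samples, w_t = make_domain(t)
    masked = [((x[0], 0.0), y) for x, y in samples]  # dynamic feature zeroed out
    w_star = best_invariant_only(masked)
    return squared_risk(w_t, samples), squared_risk(w_star, masked)


if __name__ == "__main__":
    for t in (1, 2, 3):
        full, invariant_only = compare(t)
        print(t, full, invariant_only)
```

In this toy setup the invariant-only baseline is refit for every domain, which only favors it. It still cannot recover the part of the label carried by `z_t`, so its risk grows as the dynamic coefficient drifts.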

Figures (4)

  • Figure 1: An example of Evolving Domain Generalization on Portraits (Ginosar et al., 2015). The dataset consists of historical images of US high school students, and as time progresses, the visual attributes captured in the photos, such as hair type and clothing style, gradually change.
  • Figure 2: The directed acyclic graph depicting our generative model. Dashed lines indicate that the causal direction may go either way.
  • Figure 3: The MISTS framework starts by encoding the input data into the latent space, where two distinct LSTMs parameterize the corresponding posteriors to obtain the invariant and dynamic latent representations, denoted as $z_c$ and $z_{1:T}$, respectively. These representations are then passed through a decoder and an adaptive classifier to compute the reconstruction loss and classification loss, respectively. To encourage better disentanglement, mutual information (MI) terms are applied to the invariant latent variables, the dynamic latent variables, and the input data. The KL-divergence terms are omitted for simplicity.
  • Figure 4: Decision boundaries for the Sine and Sine-C datasets. In the Sine dataset's ground truth, positive and negative labels are denoted by green and yellow dots, respectively. Panels (b)-(d) show the prediction results on the Sine dataset obtained with ERM, LSSAE, and MISTS, respectively.
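
The data flow described in the Figure 3 caption (encode a sequence into one invariant code $z_c$ and per-step dynamic codes $z_{1:T}$, then feed both to a decoder and to a classifier) can be sketched with deliberately simple stand-ins. Everything below is an illustrative assumption rather than the MISTS implementation: the two LSTM posteriors are replaced by a time-average and a per-step residual, the decoder is a plain sum, the classifier is a linear score, and the KL and MI terms are left out entirely. All function names are hypothetical.

```python
from statistics import fmean


def encode(sequence):
    """Split x_{1:T} into one static code z_c and per-step codes z_{1:T}.

    Stand-in for the two LSTM posteriors: z_c is the per-feature time-average,
    and each dynamic code is what remains of x_t after removing z_c.
    """
    dims = range(len(sequence[0]))
    z_c = [fmean(x[d] for x in sequence) for d in dims]
    z_dyn = [[x[d] - z_c[d] for d in dims] for x in sequence]
    return z_c, z_dyn


def decode(z_c, z_t):
    """Stand-in decoder: reconstruct x_t from the invariant and dynamic codes."""
    return [c + d for c, d in zip(z_c, z_t)]


def reconstruction_loss(sequence, z_c, z_dyn):
    """Mean squared reconstruction error over the T steps."""
    total = 0.0
    for x, z_t in zip(sequence, z_dyn):
        x_hat = decode(z_c, z_t)
        total += sum((a - b) ** 2 for a, b in zip(x, x_hat))
    return total / len(sequence)


def adaptive_classify(z_c, z_t, w_c, w_dyn):
    """Linear score on [z_c, z_t]; w_dyn is the part allowed to change per domain."""
    score = sum(w * z for w, z in zip(w_c, z_c))
    score += sum(w * z for w, z in zip(w_dyn, z_t))
    return 1 if score > 0 else 0


# One sequence with T = 3 steps: the first feature drifts, the second is constant.
sequence = [[1.0, 2.0], [2.0, 2.0], [3.0, 2.0]]
z_c, z_dyn = encode(sequence)
loss = reconstruction_loss(sequence, z_c, z_dyn)
labels = [adaptive_classify(z_c, z_t, [0.0, 0.0], [1.0, 0.0]) for z_t in z_dyn]
```

The point of the sketch is the shape of the pipeline: a single code shared across the sequence, one code per step, and a classifier that sees both. In the framework described above, the encoders and decoder are learned, and the MI and KL terms shape what each code captures.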

Theorems & Definitions (6)

  • Theorem 3.1
  • Theorem 4.1
  • Proposition 4.2
  • Definition 8.1
  • Theorem 8.2
  • proof