Continuous Invariance Learning

Yong Lin; Fan Zhou; Lu Tan; Lintao Ma; Jiameng Liu; Yansu He; Yuan Yuan; Yu Liu; James Zhang; Yujiu Yang; Hao Wang

TL;DR

Continuous Invariance Learning (CIL) addresses out-of-distribution generalization under continuous domain shifts by extending invariance learning beyond discrete domains. Unlike prior IRM-based methods that align $\mathbb{E}^t[y \mid \Phi(x)]$, CIL aligns $\mathbb{E}^y[t \mid \Phi(x)]$ via a minimax objective that uses two domain regressors to predict the continuous domain index from invariant features and from invariant-plus-label features. Theoretical results show CIL avoids the finite-sample pitfalls of existing methods and can recover invariant features even with many domains, while empirical results across synthetic CMNIST and real-world tasks (HousePrice, Insurance Fraud, Alipay Auto-scaling, WildTime-YearBook) demonstrate consistent improvements over strong baselines. The findings suggest CIL provides a practical, robust approach for OOD generalization in settings with continuously indexed domains, with potential applications in time-based and production-system data streams.
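
The description above can be written compactly. The following is a sketch of one objective consistent with it, not a transcription of the paper's equation: the symbols $w$ (prediction head), $\ell$ (prediction loss), and $\lambda$ (penalty weight), as well as the names $h$ and $g$ for the two domain regressors, are assumptions introduced here.

$$
\min_{w,\Phi}\; \mathbb{E}\big[\ell(w(\Phi(x)), y)\big] + \lambda \Big( \min_{h}\, \mathbb{E}\big[(h(\Phi(x)) - t)^2\big] - \min_{g}\, \mathbb{E}\big[(g(\Phi(x), y) - t)^2\big] \Big)
$$

Because $g$ sees everything $h$ sees plus the label $y$, the bracketed gap is non-negative when both regressor classes are rich enough, and it vanishes exactly when $\mathbb{E}[t \mid \Phi(x), y] = \mathbb{E}[t \mid \Phi(x)]$, that is, when the label carries no additional information about the mean of the domain index given the features. Rewriting $-\min_g$ as $\max_g$ of the negated loss gives a minimax form.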

Abstract

Invariance learning methods aim to learn invariant features in the hope that they generalize under distributional shifts. Although many tasks are naturally characterized by continuous domains, current invariance learning techniques generally assume categorically indexed domains. For example, auto-scaling in cloud computing often needs a CPU utilization prediction model that generalizes across different times (e.g., time of day and date of the year), where 'time' is a continuous domain index. In this paper, we start by theoretically showing that existing invariance learning methods can fail for continuous domain problems. Specifically, the naive solution of splitting continuous domains into discrete ones ignores the underlying relationship among domains, and therefore potentially leads to suboptimal performance. To address this challenge, we then propose Continuous Invariance Learning (CIL), which extracts invariant features across continuously indexed domains. CIL is a novel adversarial procedure that measures and controls the conditional independence between the labels and continuous domain indices given the extracted features. Our theoretical analysis demonstrates the superiority of CIL over existing invariance learning methods. Empirical results on both synthetic and real-world datasets (including data collected from production systems) show that CIL consistently outperforms strong baselines across all tasks.
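
As a rough numerical illustration of measuring the dependence between labels and a continuous domain index given the features, the sketch below compares how well $t$ can be predicted from a feature alone versus from the feature plus the label. Everything in it is assumed for illustration and is not taken from the paper: the toy data generator, the binary features `x_v` and `x_s`, and the use of group means as regressors (the exact least-squares optimum for discrete inputs) in place of trained networks.

```python
import random
from collections import defaultdict


def group_mean_mse(keys, t):
    """In-sample MSE of predicting t by the mean of t within each key group."""
    sums, counts = defaultdict(float), defaultdict(int)
    for k, ti in zip(keys, t):
        sums[k] += ti
        counts[k] += 1
    means = {k: sums[k] / counts[k] for k in sums}
    return sum((ti - means[k]) ** 2 for k, ti in zip(keys, t)) / len(t)


def domain_gap(phi, y, t):
    """MSE of t from phi alone minus MSE of t from (phi, y); always >= 0 here."""
    mse_features_only = group_mean_mse(phi, t)
    mse_features_and_label = group_mean_mse(list(zip(phi, y)), t)
    return mse_features_only - mse_features_and_label


def make_toy_data(n, seed=0):
    """Hypothetical generator: t is a continuous domain index in [0, 1)."""
    rng = random.Random(seed)
    t, x_v, x_s, y = [], [], [], []
    for _ in range(n):
        ti = rng.random()
        xv = rng.randint(0, 1)               # invariant feature
        yi = xv ^ (rng.random() < 0.25)      # label noise does not depend on t
        xs = yi ^ (rng.random() < ti)        # spurious: agreement with y depends on t
        t.append(ti)
        x_v.append(xv)
        x_s.append(xs)
        y.append(yi)
    return t, x_v, x_s, y


t, x_v, x_s, y = make_toy_data(20_000)
gap_invariant = domain_gap(x_v, y, t)
gap_spurious = domain_gap(x_s, y, t)
print(f"gap using invariant feature: {gap_invariant:.5f}")
print(f"gap using spurious feature:  {gap_spurious:.5f}")
```

On this toy data the gap should be close to zero for the invariant feature and clearly positive for the spurious one (about 1/36 in expectation under these assumptions), so penalizing the gap would steer a feature extractor toward `x_v`.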

Paper Structure (35 sections, 5 theorems, 30 equations, 8 figures, 14 tables, 1 algorithm)

Introduction
Difficulty of Existing Methods in Continuous Domain Tasks
Preliminaries
The Risks of Existing Methods on Continuous Domain Tasks
Our Method
Theoretical Analysis of Continuous Invariance Learning
Experiments
Synthetic Datasets
Continuous CMNIST
Real-World Datasets
HousePrice
Insurance Fraud
Alipay Auto-scaling
WildTime-YearBook
Conclusion and Discussion
...and 20 more sections

Key Result

Proposition 1

If $\frac{n}{|\mathcal{T}|} \to \infty$, then with probability approaching 1 we have $\mathbb{E}[\hat{\mathcal{L}}(\Phi_v)] < \mathbb{E}[\hat{\mathcal{L}}(\Phi_s)]$, where the expectation is taken over the random draw of the domain and each sample given $\sigma_\Phi$. However, if the domain

Figures (8)

Figure 1: Illustration of distributional shifts in discrete and continuous domains (Wang et al., 2020). Existing IRM methods focus on discrete domains, which is inconsistent with many real-world tasks. Our work therefore aims to extend IRM to continuous domains.
Figure 2: Empirical validation of how the performance of IRM deteriorates as the number of domains grows while the total sample size is fixed. The experiments are conducted on CMNIST (Arjovsky et al., 2019) with 50,000 samples. We split the original 2 domains equally into more domains. Since CMNIST contains only 2 classes, 50% test accuracy is close to random guessing. Notably, data with continuous domains can contain an infinite number of domains with only one sample in each domain.
Figure 3: An illustration of the Yearbook dataset (Yao et al., 2022). Images taken from Yao et al. (2022).
Figure 4: Accuracy on the worst OOD test domain for each method on the Yearbook dataset from Wild-Time. The performance of the baseline methods is taken from Yao et al. (2022).
Figure 5: An illustration of the invariance property in causality (similar to Figure 1 in ICP; Peters et al., 2016). This figure shows a causal system with five nodes, $y$, $x_1$, $x_2$, $x_3$, and $x_4$. Our task is to predict $y$ based on the $x$'s. Different domains have different interventions, leading to distributional shifts. An intervention on a node can be interpreted simply as changing the node's value. The changes can propagate to the descendants of the intervened node. The invariance property shows that $P(y|x_2, x_3)$ remains the same in all three domains. In contrast, $P(y|x_1)$ and $P(y|x_4)$ change in domains 2 and 3, respectively, due to interventions. So it is safe to build a model on $x_2$ and $x_3$ to predict $y$, which is expected to be stable under novel test distributions.
...and 3 more figures
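
The effect described in the Figure 2 caption, a fixed sample budget spread over more and more domains, can be illustrated with a small simulation. This is a toy sketch, not the paper's experiment: the function `per_domain_spread`, the label probability `p=0.75`, and the choice of domain counts are all assumptions. The label distribution is identical in every domain, so any variation in the per-domain label means is pure estimation noise.

```python
import random


def per_domain_spread(n_total, n_domains, p=0.75, seed=0):
    """Variance across domains of the empirical label mean.

    Labels are Bernoulli(p) in every domain, so the true per-domain means
    are all equal and the returned spread is estimation noise only.
    """
    rng = random.Random(seed)
    per_domain = n_total // n_domains
    means = []
    for _ in range(n_domains):
        hits = sum(rng.random() < p for _ in range(per_domain))
        means.append(hits / per_domain)
    grand_mean = sum(means) / n_domains
    return sum((m - grand_mean) ** 2 for m in means) / n_domains


# Same 50,000 draws (same seed), split equally into more and more domains.
for k in (2, 100, 5000):
    spread = per_domain_spread(50_000, k)
    print(f"domains={k:5d}  samples/domain={50_000 // k:6d}  spread={spread:.6f}")
```

The spread grows roughly in proportion to the number of domains, on the order of $p(1-p)\,|\mathcal{T}|/n$. A penalty that compares per-domain estimates therefore sees apparent variation even for a perfectly invariant feature once each domain holds only a few samples.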

Theorems & Definitions (8)

Proposition 1
Lemma 1
Theorem 2
Proposition 2
proof
proof
Theorem 3
proof
