Learning Conditional Invariances through Non-Commutativity

Abhra Chaudhuri; Serban Georgescu; Anjan Dutta

TL;DR

This work tackles domain adaptation under asymmetry, where the target domain holds semantically relevant information that is absent from the source. It proposes Non-Commutative Invariance (NCI), a conditional invariance framework that directs the invariance operator toward the target domain, preserving target-relevant features while using source-domain samples as augmentations. The authors prove that NCI yields a stricter target-risk bound by driving the $\mathcal{H}_\eta$-divergence between domains to zero, and show via Haussler-type bounds that source samples can help meet the sample complexity of learning the target-optimal encoder $\Phi^*_{\tau}$. Empirically, NCI achieves state-of-the-art results on PACS, Office-Home, and DomainNet and approaches oracle performance in multimodal segmentation, with gains amplified when source domains provide complementary semantic information. The mutual information $I(\cdot;\cdot)$ and the $\mathcal{H}_\eta$-divergence are the central quantities in formalizing the bounds and the learning dynamics, and $\varphi^*_{\tau}$ and $\Phi^*_{\tau}$ denote target-optimal encoders under different training regimes.

Abstract

Invariance learning algorithms that conditionally filter out domain-specific random variables as distractors do so based only on the data semantics, and not on the target domain under evaluation. We show that a provably optimal and sample-efficient way of learning conditional invariances is by relaxing the invariance criterion to be non-commutatively directed towards the target domain. Under domain asymmetry, i.e., when the target domain contains semantically relevant information absent in the source, the risk of the encoder $\varphi^*$ that is optimal on average across domains is strictly lower-bounded by the risk of the target-specific optimal encoder $\Phi^*_\tau$. We prove that non-commutativity steers the optimization towards $\Phi^*_\tau$ instead of $\varphi^*$, bringing the $\mathcal{H}$-divergence between domains down to zero, leading to a stricter bound on the target risk. Both our theory and experiments demonstrate that non-commutative invariance (NCI) can leverage source domain samples to meet the sample complexity needs of learning $\Phi^*_\tau$, surpassing SOTA invariance learning algorithms for domain adaptation, at times by over $2\%$, approaching the performance of an oracle. Implementation is available at https://github.com/abhrac/nci.

Paper Structure (19 sections, 7 theorems, 55 equations, 3 figures, 4 tables)

Introduction
Related Work
Non-Commutative Invariance
Preliminaries
Target Risk under Non-Commutativity
Sample Complexity of NCI and its Optimality
Training with NCI
Experiments
Conclusion
Appendix
Extended Literature Review
Notations
Additional Preliminaries
Metric Space of Domains
Operator-Encoder Duality
...and 4 more sections

Key Result

Theorem 1

Let the $\mathcal{H}_\eta$-divergence between the source ($s$) and the target ($\tau$) domains be $\Delta$, i.e., $s = \tau + \Delta$ (see Metric Space of Domains). Under asymmetry (Definition 1), i.e., $I(\Phi_s^*(\mathbf{x}_s); \mathbf{y}) < I(\Phi_\tau^*(\mathbf{x}_\tau); \mathbf{y})$, [...] where $l(\cdot, \cdot)$ calculates the encoding risk of a domain, and $\Phi^*_\tau(\cdot)$ is the o[...]
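
For context, the role of the divergence term can be seen in the classical domain adaptation bound of Ben-David et al., which the paper lists among its cited results. The block below is a schematic, population-level form of that bound, with finite-sample terms omitted. The symbols $\epsilon_s$, $\epsilon_\tau$, $h$, and $\lambda$ are notation assumed here for illustration, and the paper's own $\mathcal{H}_\eta$ variant and constants may differ.

$$
\epsilon_\tau(h) \;\le\; \epsilon_s(h) + d_{\mathcal{H}}(\mathcal{D}_s, \mathcal{D}_\tau) + \lambda,
\qquad
\lambda = \min_{h' \in \mathcal{H}} \big[\epsilon_s(h') + \epsilon_\tau(h')\big]
$$

$$
d_{\mathcal{H}}(\mathcal{D}_s, \mathcal{D}_\tau) \to 0
\;\Longrightarrow\;
\epsilon_\tau(h) \;\le\; \epsilon_s(h) + \lambda
$$

Read this way, any procedure that drives the divergence term to zero removes one additive term from the upper bound on the target risk, which is the sense in which the abstract speaks of a stricter bound.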

Figures (3)

Figure 1: Dogs and giraffes may not be fully distinguishable in the free-hand sketch domain due to shared geometric structures. However, they are clearly separable in the photo domain based on color and texture. Thus, mapping photos and sketches to the same invariant representation loses such critical domain-specific but useful information when inference has to be performed in the photo domain at test time.
Figure 2: Abstract comparison of commutative and non-commutative invariance (NCI) learning. Commutative invariance aims to capture components that are shared across both the source $\mathcal{D}_s$ and target $\mathcal{D}_\tau$ domains (simplified in the diagram as $s$ and $\tau$ respectively) by mapping samples from one domain to the semantic space $\varphi(\cdot)$ of the other. NCI, on the other hand, captures components that the source domain shares with the target domain by mapping source samples to the target's representation space, but retains all the components in the semantic space of the target domain unchanged. NCI is thus invariant to the domain-specific, semantically relevant components of the source domain, but not the target domain.
Figure 3: Effects of increasing the number of semantically complementary source samples (horizontal axis) on classification accuracy (vertical axis) across different complementary source domains.
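
The contrast described in the Figure 2 caption can be illustrated with a toy sketch. This is not the authors' NCI objective or implementation: it assumes a plain squared-distance alignment loss between one source and one target embedding, and models the directed case by treating the target embedding as a constant. The names `alignment_grads` and `step`, the learning rate, and the two-dimensional embeddings are all hypothetical.

```python
# Toy illustration only: NOT the authors' NCI objective or implementation.
# All names and values here are hypothetical.

def alignment_grads(z_s, z_t, directed):
    """Gradients of 0.5 * ||z_s - z_t||^2 with respect to each embedding.

    directed=False: commutative alignment, both embeddings receive a gradient.
    directed=True: the target is treated as a constant, so only the source
    embedding is pulled toward the target.
    """
    diff = [a - b for a, b in zip(z_s, z_t)]
    grad_s = diff
    grad_t = [0.0] * len(diff) if directed else [-d for d in diff]
    return grad_s, grad_t


def step(z_s, z_t, directed, lr=0.5):
    """Apply one gradient-descent step to both embeddings."""
    g_s, g_t = alignment_grads(z_s, z_t, directed)
    new_s = [a - lr * g for a, g in zip(z_s, g_s)]
    new_t = [b - lr * g for b, g in zip(z_t, g_t)]
    return new_s, new_t


# The source embedding lacks the second component, standing in for
# target-only information such as color or texture.
z_source = [1.0, 0.0]
z_target = [1.0, 2.0]

comm_s, comm_t = step(z_source, z_target, directed=False)
nci_s, nci_t = step(z_source, z_target, directed=True)

print("commutative:", comm_s, comm_t)
print("directed:   ", nci_s, nci_t)
```

In the commutative case both embeddings move toward each other, so the target gives up part of its second component. In the directed case the source moves by the same amount while the target embedding stays where it was, which mirrors the caption's description of retaining the target's semantic space unchanged.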

Theorems & Definitions (20)

Definition 1: Asymmetry
Definition 2: Commutative and Non-Commutative Invariance
Theorem 1
Theorem 2
Definition 3: Domain-Specific Optimality
Definition 4: Optimal-on-average
Definition 5: $\mathcal{H}_\eta$-divergence (Ben-David et al., 2006; Ben-David et al., 2010; Kifer et al., 2004; Ganin et al., 2016)
Definition 6: $\mathcal{A}$-distance (Ben-David et al., 2006; Ganin et al., 2016)
Theorem 3: Generalization Bound on the Target Risk (Ben-David et al., 2006)
proof
...and 10 more
