Gradient Matching for Domain Generalization
Yuge Shi, Jeffrey Seely, Philip H. S. Torr, N. Siddharth, Awni Hannun, Nicolas Usunier, Gabriel Synnaeve
TL;DR
This work tackles domain generalization by proposing Inter-Domain Gradient Matching (IDGM), which encourages invariant input-output mappings by maximizing the gradient inner product across source domains. To avoid expensive second-order optimization, the authors introduce Fish, a first-order, meta-learning-style algorithm that approximates IDGM and scales to settings with many domains. Empirically, Fish delivers competitive or state-of-the-art performance on the Wilds and DomainBed benchmarks across vision and language tasks, holds up across diverse architectures, and yields clearly better gradient alignment across domains than standard ERM. The approach offers a practical, scalable way to reduce reliance on domain-specific spurious correlations and promote robust generalization in real-world deployment.
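The trade-off IDGM optimizes can be made concrete with a small pure-Python sketch, assuming a linear-regression model with squared loss: the objective is the ERM loss minus gamma times the average inner product of the per-domain gradients, so domains whose gradients conflict (negative inner product) are penalized. The helper names (`mse_loss`, `mse_grad`, `idgm_objective`, `gamma`) and the normalization (ERM as the mean of per-domain losses, inner products averaged over unordered domain pairs) are illustrative assumptions, not the paper's code. Differentiating this objective directly would require second-order derivatives, which is the cost Fish is designed to avoid.

```python
from itertools import combinations
from typing import List, Sequence, Tuple

Vector = List[float]
Batch = Tuple[List[Vector], Vector]  # (inputs, targets) for one domain


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def mse_loss(theta: Vector, batch: Batch) -> float:
    """Mean of 0.5 * (x . theta - y)^2 over the batch (linear regression)."""
    X, y = batch
    return sum(0.5 * (dot(x, theta) - t) ** 2 for x, t in zip(X, y)) / len(y)


def mse_grad(theta: Vector, batch: Batch) -> Vector:
    """Gradient of mse_loss with respect to theta."""
    X, y = batch
    grad = [0.0] * len(theta)
    for x, t in zip(X, y):
        residual = dot(x, theta) - t
        for k, x_k in enumerate(x):
            grad[k] += residual * x_k / len(y)
    return grad


def idgm_objective(theta: Vector, domains: List[Batch],
                   gamma: float) -> Tuple[float, float]:
    """Return (objective, mean pairwise gradient inner product).

    Assumed normalization: ERM is the average per-domain loss and the
    inner-product term is averaged over unordered pairs of domains.
    """
    erm = sum(mse_loss(theta, b) for b in domains) / len(domains)
    grads = [mse_grad(theta, b) for b in domains]
    pairs = list(combinations(grads, 2))  # requires at least two domains
    mean_ip = sum(dot(g_i, g_j) for g_i, g_j in pairs) / len(pairs)
    # Minimizing this value lowers the loss while rewarding aligned gradients.
    return erm - gamma * mean_ip, mean_ip
```

For example, two one-point domains `([[1.0]], [1.0])` and `([[1.0]], [-1.0])` pull `theta = [0.0]` in opposite directions: their gradients are -1.0 and 1.0, the mean inner product is -1.0, and with `gamma = 0.1` the objective rises from the ERM loss of 0.5 to about 0.6.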
Abstract
Machine learning systems typically assume that the distributions of training and test sets match closely. However, a critical requirement of such systems in the real world is their ability to generalize to unseen domains. Here, we propose an inter-domain gradient matching objective that targets domain generalization by maximizing the inner product between gradients from different domains. Because directly optimizing the gradient inner product requires second-order derivatives and can be computationally prohibitive, we derive a simpler first-order algorithm named Fish that approximates its optimization. We demonstrate the efficacy of Fish on 6 datasets from the Wilds benchmark, which captures real-world distribution shift across a diverse range of modalities; our method produces competitive results on these datasets and surpasses all baselines on 4 of them. We also perform experiments on datasets from the DomainBed benchmark, which focuses more on synthetic-to-real transfer. Our method produces competitive results on both benchmarks, demonstrating its effectiveness across a wide range of domain generalization tasks.
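The Fish update can be sketched as follows, as we understand it from the paper: clone the parameters, take one gradient step per source domain on the clone in a random domain order, then move the original parameters a fraction `meta_lr` of the way toward the clone. This is a minimal pure-Python sketch under stated assumptions: a linear-regression model with squared loss, full-batch plain gradient steps in place of per-domain minibatches and a tuned optimizer, and hypothetical names (`mse_grad`, `fish_step`, `inner_lr`, `meta_lr`). Only first-order gradients are ever computed.

```python
import random
from typing import List, Sequence, Tuple

Vector = List[float]
Batch = Tuple[List[Vector], Vector]  # (inputs, targets) for one domain


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def mse_grad(theta: Vector, batch: Batch) -> Vector:
    """Gradient of the mean of 0.5 * (x . theta - y)^2 (linear regression)."""
    X, y = batch
    grad = [0.0] * len(theta)
    for x, t in zip(X, y):
        residual = dot(x, theta) - t
        for k, x_k in enumerate(x):
            grad[k] += residual * x_k / len(y)
    return grad


def fish_step(theta: Vector, domains: List[Batch], inner_lr: float,
              meta_lr: float, rng: random.Random) -> Vector:
    """One outer Fish iteration (sketch, hypothetical helper)."""
    theta_tilde = list(theta)            # clone the current parameters
    order = list(range(len(domains)))
    rng.shuffle(order)                   # visit the domains in random order
    for d in order:
        # Plain gradient step on one domain, evaluated at the cloned params,
        # so each step depends on the steps taken for earlier domains.
        g = mse_grad(theta_tilde, domains[d])
        theta_tilde = [p - inner_lr * g_k for p, g_k in zip(theta_tilde, g)]
    # Move the original parameters part of the way toward the clone.
    return [p + meta_lr * (p_t - p) for p, p_t in zip(theta, theta_tilde)]
```

With a single domain and `meta_lr = 1.0`, the step reduces to one ordinary gradient step. With several domains, the sequential inner steps make the update depend on how the domains' gradients interact, which is how a first-order procedure can approximate the inner-product objective. On two toy domains that both follow y = 2x, for instance `[([[1.0], [2.0]], [2.0, 4.0]), ([[0.5], [3.0]], [1.0, 6.0])]`, 200 calls with `inner_lr=0.1` and `meta_lr=0.5` starting from `[0.0]` bring `theta` very close to `[2.0]`.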
