Balanced Direction from Multifarious Choices: Arithmetic Meta-Learning for Domain Generalization

Xiran Wang, Jian Zhang, Lei Qi, Yinghuan Shi

TL;DR

This work addresses domain generalization under distribution shift by rethinking gradient matching. It introduces arithmetic meta-learning, which uses arithmetic-weighted gradients to move updates toward the centroid of domain-specific optima, rather than following a single gradient combination. The method achieves improved generalization on multiple DG benchmarks, with careful design choices such as SGD in the inner loop and Adam in the outer loop, and shows additional gains when combined with global averaging like SWAD. The approach is simple to implement and complements existing averaging strategies, offering a practical boost for robust performance across unseen domains.

Abstract

Domain generalization addresses distribution shift, which arises from statistical disparities between the training source domains and unseen target domains. Widely used first-order meta-learning algorithms perform strongly on domain generalization by leveraging gradient matching theory, which aims to establish balanced parameters across source domains and so reduce overfitting to any particular domain. However, our analysis reveals that many directions achieve gradient matching, and current methods follow just one of them. These methods overlook another critical factor: the balanced parameters should be close to the centroid of the optimal parameters of each source domain. To address this, we propose a simple yet effective arithmetic meta-learning method with arithmetic-weighted gradients. While adhering to the principles of gradient matching, this approach promotes a more precise balance by estimating the centroid of the domain-specific optimal parameters. Experimental results validate the effectiveness of our strategy.
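
The idea of "arithmetic-weighted gradients" pointing toward a centroid can be illustrated with a small identity. The sketch below is not the paper's algorithm: it assumes a plain SGD inner loop with one step per domain, treats each inner-loop iterate as a rough stand-in for a domain-specific optimum, and uses the hypothetical weights $(n, n-1, \dots, 1)/n$, chosen here because they make the centroid of the iterates exactly equal to a weighted sum of the inner-loop gradients. The toy quadratic losses and all function names are illustrative.

```python
import numpy as np


def arithmetic_weights(n):
    """Hypothetical weights (n, n-1, ..., 1) / n for the n inner-loop gradients."""
    return np.arange(n, 0, -1) / n


def inner_loop(theta0, grad_fns, lr):
    """Plain SGD inner loop, one step per domain. Returns iterates and gradients."""
    theta = theta0.copy()
    iterates, grads = [], []
    for grad_fn in grad_fns:
        g = grad_fn(theta)          # gradient of the current domain's loss
        theta = theta - lr * g      # SGD step on that domain
        grads.append(g)
        iterates.append(theta.copy())
    return np.array(iterates), np.array(grads)


# Toy domains (an assumption): loss_i = 0.5 * ||theta - c_i||^2, so grad_i = theta - c_i.
centers = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]])
grad_fns = [lambda th, c=c: th - c for c in centers]

theta0 = np.zeros(2)
lr = 0.1
iterates, grads = inner_loop(theta0, grad_fns, lr)

w = arithmetic_weights(len(grad_fns))

# Direction from the start point to the centroid of the inner-loop iterates.
centroid_direction = iterates.mean(axis=0) - theta0
# The same direction written as an arithmetic-weighted sum of gradients.
weighted_direction = -lr * (w[:, None] * grads).sum(axis=0)
# Reptile-style direction (last iterate minus start), i.e. uniform weights.
last_iterate_direction = iterates[-1] - theta0
```

Under these assumptions, `centroid_direction` and `weighted_direction` coincide, while `last_iterate_direction` corresponds to weighting every inner-loop gradient equally. Both are combinations of the same gradients; only the weights differ.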

Paper Structure

This paper contains 17 sections, 13 equations, 6 figures, 5 tables, 1 algorithm.

Figures (6)

  • Figure 1: In the ternary parameter space $(x, y, z)$, gradient matching can be achieved along any direction on the yellow surface. The black arrow is the update direction of existing methods, which moves away from the optimal solution of domain 1.
  • Figure 2: Comparison of different learning strategies. Each step of the inner loop corresponds to a distinct domain. In the outer loop, the gradient is computed as a weighted average of the inner-loop gradients, and the values shown above them are their respective weights.
  • Figure 3: Loss surface plots of various domains on the PACS dataset, where deeper is better. The yellow triangle in (a)(b)(c)(e)(f)(g) shows the estimated optimal parameters of the respective source domain, while the black triangle represents the estimated optimal parameters of the other source domains. The red arrow in (d)(h) is the update direction of previous methods, while the yellow arrow toward the centroid marks the update direction of arithmetic meta-learning.
  • Figure 4: The Adam optimizer's gradient distribution over the first fifteen steps of the inner loop. Three domains are optimized in alternation, one per step. As momentum builds, the gradient contributions from each domain converge to similar proportions in the later stages.
  • Figure 5: Accuracy (%) on the PACS dataset with a varying number of steps for each domain during the inner loop.
  • ...and 1 more figures
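
The effect described in the Figure 4 caption can be reproduced in a simplified setting. The sketch below is a toy model, not the paper's experiment: it tracks only Adam's first-moment (momentum) term, assumes every gradient has unit magnitude, ignores the second-moment normalization, and uses the common default $\beta_1 = 0.9$. The function name `momentum_shares` is hypothetical.

```python
import numpy as np


def momentum_shares(num_steps, num_domains=3, beta1=0.9):
    """Share of each domain in an Adam-style first moment after each step.

    Assumes unit-magnitude gradients and ignores second-moment scaling.
    Bias correction is omitted because it cancels when shares are normalized.
    """
    contrib = np.zeros(num_domains)
    shares = []
    for t in range(num_steps):
        contrib *= beta1                        # decay all older contributions
        contrib[t % num_domains] += 1 - beta1   # add the current domain's gradient
        shares.append(contrib / contrib.sum())  # normalize to proportions
    return np.array(shares)


# Fifteen inner-loop steps with three domains optimized in alternation.
shares = momentum_shares(15)
```

In this toy model the first step is driven entirely by one domain, while after fifteen steps the three shares stand in the ratio $1 : \beta_1 : \beta_1^2$, which is close to uniform. That blending is one reason a momentum-based optimizer in the inner loop can blur the per-domain weighting that plain SGD keeps explicit.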