Towards Optimization and Model Selection for Domain Generalization: A Mixup-guided Solution

Wang Lu, Jindong Wang, Yidong Wang, Xing Xie

TL;DR

This paper proposes Mixup-guided optimization and model-selection techniques for domain generalization (DG). It uses an adapted Mixup to generate an out-of-distribution dataset that guides the preference direction, and it optimizes the model with Pareto optimization.
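The Mixup step can be illustrated with a minimal sketch. It assumes vanilla Mixup applied across two source domains, with a mixing coefficient λ drawn from Beta(α, α) and one-hot labels mixed with the same λ. The paper's adapted Mixup may differ; the function name `cross_domain_mixup` and the default α = 0.2 are hypothetical.

```python
import numpy as np


def cross_domain_mixup(x_a, y_a, x_b, y_b, num_classes, alpha=0.2, rng=None):
    """Mix samples from two source domains (vanilla Mixup; the paper's adaptation may differ).

    x_a, x_b: (n_a, d) and (n_b, d) inputs from two different source domains.
    y_a, y_b: integer class labels of shape (n_a,) and (n_b,).
    Returns mixed inputs of shape (n, d) and soft labels of shape (n, num_classes),
    where n = min(n_a, n_b).
    """
    rng = np.random.default_rng() if rng is None else rng
    n = min(len(x_a), len(x_b))
    # Pair each of the first n samples of domain A with a random sample of domain B.
    idx_b = rng.permutation(len(x_b))[:n]
    # One coefficient per pair, drawn from Beta(alpha, alpha) as in standard Mixup.
    lam = rng.beta(alpha, alpha, size=(n, 1))
    x_mix = lam * x_a[:n] + (1.0 - lam) * x_b[idx_b]
    # Mix one-hot labels with the same coefficients, giving soft labels.
    eye = np.eye(num_classes)
    y_mix = lam * eye[y_a[:n]] + (1.0 - lam) * eye[y_b[idx_b]]
    return x_mix, y_mix


# Toy usage with two hypothetical 3-feature domains and 4 classes.
rng = np.random.default_rng(0)
x_a, y_a = rng.normal(-2.0, 1.0, size=(8, 3)), rng.integers(0, 4, size=8)
x_b, y_b = rng.normal(2.0, 1.0, size=(8, 3)), rng.integers(0, 4, size=8)
x_mix, y_mix = cross_domain_mixup(x_a, y_a, x_b, y_b, num_classes=4, rng=rng)
print(x_mix.shape, y_mix.shape)  # (8, 3) (8, 4)
```

Mixing across domains, rather than within one domain, is what pushes the generated samples away from any single source distribution.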

Abstract

Distribution shifts between training and test data typically degrade model performance. In recent years, much work has addressed domain generalization (DG), where distribution shifts exist and target data are unseen. Despite progress in algorithm design, two foundational factors have long been ignored: 1) the optimization of regularization-based objectives, and 2) model selection for DG, since no knowledge about the target domain can be used. In this paper, we propose Mixup-guided optimization and selection techniques for DG. For optimization, we use an adapted Mixup to generate an out-of-distribution dataset that guides the preference direction, and we optimize with Pareto optimization. For model selection, we generate a validation dataset that is closer to the target distribution and can therefore better represent the target data. We also present theoretical insights behind our proposals. Comprehensive experiments demonstrate that our optimization and selection techniques can substantially improve the performance of existing DG algorithms and even achieve new state-of-the-art results.
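Model selection with a generated validation set can be sketched as ranking candidate checkpoints by their loss on Mixup samples. This is an illustration under stated assumptions, not the paper's procedure: the helper names `soft_cross_entropy` and `select_checkpoint` are hypothetical, and the score is soft-label cross-entropy because Mixup labels are soft (accuracy against the dominant label would be an equally plausible choice).

```python
import numpy as np


def soft_cross_entropy(probs, soft_labels, eps=1e-12):
    """Mean cross-entropy between predicted class probabilities and soft (Mixup) labels."""
    return float(-np.mean(np.sum(soft_labels * np.log(probs + eps), axis=1)))


def select_checkpoint(candidate_probs, soft_labels):
    """Return (index of the lowest-loss candidate, list of per-candidate losses).

    candidate_probs: list of (n, num_classes) arrays, one per checkpoint or hyperparameter setting.
    soft_labels: (n, num_classes) mixed labels of the generated validation set.
    """
    losses = [soft_cross_entropy(p, soft_labels) for p in candidate_probs]
    return int(np.argmin(losses)), losses


# Toy usage: three hypothetical checkpoints evaluated on two mixed samples.
soft_labels = np.array([[0.7, 0.3], [0.2, 0.8]])
candidates = [
    np.array([[0.5, 0.5], [0.5, 0.5]]),  # uninformative predictions
    np.array([[0.7, 0.3], [0.2, 0.8]]),  # matches the soft labels exactly
    np.array([[0.1, 0.9], [0.9, 0.1]]),  # confidently wrong predictions
]
best, losses = select_checkpoint(candidates, soft_labels)
print(best)  # 1
```

Because cross-entropy against a fixed soft label is minimized when the prediction equals that label, the checkpoint whose probabilities match the mixed labels ranks first in the toy example.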

Paper Structure

This paper contains 40 sections, 5 theorems, 21 equations, 4 figures, 4 tables, and 1 algorithm.

Key Result

Theorem 4.1

Let $\boldsymbol{\omega}^*$ be the solution of the problem in Eq. (eqa:epo), and let $\mathbf{d}^* = \mathbf{G}\boldsymbol{\omega}^*$ be the resulting update direction. If $\ell_{optd} = 0$, then the dominating direction $\mathbf{d}^*$ becomes a descent direction, i.e., … On the other hand, if $\ell_{optd} > 0$, let $\gamma^* = (\mathbf{d}^*)^T\mathbf{g}_{optd}$ be the objective value of the problem in Eq. (eqa:epo), …
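The quantities in Theorem 4.1 can be computed directly once the weights are known. The sketch below assumes ω is already given (the subproblem in Eq. eqa:epo is not reproduced on this page), stacks the objective gradients as the columns of G, and uses a hypothetical OOD-loss gradient `g_optd`; the helper `direction_diagnostics` is also hypothetical. It relies only on the first-order expansion ℓ_j(θ − ηd) ≈ ℓ_j(θ) − η dᵀg_j, so under the assumed update θ ← θ − ηd, positive inner products dᵀg_j mean a small step lowers every objective to first order.

```python
import numpy as np


def direction_diagnostics(G, omega, g_optd):
    """Return d = G @ omega, the inner products d^T g_j, and gamma = d^T g_optd.

    G:      (p, m) matrix whose columns are the gradients g_1, ..., g_m.
    omega:  (m,) weights, simply given here (assumed to solve the Eq. eqa:epo subproblem).
    g_optd: (p,) gradient of the OOD-guided loss (hypothetical stand-in).
    """
    d = G @ omega
    inner = G.T @ d            # entry j is d^T g_j
    gamma = float(d @ g_optd)  # analogue of gamma* = (d*)^T g_optd
    return d, inner, gamma


# Toy check with two quadratic losses l_j(theta) = 0.5 * ||theta - c_j||^2.
centers = np.array([[-1.0, 0.0], [0.0, -1.0]])
theta = np.zeros(2)
G = (theta - centers).T        # columns are the gradients g_1, g_2 at theta
omega = np.array([0.5, 0.5])
g_optd = np.array([1.0, 1.0])  # hypothetical OOD-loss gradient
d, inner, gamma = direction_diagnostics(G, omega, g_optd)

# A small step theta - eta * d should lower both losses when every d^T g_j > 0.
eta = 0.01
before = 0.5 * np.sum((theta - centers) ** 2, axis=1)
after = 0.5 * np.sum((theta - eta * d - centers) ** 2, axis=1)
print(inner, gamma, bool(np.all(after < before)))
```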

Figures (4)

  • Figure 1: The framework.
  • Figure 2: Toy examples of different validation datasets. (a) Case I: the target is a convex combination of the sources. As panel (a) shows, the distance between the original validation data and the target is fixed, while the target can be seen as part of VALD (the yellow part). VALD can even serve as an unbiased estimate of the target, so it yields a better estimate of the target. (b) Case II: the target lies outside the convex combination of the sources. The distance between the target and VALD is still smaller than the fixed distance between the target and the original validation data.
  • Figure 3: Ablation study on DSADS and USC-HAD.
  • Figure 4: Parameter sensitivity on DSADS.
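The intuition of Figure 2 (Case I) can be reproduced in a one-dimensional toy setting. Everything here is an assumption for illustration: two Gaussian source domains, a target centered between them, vanilla Mixup with λ ~ Uniform(0, 1) standing in for the paper's VALD construction, and the empirical Wasserstein-1 distance (via a hypothetical helper `wasserstein_1d`) as the distance measure.

```python
import numpy as np


def wasserstein_1d(a, b):
    """Empirical 1-D Wasserstein-1 distance between two equal-size samples.

    For equal sample sizes this equals the mean absolute difference of the sorted samples.
    """
    assert len(a) == len(b)
    return float(np.mean(np.abs(np.sort(a) - np.sort(b))))


rng = np.random.default_rng(0)
n = 20_000
# Two hypothetical 1-D source domains and a target lying between them.
src_a = rng.normal(-2.0, 1.0, n)
src_b = rng.normal(2.0, 1.0, n)
target = rng.normal(0.0, 1.0, n)

# "Original" validation data: held-out samples pooled from both sources.
orig_val = np.concatenate([src_a[: n // 2], src_b[: n // 2]])
# Mixup-style validation data: convex combinations of cross-domain pairs, lambda ~ Uniform(0, 1).
lam = rng.uniform(0.0, 1.0, n)
vald = lam * src_a + (1.0 - lam) * rng.permutation(src_b)

d_orig = wasserstein_1d(orig_val, target)
d_vald = wasserstein_1d(vald, target)
print(f"W1(original validation, target) = {d_orig:.2f}")
print(f"W1(Mixup validation, target)    = {d_vald:.2f}")
```

In this setup the Mixup-based validation data ends up closer to the target than the pooled source validation data, matching the qualitative picture in Figure 2; the toy says nothing about the size of the gap on real datasets.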

Theorems & Definitions (9)

  • Definition 4.1: Pareto dominance
  • Definition 4.2: Pareto optimality
  • Definition 4.3: Pareto front
  • Definition 4.4: Preference vector
  • Theorem 4.1: Theorem 1 in lv2021pareto
  • Proposition 4.1
  • Theorem 4.2
  • Theorem 1.1
  • Proposition A.1
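Definitions 4.1 to 4.3 can be made concrete for a finite set of candidate models, each summarized by its vector of losses. The sketch assumes minimization, the usual convention for losses: a vector dominates another if it is no worse in every objective and strictly better in at least one, and the (empirical) Pareto front is the set of non-dominated vectors. The helper names are hypothetical, and the paper's exact formal statements are not reproduced on this page.

```python
from typing import List, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if loss vector a Pareto-dominates b: no worse everywhere, strictly better somewhere."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(points: List[Sequence[float]]) -> List[Sequence[float]]:
    """Return the points not dominated by any other point (the empirical Pareto front).

    A point never dominates itself, so comparing each point against the whole list is safe.
    """
    return [p for p in points if not any(dominates(q, p) for q in points)]


losses = [(0.2, 0.9), (0.5, 0.5), (0.9, 0.1), (0.6, 0.6), (0.3, 0.95)]
print(pareto_front(losses))  # [(0.2, 0.9), (0.5, 0.5), (0.9, 0.1)]
```

Here (0.6, 0.6) is dominated by (0.5, 0.5) and (0.3, 0.95) by (0.2, 0.9); the remaining three vectors trade one objective against the other, which is exactly the situation a preference vector is meant to resolve.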