Seeking Flat Minima with Mean Teacher on Semi- and Weakly-Supervised Domain Generalization for Object Detection

Ryosuke Furuta, Yoichi Sato

TL;DR

Object detectors can be trained effectively in both semi-supervised and weakly-supervised domain generalization settings with the same Mean Teacher learning framework, in which a student network is trained with pseudo-labels output by a teacher on the unlabeled or weakly-labeled data.

Abstract

Object detectors do not work well when the training and test domains differ substantially. To overcome this domain gap in object detection without requiring expensive annotations, we consider two problem settings: semi-supervised domain generalizable object detection (SS-DGOD) and weakly-supervised DGOD (WS-DGOD). In contrast to conventional domain generalization for object detection, which requires labeled data from multiple domains, SS-DGOD and WS-DGOD require labeled data from only one domain, plus unlabeled or weakly-labeled data from multiple domains, for training. In this paper, we show that object detectors can be effectively trained in both settings with the same Mean Teacher learning framework, where a student network is trained with pseudo-labels output by a teacher on the unlabeled or weakly-labeled data. We provide novel interpretations of why the Mean Teacher learning framework works well in the two settings in terms of the relationship between the generalization gap and flat minima in parameter space. On the basis of these interpretations, we also show that incorporating a simple regularization method into the Mean Teacher learning framework leads to flatter minima. The experimental results demonstrate that the regularization leads to flatter minima and boosts the performance of detectors trained with the Mean Teacher learning framework in the two settings.
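The teacher-student loop described in the abstract can be illustrated with a minimal sketch. It assumes the standard Mean Teacher formulation, in which the teacher's parameters are an exponential moving average (EMA) of the student's, and it assumes that teacher detections are filtered by a confidence threshold before being used as pseudo-labels. The function names, the `decay` and `score_threshold` values, and the plain-list parameter representation are all hypothetical choices for illustration, not details taken from the paper.

```python
from typing import Dict, List, Tuple

# Hypothetical representations, chosen only for this sketch.
Params = Dict[str, List[float]]                # parameter name -> flat weights
Box = Tuple[float, float, float, float]        # (x1, y1, x2, y2)
Detection = Tuple[Box, float, int]             # (box, confidence score, class id)


def ema_update(teacher: Params, student: Params, decay: float = 0.999) -> Params:
    """Return new teacher weights as an EMA of the student weights.

    teacher <- decay * teacher + (1 - decay) * student
    The decay value is an assumption; the source does not state one.
    """
    return {
        name: [
            decay * t + (1.0 - decay) * s
            for t, s in zip(teacher[name], student[name])
        ]
        for name in teacher
    }


def filter_pseudo_labels(
    detections: List[Detection], score_threshold: float = 0.7
) -> List[Detection]:
    """Keep teacher detections whose confidence reaches the threshold.

    Confidence thresholding is a common pseudo-labeling heuristic and is
    an assumption here, as is the threshold value.
    """
    return [d for d in detections if d[1] >= score_threshold]


# Usage: one teacher update and one pseudo-label filtering step.
teacher = {"w": [1.0, 2.0]}
student = {"w": [0.0, 4.0]}
teacher = ema_update(teacher, student, decay=0.9)

teacher_outputs = [((0.0, 0.0, 10.0, 10.0), 0.95, 1), ((5.0, 5.0, 8.0, 8.0), 0.30, 2)]
pseudo_labels = filter_pseudo_labels(teacher_outputs, score_threshold=0.7)
```

Because the teacher averages the student's weights over many training steps, it is a point near the mean of the student's trajectory, which is the quantity that Figure 3 compares against the student's individual loss values.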

Paper Structure

This paper contains 48 sections, 2 theorems, 6 equations, 22 figures, 13 tables.

Key Result

Theorem 1

Consider a set of $N$ covers $\{\Theta_k\}_{k=1}^N$ such that the parameter space $\Theta \subset \cup_{k=1}^N\Theta_k$, where $\mathrm{diam}(\Theta)\vcentcolon= \mathrm{sup}_{\theta,\theta'\in\Theta}\|\theta-\theta'\|_2$, $N\vcentcolon= \lceil (\mathrm{diam}(\Theta)/\gamma)^d\rceil$ and $d$ is the dimension of $\Theta$ where $m$ is the number of training samples and $\mathrm{Div}(s_i,t)\vcentcolon=2\mathrm{sup}_A|\math

Figures (22)

  • Figure 1: Training framework.
  • Figure 2: Empirical and robust risks.
  • Figure 3: Intuitive interpretation of difference between loss values of trajectory of student and their mean (teacher).
  • Figure 4: Overview of regularization method.
  • Figure 5: Left and right plots compare average training and test flatness, respectively.
  • ...and 17 more figures

Theorems & Definitions (2)

  • Theorem: from cha2021swad
  • Proposition