G-NAS: Generalizable Neural Architecture Search for Single Domain Generalization Object Detection

Fan Wu, Jinling Gao, Lanqing Hong, Xinbing Wang, Chenghu Zhou, Nanyang Ye

TL;DR

This work tackles Single Domain Generalization Object Detection (S-DGOD) by introducing G-NAS, a Differentiable Neural Architecture Search framework guided by a Generalizable loss ($L_g$) to prevent overfitting to easy, non-causal features. By optimizing both network weights and architectural choices under an OoD-aware objective, G-NAS identifies prediction-head architectures that generalize across unseen weather and lighting domains without access to target-domain data. Empirical results on urban-scene datasets show G-NAS achieves state-of-the-art generalization across multiple target domains, with notable gains on challenging Night and Fog conditions, and robust performance across per-class APs. Ablation studies confirm the contribution of NAS and $L_g$ to improved OoD generalization, and visualization suggests more domain-invariant, causally relevant representations emerge when using G-loss.

Abstract

In this paper, we focus on a realistic yet challenging task, Single Domain Generalization Object Detection (S-DGOD), where only one source domain's data can be used for training object detectors, which must then generalize to multiple distinct target domains. In S-DGOD, both high-capacity fitting and generalization abilities are needed due to the task's complexity. Differentiable Neural Architecture Search (NAS) is known for its high capacity for fitting complex data, and we propose to leverage Differentiable NAS to solve S-DGOD. However, it may suffer from severe over-fitting due to the feature imbalance phenomenon, where parameters optimized by gradient descent are biased toward easy-to-learn features, which are usually non-causal and spuriously correlated with ground-truth labels, such as background features in object detection data. Consequently, performance degrades seriously, especially when generalizing to unseen target domains that differ greatly from the source domain. To address this issue, we propose the Generalizable loss (G-loss), an OoD-aware objective that prevents NAS from over-fitting by using gradient descent to optimize parameters not only on a subset of easy-to-learn features but also on the remaining predictive features for generalization. We name the overall framework G-NAS. Experimental results on the S-DGOD urban-scene datasets demonstrate that the proposed G-NAS achieves SOTA performance compared to baseline methods. Code is available at https://github.com/wufan-cse/G-NAS.

Paper Structure

This paper contains 33 sections, 4 theorems, 24 equations, 8 figures, 7 tables, 1 algorithm.

Key Result

Proposition 1

(NTRF approximation of DNNs.) When the width of a neural network goes to infinity, the output of the over-parameterized network can be approximated as a linear function of its parameters, of the form $\psi \Theta$, where $\psi \in \mathbb{R}^{n \times m}$ is the Neural Tangent Random Feature (NTRF) matrix [cao2019generalization] of the $n$ training data and $\Theta \in \mathbb{R}^{m}$ denotes the concatenation of all vectorized trainable parameters.
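
The linear approximation in Proposition 1 can be illustrated numerically. The sketch below is not taken from the paper: it assumes a hypothetical two-layer tanh network with scalar input, treats the gradient of the output with respect to all parameters at initialization as one row of the tangent-feature matrix, and measures how far a first-order expansion around the initialization is from the exact output. All function names (`init_params`, `forward`, `tangent_features`, `linearization_error`) are illustrative.

```python
import math
import random

def init_params(m, rng):
    """Random initialization of a width-m two-layer network (hypothetical toy model)."""
    w = [rng.gauss(0.0, 1.0) for _ in range(m)]
    v = [rng.gauss(0.0, 1.0) for _ in range(m)]
    return w, v

def forward(x, w, v):
    """Network output: sum_j v_j * tanh(w_j * x), scaled by 1/sqrt(m)."""
    m = len(w)
    return sum(vj * math.tanh(wj * x) for wj, vj in zip(w, v)) / math.sqrt(m)

def tangent_features(x, w, v):
    """Gradient of the output w.r.t. all parameters at (w, v): one feature row for input x."""
    scale = math.sqrt(len(w))
    grad_w = [vj * (1.0 - math.tanh(wj * x) ** 2) * x / scale for wj, vj in zip(w, v)]
    grad_v = [math.tanh(wj * x) / scale for wj in w]
    return grad_w + grad_v

def linearization_error(m, x=0.7, step_norm=1.0, seed=0):
    """Absolute gap between the exact output and its first-order expansion around init."""
    rng = random.Random(seed)
    w0, v0 = init_params(m, rng)
    # Random parameter update, rescaled to a fixed Euclidean norm.
    delta = [rng.gauss(0.0, 1.0) for _ in range(2 * m)]
    norm = math.sqrt(sum(d * d for d in delta))
    delta = [step_norm * d / norm for d in delta]
    w1 = [a + d for a, d in zip(w0, delta[:m])]
    v1 = [a + d for a, d in zip(v0, delta[m:])]
    exact = forward(x, w1, v1)
    row = tangent_features(x, w0, v0)
    linear = forward(x, w0, v0) + sum(p * d for p, d in zip(row, delta))
    return abs(exact - linear)

if __name__ == "__main__":
    for width in (16, 256, 4096):
        print(width, linearization_error(width))
```

For a parameter update of fixed norm, the gap is governed by second-order terms that carry a 1/sqrt(m) factor in this toy model, which is the intuition behind treating very wide networks as linear in their parameters.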

Figures (8)

  • Figure 1: Predictions (category: confidence) of G-NAS on Single Domain Generalization Object Detection datasets. G-NAS is able to detect objects in extremely challenging environments. Best viewed zoomed in.
  • Figure 2: The setting of S-DGOD, which aims to learn from a single source domain and generalize to multiple unseen target domains. Methods must extract causal features from the source domain in order to generalize, yet networks easily over-fit to a single source domain.
  • Figure 3: An overview of the proposed G-NAS. At the beginning of the search stage ($t=0$), the searchable prediction head super-net is randomly initialized, and the feature set extracted by the detector network $\mathcal{F}(\theta)$ is dominated by several features. At the end of the search stage ($t=T$), the searchable super-net has converged, with one operation chosen between each pair of nodes, and the detector network $\mathcal{F}(\theta)$ is forced by $\mathcal{L}_\textnormal{g}$ to extract features of similar strength, thus learning from both dominant and subordinate features. At the augment stage, we reconstruct the class prediction head with the searched architectural parameters $\alpha^*$ and retrain the whole network.
  • Figure 4: PCA projections of the representations on different domains. The feature representations learned with G-loss (right) have more similar patterns across different domains than those learned without G-loss (left). This suggests that G-loss is effective, as the representations learned on the source domain are generalizable.
  • Figure 5: Searched architectures of the normal cell (left) and reduction cell (right). The searched cell contains four ordered nodes $\{n_1, n_2, n_3, n_4 \}$ and each node has two previous inputs. Each directed edge denotes the chosen operation. The output of the cell is the concatenation of the output of each node.
  • ...and 3 more figures
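
The search and discretization steps described in the captions of Figures 3 and 5 can be sketched in a few lines. This is a minimal illustration in the style of DARTS-like differentiable NAS, not the authors' implementation: the candidate operations are hypothetical stand-ins for real convolution and pooling operations, and the rule of keeping the two strongest incoming edges per node is an assumption based on the caption's statement that each node has two previous inputs.

```python
import math

# Hypothetical candidate operations on a feature vector (stand-ins for conv/pool ops).
CANDIDATE_OPS = {
    "skip": lambda x: list(x),
    "double": lambda x: [2.0 * v for v in x],
    "negate": lambda x: [-v for v in x],
}

def softmax(logits):
    """Numerically stable softmax over a list of architecture parameters."""
    peak = max(logits)
    exps = [math.exp(value - peak) for value in logits]
    total = sum(exps)
    return [e / total for e in exps]

def mixed_op(x, alpha):
    """Search stage: softmax-weighted sum of every candidate operation on one edge."""
    weights = softmax(alpha)
    out = [0.0] * len(x)
    for weight, op in zip(weights, CANDIDATE_OPS.values()):
        for i, value in enumerate(op(x)):
            out[i] += weight * value
    return out

def derive_node(edge_alphas, inputs_per_node=2):
    """Discretization: keep the strongest op on each edge, then the top-k edges."""
    names = list(CANDIDATE_OPS)
    scored = []
    for source, alpha in edge_alphas.items():
        weights = softmax(alpha)
        best = max(range(len(names)), key=lambda k: weights[k])
        scored.append((weights[best], source, names[best]))
    scored.sort(reverse=True)
    return [(source, op) for _, source, op in scored[:inputs_per_node]]

if __name__ == "__main__":
    print(mixed_op([1.0, 2.0], [0.0, 0.0, 0.0]))
    print(derive_node({"c_prev": [3.0, 0.0, 0.0],
                       "n1": [0.0, 0.0, 5.0],
                       "n2": [0.1, 0.0, 0.0]}))
```

During search, every edge stays a continuous mixture so the architecture parameters can be trained by gradient descent alongside the network weights; the discrete cell is only read off once the search has converged.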

Theorems & Definitions (5)

  • Proposition 1
  • Theorem 1
  • Proposition 1
  • Theorem 1
  • proof