FOUND: Fourier-based von Mises Distribution for Robust Single Domain Generalization in Object Detection
Mengzhu Wang, Changyuan Deng, Shanshan Wang, Nan Yin, Long Lan, Liang Yang
TL;DR
This work tackles single-domain generalization for object detection by integrating frequency-domain perturbations with hyperspherical feature regularization in a CLIP-guided framework. It introduces Probabilistic Fourier Augmentation (PFA) to diversify appearance while preserving semantic structure, and von Mises-Fisher (vMF) regularization to maintain semantically coherent, compact feature spaces. The method leverages CLIP-based target semantics to guide domain shifts via a semantic shift vector $\Delta q$, and optimizes a combined loss $\mathcal{L}_{total} = \mathcal{L}_{det} + \lambda_{vMF} \mathcal{L}_{vMF}$ to balance robustness and discriminability. Experiments on a challenging adverse-weather driving benchmark show state-of-the-art cross-domain generalization, with notable gains in night/rainy and dusk/rainy conditions, validating the synergy between frequency-domain augmentation and hypersphere-regularized representations for robust SDG in object detection.
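As a rough illustration of how the combined objective $\mathcal{L}_{total} = \mathcal{L}_{det} + \lambda_{vMF} \mathcal{L}_{vMF}$ might be assembled, the NumPy sketch below projects features onto the unit hypersphere and penalizes their angular spread around a per-class mean direction. The summary above does not give the exact form of $\mathcal{L}_{vMF}$, so everything specific here is an assumption rather than the authors' implementation: the loss is the vMF negative log-likelihood with a fixed concentration `kappa` (shifted by a constant so that perfectly aligned features score zero), the mean directions are estimated from the batch, and the names `vmf_loss`, `total_loss`, and `lambda_vmf` are hypothetical.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Project vectors onto the unit hypersphere along the last axis."""
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)


def vmf_loss(features, labels, kappa=10.0):
    """Assumed vMF-style regularizer: mean of kappa * (1 - mu_c^T z).

    For a fixed kappa the vMF log-normalizer is constant, so the negative
    log-likelihood reduces to -kappa * mu^T z plus a constant. The constant
    is chosen here so that the loss is zero for perfectly compact classes.
    """
    z = l2_normalize(features)
    total = 0.0
    for c in np.unique(labels):
        zc = z[labels == c]
        mu = l2_normalize(zc.mean(axis=0))  # batch estimate of the mean direction
        total += np.sum(kappa * (1.0 - zc @ mu))
    return total / len(z)


def total_loss(det_loss, features, labels, lambda_vmf=0.1, kappa=10.0):
    """L_total = L_det + lambda_vMF * L_vMF, with hypothetical default weights."""
    return det_loss + lambda_vmf * vmf_loss(features, labels, kappa)
```

In this sketch a larger `lambda_vmf` pulls same-class features toward a common direction more strongly, which is one way to read the trade-off between robustness and discriminability described above.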
Abstract
Single Domain Generalization (SDG) for object detection aims to train a model on a single source domain that generalizes effectively to unseen target domains. While recent methods such as CLIP-based semantic augmentation have shown promise, they often overlook the underlying structure of feature distributions and the frequency-domain characteristics that are critical for robustness. In this paper, we propose a novel framework that enhances SDG object detection by integrating the von Mises-Fisher (vMF) distribution and the Fourier transform into a CLIP-guided pipeline. Specifically, we model the directional features of object representations using vMF to better capture domain-invariant semantic structures in the embedding space. Additionally, we introduce a Fourier-based augmentation strategy that perturbs amplitude and phase components to simulate domain shifts in the frequency domain, further improving feature robustness. Our method not only preserves the semantic alignment benefits of CLIP but also enriches feature diversity and structural consistency across domains. Extensive experiments on a diverse-weather driving benchmark demonstrate that our approach outperforms the existing state-of-the-art method.
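To make the amplitude and phase perturbation concrete, the following NumPy sketch decomposes an image into its Fourier amplitude and phase, applies random noise to each, and inverts the transform. The abstract does not specify the noise model, so the multiplicative Gaussian noise on the amplitude, the additive Gaussian noise on the phase, and the names `fourier_augment`, `amp_sigma`, and `phase_sigma` are assumptions for illustration, not the paper's exact procedure.

```python
import numpy as np


def fourier_augment(image, amp_sigma=0.1, phase_sigma=0.05, rng=None):
    """Perturb an H x W x C image in the frequency domain (illustrative sketch).

    amp_sigma and phase_sigma are hypothetical noise scales; setting both
    to zero returns the input image unchanged.
    """
    rng = np.random.default_rng() if rng is None else rng

    # 2D FFT over the spatial axes, applied independently per channel.
    spectrum = np.fft.fft2(image, axes=(0, 1))
    amplitude, phase = np.abs(spectrum), np.angle(spectrum)

    # Assumed noise model: multiplicative on amplitude, additive on phase.
    scale = 1.0 + amp_sigma * rng.standard_normal(amplitude.shape)
    amplitude = amplitude * np.clip(scale, 0.0, None)
    phase = phase + phase_sigma * rng.standard_normal(phase.shape)

    # Recombine and invert. The noise is not Hermitian-symmetric, so the
    # inverse is complex in general and only the real part is kept.
    perturbed = amplitude * np.exp(1j * phase)
    return np.real(np.fft.ifft2(perturbed, axes=(0, 1)))
```

Keeping `phase_sigma` small relative to `amp_sigma` reflects the common observation that phase carries most of an image's spatial structure while amplitude carries more of its appearance, which is consistent with the goal of changing appearance while preserving semantics.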
