Domain Generalization through Attenuation of Domain-Specific Information
Reiji Saito, Kazuhiro Hotta
TL;DR
This work tackles domain generalization for automotive semantic segmentation by introducing Domain Independence (DI) to quantify where domain-specific information resides and Attenuation of Domain-Specific Information (ADSI) to suppress such cues. DI analyzes feature representations from a frozen encoder and a frequency-space decomposition to identify domain dependence, while ADSI applies a Butterworth-based low-frequency attenuation to both amplitude and phase components, with a color-preserving scalar in [0,1], enabling domain-robust learning from a single domain. Empirically, ADSI outperforms strong baselines like Rein across GTA5-to-Real and Cityscapes-to-ACDC transfers, and ablations show joint attenuation of amplitude and phase with Butterworth masks yields the best gains, though challenges remain under rain/night conditions. The proposed framework provides a principled, frequency-aware approach to domain generalization that leverages single-domain data while maintaining essential color information, with practical impact for robust perception in diverse driving environments.
Abstract
In this paper, we propose a new evaluation metric called Domain Independence (DI) and a method called Attenuation of Domain-Specific Information (ADSI), both specifically designed for domain-generalized semantic segmentation in automotive images. DI measures the presence of domain-specific information: a lower DI value indicates strong domain dependence, while a higher DI value suggests greater domain independence. This makes it possible to roughly identify where domain-specific information exists and up to which frequency range it is present. As a result, it becomes possible to suppress only the regions in the image that contain domain-specific information, enabling feature extraction independent of the domain. ADSI uses a Butterworth filter to attenuate the low-frequency components of images, which contain inherent domain-specific information such as sensor characteristics and lighting conditions. However, since low-frequency components also contain important information such as color, they should not be removed completely. Thus, the low-frequency components are multiplied by a scalar value (ranging from 0 to 1) to retain essential information. This helps the model learn more domain-independent features. In experiments where GTA5 (a synthetic dataset) was used for training and a real-world dataset was used for evaluation, the proposed method outperformed conventional approaches. Similarly, in experiments where Cityscapes (a real-world dataset) was used for training and datasets covering various conditions such as rain and nighttime were used for evaluation, the proposed method demonstrated its robustness under nighttime conditions.
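The low-frequency attenuation described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names, the cutoff of 0.05 cycles per pixel, the filter order, and the gain formula `1 - (1 - alpha) * lowpass` are all assumptions chosen for illustration. The sketch scales only the amplitude spectrum; the TL;DR states that the phase component is attenuated as well, but the abstract does not give enough detail to reproduce that step here.

```python
import numpy as np


def butterworth_lowpass(shape, cutoff, order=2):
    """Butterworth low-pass response on the unshifted FFT grid.

    Equals 1 at the DC bin and decays toward 0 at high frequencies.
    `cutoff` is in cycles per pixel (assumed unit for this sketch).
    """
    h, w = shape
    fy = np.fft.fftfreq(h)[:, None]  # vertical frequencies
    fx = np.fft.fftfreq(w)[None, :]  # horizontal frequencies
    dist = np.sqrt(fy ** 2 + fx ** 2)
    return 1.0 / (1.0 + (dist / cutoff) ** (2 * order))


def attenuate_low_freq(image, cutoff=0.05, order=2, alpha=0.5):
    """Scale low-frequency amplitude by roughly `alpha`, keep high frequencies.

    image: float array of shape (H, W) or (H, W, C).
    alpha: retention scalar in [0, 1]; 0 removes the DC component,
           1 leaves the image unchanged.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    img = np.asarray(image, dtype=np.float64)

    lowpass = butterworth_lowpass(img.shape[:2], cutoff, order)
    # Assumed gain: exactly alpha at DC, approaching 1 at high frequencies.
    gain = 1.0 - (1.0 - alpha) * lowpass
    if img.ndim == 3:
        gain = gain[..., None]  # broadcast over color channels

    spectrum = np.fft.fft2(img, axes=(0, 1))
    amplitude = np.abs(spectrum)
    phase = np.angle(spectrum)

    # Attenuate the amplitude only; the phase is left untouched in this sketch.
    filtered = (gain * amplitude) * np.exp(1j * phase)
    return np.real(np.fft.ifft2(filtered, axes=(0, 1)))
```

With `alpha=1.0` the gain is 1 everywhere and the image is returned unchanged, while smaller values dim slowly varying content such as global illumination and color cast without discarding it entirely, which mirrors the motivation for the scalar given above.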
