CD-FKD: Cross-Domain Feature Knowledge Distillation for Robust Single-Domain Generalization in Object Detection

Junseok Lee, Sungho Shin, Seongju Lee, Kyoobin Lee

Abstract

Single-domain generalization is essential for object detection, particularly when training models on a single source domain and evaluating them on unseen target domains. Domain shifts, such as changes in weather, lighting, or scene conditions, pose significant challenges to the generalization ability of existing models. To address this, we propose Cross-Domain Feature Knowledge Distillation (CD-FKD), which enhances the generalization capability of the student network by leveraging both global and instance-wise feature distillation. The proposed method uses diversified data through downscaling and corruption to train the student network, whereas the teacher network receives the original source domain data. The student network mimics the features of the teacher through both global and instance-wise distillation, enabling it to extract object-centric features effectively, even for objects that are difficult to detect owing to corruption. Extensive experiments on challenging scenes demonstrate that CD-FKD outperforms state-of-the-art methods in both target domain generalization and source domain performance, validating its effectiveness in improving object detection robustness to domain shifts. This approach is valuable in real-world applications, like autonomous driving and surveillance, where robust object detection in diverse environments is crucial.
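
The combination of global and instance-wise feature distillation described above can be illustrated with a minimal sketch. It is not the paper's implementation: it assumes a mean-squared-error distance, single-channel feature maps stored as nested lists, teacher and student maps already resized to the same shape, and boxes given in feature-map cells. The function names and the weights `w_global` and `w_instance` are hypothetical.

```python
from typing import List, Sequence, Tuple

FeatureMap = List[List[float]]   # single-channel H x W map (assumption, for brevity)
Box = Tuple[int, int, int, int]  # (x0, y0, x1, y1), end-exclusive, in feature-map cells


def mse(a: Sequence[float], b: Sequence[float]) -> float:
    """Mean squared error between two equal-length, non-empty sequences."""
    if len(a) != len(b) or not a:
        raise ValueError("inputs must be non-empty and the same length")
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def crop(fmap: FeatureMap, box: Box) -> List[float]:
    """Return the cells of `fmap` that fall inside `box` as a flat list."""
    x0, y0, x1, y1 = box
    return [fmap[y][x] for y in range(y0, y1) for x in range(x0, x1)]


def global_loss(teacher: FeatureMap, student: FeatureMap) -> float:
    """Compare the teacher and student features over the whole map."""
    t_full = (0, 0, len(teacher[0]), len(teacher))
    s_full = (0, 0, len(student[0]), len(student))
    return mse(crop(teacher, t_full), crop(student, s_full))


def instance_loss(teacher: FeatureMap, student: FeatureMap,
                  boxes: Sequence[Box]) -> float:
    """Compare features inside each object box, averaged over the boxes."""
    if not boxes:
        return 0.0
    per_box = [mse(crop(teacher, b), crop(student, b)) for b in boxes]
    return sum(per_box) / len(per_box)


def distillation_loss(teacher: FeatureMap, student: FeatureMap,
                      boxes: Sequence[Box],
                      w_global: float = 1.0, w_instance: float = 1.0) -> float:
    """Weighted sum of the two terms; the weights are hypothetical."""
    return (w_global * global_loss(teacher, student)
            + w_instance * instance_loss(teacher, student, boxes))
```

In this sketch the teacher map would come from the original source image and the student map from its downscaled and corrupted copy. The instance term adds extra weight to object regions, which is one way to read the abstract's claim that the student learns object-centric features.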

Paper Structure (14 sections, 3 equations, 5 figures, 7 tables)

Figures (5)

  • Figure I: Overview of the results of our proposed CD-FKD. The top panel qualitatively compares our method with DivAlign [danish2024improving] and Faster R-CNN [ren2016faster] on a Dusk-Rainy scene. The bottom panel shows a radar chart comparing relative performance.
  • Figure II: Illustration of the proposed single-domain generalized object detection using cross-domain FKD. As a KD framework, the frozen teacher network (shown in pink) receives source domain data, whereas the student network (shown in sky blue) is provided with downscaled and corrupted source domain data.
  • Figure III: Examples of corrupted source domain data
  • Figure IV: Qualitative evaluation results of the model's generalization ability on the Night-Clear, Dusk-Rainy, Night-Rainy, and Daytime-Foggy scenes. The top-row images show the results of Faster R-CNN [ren2016faster]. The middle-row images show the results of DivAlign [danish2024improving]. The bottom-row images show the results of our method. Red circles and arrows indicate false negatives, while yellow circles and arrows indicate false positives.
  • Figure V: Heatmap visualization for target domain scenes. The left column displays the original images, the middle column presents the results from Faster R-CNN, and the right column shows the results from our method. The heatmaps highlight the areas where the model focuses, with regions of higher attention marked by a redder hue.