CLIP-Guided Source-Free Object Detection in Aerial Images
Nanqing Liu, Xun Xu, Yongyi Su, Chengxin Liu, Peiliang Gong, Heng-Chao Li
TL;DR
The paper tackles cross-domain object detection in aerial images when access to the source data is restricted. It proposes a Source-Free Object Detection (SFOD) framework that combines a teacher–student self-training pipeline with CLIP-guided Aggregation (CGA). CGA refines the pseudo-labels by converting rotated boxes into horizontal boxes, scoring them with CLIP's zero-shot classifier, and combining those scores with the detector's predictions through a balancing factor (λ). The teacher itself is updated as an exponential moving average (EMA) of the student (α = 0.998). Key contributions include applying SFOD to oriented aerial objects, integrating CLIP into the self-training loop to mitigate confirmation bias, and constructing two new domain-shifted datasets, DIOR-C and DIOR-Cloudy, for evaluation. Results show improvements over baselines on these datasets. However, CGA yields only limited gains because corrupted aerial imagery differs from the data CLIP was trained on. Future work could decouple the text and image branches to enable task- and corruption-specific prompting.
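The EMA teacher update mentioned above can be illustrated with a minimal sketch. The sketch assumes a hypothetical parameter container (a dict mapping parameter names to flat lists of floats) that stands in for real network tensors. Only the rule teacher ← α·teacher + (1 − α)·student with α = 0.998 comes from the summary. The function name `ema_update` is illustrative, not the authors' code.

```python
from typing import Dict, List

# Hypothetical parameter container: parameter name -> flat list of weights.
# Real detectors would hold tensors; lists keep the sketch dependency-free.
Params = Dict[str, List[float]]


def ema_update(teacher: Params, student: Params, alpha: float = 0.998) -> Params:
    """Return new teacher weights: alpha * teacher + (1 - alpha) * student.

    alpha = 0.998 is the value quoted in the summary; everything else here
    (names, data layout) is an illustrative assumption.
    """
    if teacher.keys() != student.keys():
        raise ValueError("teacher and student must share parameter names")
    updated: Params = {}
    for name, t_vals in teacher.items():
        s_vals = student[name]
        if len(t_vals) != len(s_vals):
            raise ValueError(f"shape mismatch for parameter {name!r}")
        # The teacher drifts slowly toward the student and receives no gradients,
        # which keeps its pseudo-labels more stable than the student's own outputs.
        updated[name] = [alpha * t + (1.0 - alpha) * s for t, s in zip(t_vals, s_vals)]
    return updated


if __name__ == "__main__":
    teacher = {"w": [1.0, 0.0]}
    student = {"w": [0.0, 1.0]}
    teacher = ema_update(teacher, student)
    print(teacher["w"])
```

With α this close to 1, each update moves the teacher only 0.2% of the way toward the student. In a PyTorch setting the same rule would typically be applied per parameter tensor inside a no-grad context after each student optimization step.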
Abstract
Domain adaptation is crucial for aerial imagery, because the visual appearance of these images can vary significantly with factors such as geographic location, time, and weather conditions. Additionally, high-resolution aerial images often require substantial storage space and may not be readily accessible to the public. To address these challenges, we propose a novel Source-Free Object Detection (SFOD) method. Our approach builds on a self-training framework, which by itself substantially improves over the baseline methods. To mitigate noisy pseudo-labels in self-training, we use Contrastive Language-Image Pre-training (CLIP) to guide pseudo-label generation, a scheme we term CLIP-guided Aggregation (CGA). Leveraging CLIP's zero-shot classification capability, we aggregate its scores with those of the originally predicted bounding boxes to obtain refined pseudo-label scores. To validate the effectiveness of our method, we construct two new datasets from different domains based on the DIOR dataset, named DIOR-C and DIOR-Cloudy. Experimental results demonstrate that our method outperforms the compared methods. The code is available at https://github.com/Lans1ng/SFOD-RS.
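The CGA step described above can be sketched as follows. The sketch turns each rotated detection into the horizontal box that encloses it, asks a CLIP-like scorer for zero-shot class probabilities on that region, and blends those probabilities with the detector's confidence. The sketch rests on several assumptions not taken from the source. The blend is written as λ·s_det + (1 − λ)·s_clip, the default λ = 0.5 and the 0.5 pseudo-label threshold are placeholders, and the angle is taken to be in radians. `clip_scorer` is a hypothetical callable standing in for an actual CLIP model. The exact aggregation formula in the paper may differ.

```python
import math
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), axis-aligned


@dataclass
class RotatedDet:
    cx: float
    cy: float
    w: float
    h: float
    theta: float  # rotation angle in radians (assumed convention)
    label: str
    score: float  # teacher detector confidence


def rotated_to_horizontal(d: RotatedDet) -> Box:
    """Smallest axis-aligned box enclosing the rotated box."""
    c, s = abs(math.cos(d.theta)), abs(math.sin(d.theta))
    half_w = 0.5 * (d.w * c + d.h * s)
    half_h = 0.5 * (d.w * s + d.h * c)
    return (d.cx - half_w, d.cy - half_h, d.cx + half_w, d.cy + half_h)


# Hypothetical interface: given a horizontal crop box, return zero-shot class
# probabilities (e.g. a softmax over "an aerial photo of a {class}" prompts).
ClipScorer = Callable[[Box], Dict[str, float]]


def clip_guided_scores(dets: List[RotatedDet], clip_scorer: ClipScorer,
                       lam: float = 0.5) -> List[float]:
    """Refined score = lam * detector score + (1 - lam) * CLIP probability (assumed form)."""
    refined = []
    for d in dets:
        probs = clip_scorer(rotated_to_horizontal(d))
        # A class missing from CLIP's output contributes zero support.
        refined.append(lam * d.score + (1.0 - lam) * probs.get(d.label, 0.0))
    return refined


def select_pseudo_labels(dets: List[RotatedDet], refined: List[float],
                         threshold: float = 0.5) -> List[RotatedDet]:
    """Keep detections whose refined score reaches a (hypothetical) threshold."""
    return [d for d, r in zip(dets, refined) if r >= threshold]


if __name__ == "__main__":
    def fake_clip(box: Box) -> Dict[str, float]:
        # Stub standing in for CLIP: it ignores the crop and returns fixed probabilities.
        return {"airplane": 0.9, "ship": 0.1}

    dets = [RotatedDet(50, 50, 40, 10, math.pi / 2, "airplane", 0.6),
            RotatedDet(20, 20, 10, 10, 0.0, "ship", 0.7)]
    scores = clip_guided_scores(dets, fake_clip, lam=0.5)
    kept = select_pseudo_labels(dets, scores)
    print([round(s, 3) for s in scores], [d.label for d in kept])
```

In this toy run the ship detection is confident on its own (0.7), but the low CLIP support pulls its refined score below the threshold. The airplane is kept. This is the kind of correction CGA is meant to provide against confirmation bias. The horizontal conversion exists because CLIP consumes axis-aligned image crops rather than oriented boxes.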
