
LEAP:D -- A Novel Prompt-based Approach for Domain-Generalized Aerial Object Detection

Chanyeong Park, Heegwang Kim, Joonki Paik

TL;DR

This study contributes to domain-generalized object detection by leveraging learnable prompts and optimizing training processes, which enhances model robustness and adaptability across diverse environments, leading to more effective aerial object detection.

Abstract

Drone-captured images pose significant challenges for object detection because varying shooting conditions can alter the appearance and shape of objects. Factors such as drone altitude, shooting angle, and weather cause these variations and affect the performance of object detection algorithms. To address these challenges, we introduce a vision-language approach based on learnable prompts. Replacing conventional manual prompts with learnable ones aims to reduce interference from domain-specific knowledge and thereby improve detection. Furthermore, we streamline training with a one-step approach that updates the learnable prompt concurrently with the detection model, improving efficiency without compromising performance. Our study contributes to domain-generalized object detection by leveraging learnable prompts and optimizing the training process, which enhances model robustness and adaptability across diverse environments and leads to more effective aerial object detection.
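To make the idea of a learnable prompt concrete, here is a minimal forward-pass sketch. It assumes a CoOp-style design, in which M shared learnable context vectors are prepended to a fixed class-name embedding; the summary does not confirm that LEAP:D uses exactly this layout. The text encoder is replaced by a hypothetical mean-pooling stand-in, and the names `build_prompts` and `classify_region`, the temperature, and all vectors are illustrative assumptions.

```python
import math
import random

# Assumption: CoOp-style prompt = [v_1, ..., v_M, w_c], where the v_i are shared
# learnable context vectors and w_c is a fixed class-name embedding.
# LEAP:D's actual prompt design and text encoder may differ.

def mean_pool(tokens):
    """Hypothetical stand-in for the frozen text encoder: average the token vectors."""
    dim = len(tokens[0])
    return [sum(tok[d] for tok in tokens) / len(tokens) for d in range(dim)]

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b + 1e-12)

def build_prompts(context, class_embeddings):
    """Prepend the shared learnable context vectors to each class-name embedding."""
    return {name: context + [emb] for name, emb in class_embeddings.items()}

def classify_region(region_feature, context, class_embeddings, temperature=0.07):
    """Softmax over temperature-scaled cosine similarities to each prompt's text feature."""
    prompts = build_prompts(context, class_embeddings)
    text_features = {name: mean_pool(tokens) for name, tokens in prompts.items()}
    logits = {name: cosine(region_feature, feat) / temperature
              for name, feat in text_features.items()}
    max_logit = max(logits.values())  # subtract the max for numerical stability
    exps = {name: math.exp(l - max_logit) for name, l in logits.items()}
    total = sum(exps.values())
    return {name: e / total for name, e in exps.items()}

# Usage: 3 learnable context vectors and 4-dim toy embeddings (all values hypothetical).
random.seed(0)
DIM, M = 4, 3
context = [[random.gauss(0.0, 0.02) for _ in range(DIM)] for _ in range(M)]
class_embeddings = {"car": [1.0, 0.0, 0.0, 0.0], "pedestrian": [0.0, 1.0, 0.0, 0.0]}
probs = classify_region([0.9, 0.1, 0.0, 0.0], context, class_embeddings)
print(probs)
```

In this sketch only `context` would receive gradient updates during training, while the class-name embeddings stay fixed; that is what distinguishes a learnable prompt from a hand-written template such as "a photo of a car".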

Paper Structure

This paper contains 12 sections, 11 equations, 2 figures, 2 tables.

Figures (2)

  • Figure 1: Comparison between the traditional two-step training strategy (left) and the one-step training strategy of the proposed LEAP:D (right). In the two-step strategy, step 1 fine-tunes the text embeddings to align them with the training domain before the detector is trained. In contrast, the proposed method (right) uses learnable embeddings, which removes the need for a separate alignment to the target domain and allows training in a single streamlined step.
  • Figure 2: Qualitative comparison of predictions on the VisDrone dataset between the baseline (left) and the proposed method (right). The yellow boxes highlight zoomed-in areas where the proposed method detects objects that the baseline network misses, indicating the stronger generalization ability of the proposed method across various shooting conditions.
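The difference between the two training schedules in Figure 1 can be sketched as plain gradient descent on two parameter groups. This is a toy under stated assumptions: `loss_and_grads` is a hypothetical scalar loss with hand-derived gradients standing in for the real detection loss, and the scalars `prompt` and `detector` stand in for the prompt embeddings and detector weights. None of these names come from the paper.

```python
# Hypothetical toy loss coupling a "prompt" parameter and a "detector" parameter.
# It stands in for the real detection loss, which the summary does not specify.
def loss_and_grads(prompt, detector):
    residual = prompt * detector - 1.0
    loss = residual ** 2 + 0.1 * (prompt ** 2 + detector ** 2)
    grad_prompt = 2.0 * residual * detector + 0.2 * prompt
    grad_detector = 2.0 * residual * prompt + 0.2 * detector
    return loss, grad_prompt, grad_detector

def train_one_step(prompt, detector, lr=0.1, steps=200):
    """One-step schedule (sketch): both groups are updated in the same loop."""
    for _ in range(steps):
        _, g_prompt, g_detector = loss_and_grads(prompt, detector)
        prompt -= lr * g_prompt        # gradients are taken at the same point,
        detector -= lr * g_detector    # so this is a simultaneous update
    return prompt, detector, steps     # steps = number of training passes

def train_two_step(prompt, detector, lr=0.1, steps=200):
    """Two-step schedule (sketch): tune the prompt first, then train the detector."""
    for _ in range(steps):             # step 1: align the prompt, detector frozen
        _, g_prompt, _ = loss_and_grads(prompt, detector)
        prompt -= lr * g_prompt
    for _ in range(steps):             # step 2: train the detector, prompt frozen
        _, _, g_detector = loss_and_grads(prompt, detector)
        detector -= lr * g_detector
    return prompt, detector, 2 * steps

init_loss = loss_and_grads(0.5, 0.5)[0]
p1, d1, passes_one = train_one_step(0.5, 0.5)
p2, d2, passes_two = train_two_step(0.5, 0.5)
print(init_loss, loss_and_grads(p1, d1)[0], passes_one)
print(loss_and_grads(p2, d2)[0], passes_two)
```

In this sketch the one-step schedule makes half as many training passes as the two-step one. That efficiency argument, rather than the toy loss values, is the point the sketch is meant to illustrate.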