DCPT: Darkness Clue-Prompted Tracking in Nighttime UAVs

Jiawen Zhu, Huayi Tang, Zhi-Qi Cheng, Jun-Yan He, Bin Luo, Shihao Qiu, Shengming Li, Huchuan Lu

TL;DR

Darkness Clue-Prompted Tracking (DCPT) is a novel architecture that achieves robust UAV tracking at night by efficiently learning to generate darkness clue prompts, which inject anti-dark knowledge into the tracker without extra modules.

Abstract

Existing nighttime unmanned aerial vehicle (UAV) trackers follow an "Enhance-then-Track" architecture: they first use a light enhancer to brighten the nighttime video, then employ a daytime tracker to locate the object. Separating enhancement from tracking in this way fails to yield an end-to-end trainable vision system. To address this, we propose a novel architecture called Darkness Clue-Prompted Tracking (DCPT) that achieves robust UAV tracking at night by efficiently learning to generate darkness clue prompts. Without a separate enhancer, DCPT directly encodes anti-dark capabilities into prompts using a darkness clue prompter (DCP). Specifically, DCP iteratively learns emphasizing and undermining projections for darkness clues. It then injects these learned visual prompts across the transformer layers of a daytime tracker whose parameters are fixed. Moreover, a gated feature aggregation mechanism enables adaptive fusion both among prompts and between prompts and the base model. Extensive experiments show state-of-the-art performance for DCPT on multiple dark scenario benchmarks. The unified end-to-end learning of enhancement and tracking in DCPT enables a more trainable system, and the darkness clue prompting efficiently injects anti-dark knowledge without extra modules. Code is available at https://github.com/bearyi26/DCPT.
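
As a rough illustration of the prompting scheme the abstract describes, the following toy sketch passes tokens through a stack of frozen layers and adds a learned prompt before each one through a gate. It is not the authors' implementation. All names (`gated_add`, `run_prompted_backbone`, the per-layer gate logits) are hypothetical, tokens are plain lists of floats rather than tensors, and the sigmoid-gated sum is only one plausible reading of "gated feature aggregation".

```python
import math
from typing import Callable, List, Optional, Sequence

Vector = List[float]
Layer = Callable[[Vector], Vector]


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def gated_add(base: Vector, extra: Vector, gate_logit: float) -> Vector:
    """Add `extra` to `base`, scaled by a gate value between 0 and 1."""
    g = sigmoid(gate_logit)
    return [b + g * e for b, e in zip(base, extra)]


def run_prompted_backbone(
    tokens: Vector,
    frozen_layers: Sequence[Layer],   # stand-ins for the fixed daytime tracker layers
    prompters: Sequence[Layer],       # stand-ins for trainable prompter blocks, one per layer
    prompt_gates: Sequence[float],    # assumed gate logits: previous prompt -> current prompt
    base_gates: Sequence[float],      # assumed gate logits: prompt -> base features
) -> Vector:
    x = list(tokens)
    prev_prompt: Optional[Vector] = None
    for layer, prompter, pg, bg in zip(frozen_layers, prompters, prompt_gates, base_gates):
        prompt = prompter(x)                      # derive a prompt from the current features
        if prev_prompt is not None:
            prompt = gated_add(prompt, prev_prompt, pg)   # fusion between prompts
        x = gated_add(x, prompt, bg)              # fusion between prompt and base features
        x = layer(x)                              # frozen layer: not updated during training
        prev_prompt = prompt
    return x
```

In this sketch only the prompters and gate logits would be trained; with strongly negative gate logits the gates close and the output reduces to that of the unmodified frozen stack.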

Paper Structure

This paper contains 17 sections, 9 equations, 9 figures, and 3 tables.

Figures (9)

  • Figure 1: Illustration of different nighttime UAV tracking paradigms. (a) “Enhance-then-Track” paradigm. (b) Domain adaptation paradigm. (c) Proposed darkness clue-prompted tracking (DCPT) paradigm. DCPT possesses a more streamlined structure while effectively incorporating the learned darkness clue prompts, enabling the UAV to “see” more sharply in the dark.
  • Figure 2: Architecture overview of DCPT. The template and search images are first fed into the patch embedding to generate the corresponding tokens. A ViT backbone performs fundamental feature extraction and interaction on the concatenated template and search tokens. In parallel, the darkness clue prompter (DCP) blocks $P^i,\ i \in \{1, \ldots, N\}$ are distributed across the encoder layers, one per layer; they extract valid darkness clue prompts and inject them into the foundation model. In addition, gated feature aggregation (GFA) is performed for more effective information fusion.
  • Figure 3: Detailed structure of the proposed DCP module. The DCP module takes the foundation features as input, iteratively emphasizes and undermines the darkness clues, and learns the residual term for the reconstruction of valid darkness clue prompts.
  • Figure 4: Overall performance of DCPT and other SOTA trackers on the UAVDark135 (first row) and NAT2021 (second row) benchmarks.
  • Figure 5: Visualization of tracking in representative nighttime scenarios.
  • ...and 4 more figures
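
The emphasize/undermine cycle named in the Figure 3 caption can be sketched as a single back-projection-style step. This is an assumption-laden toy, not the DCP module itself: the function `darkness_clue_prompt`, the helper `scale_by`, and the use of plain scalar scalings in place of learned projections are all hypothetical, and the exact wiring of the residual in the paper may differ.

```python
from typing import Callable, List

Vector = List[float]
Projection = Callable[[Vector], Vector]


def scale_by(factor: float) -> Projection:
    """Hypothetical stand-in for a learned projection: multiply every feature by `factor`."""
    return lambda v: [factor * t for t in v]


def darkness_clue_prompt(features: Vector, emphasize: Projection, undermine: Projection) -> Vector:
    """One emphasize -> undermine -> residual -> emphasize step (assumed structure)."""
    emphasized = emphasize(features)        # amplify the darkness clues
    restored = undermine(emphasized)        # project back toward the input features
    # What the round trip failed to recover from the input:
    residual = [f - r for f, r in zip(features, restored)]
    correction = emphasize(residual)        # emphasize the missed part as well
    # Prompt = emphasized features plus the emphasized residual correction.
    return [e + c for e, c in zip(emphasized, correction)]


# When `undermine` exactly inverts `emphasize`, the residual is zero and the
# prompt equals the emphasized features; otherwise the residual corrects it.
exact = darkness_clue_prompt([1.0, 3.0], scale_by(2.0), scale_by(0.5))
lossy = darkness_clue_prompt([1.0, 3.0], scale_by(2.0), scale_by(0.25))
```

Iterating the step, as the caption's "iteratively" suggests, would amount to applying further emphasize/undermine rounds to the result; how many rounds are used is not stated in this summary.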