3D UAV Trajectory Estimation and Classification from Internet Videos via Language Model

Haoxiang Lei; Daotong Wang; Shenghai Yuan; Jianbo Su

TL;DR

This work presents a novel framework that derives UAV 3D trajectories and category information directly from Internet-scale UAV videos without relying on manual annotations; zero-shot transfer results indicate its robustness and applicability to real-world anti-UAV scenarios.

Abstract

Reliable 3D trajectory estimation of unmanned aerial vehicles (UAVs) is a fundamental requirement for anti-UAV systems, yet acquiring large-scale, accurately annotated trajectory data remains prohibitively expensive. In this work, we present a novel framework that derives UAV 3D trajectories and category information directly from Internet-scale UAV videos, without relying on manual annotations. First, language-driven data acquisition autonomously discovers and collects UAV-related videos, while vision-language reasoning progressively filters them down to task-relevant segments. Second, a training-free cross-modal label generation module infers 3D trajectory hypotheses and UAV type cues. Third, a physics-informed refinement process imposes temporal smoothness and kinematic consistency on the estimated trajectories. The resulting video clips and trajectory annotations can be readily used for downstream anti-UAV tasks. To assess effectiveness and generalization, we conduct zero-shot transfer experiments on a public, well-annotated 3D UAV benchmark. Results reveal a clear data scaling behavior: as the amount of online video data increases, zero-shot transfer performance on the target dataset improves consistently, without any target-domain training. The proposed method closely approaches current state-of-the-art performance, highlighting its robustness and applicability to real-world anti-UAV scenarios. Code and datasets will be released upon acceptance.
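The abstract does not say how the physics-informed refinement enforces temporal smoothness and kinematic consistency. As a minimal sketch, assuming the refinement can be approximated by penalized least squares (a Whittaker-style smoother, swapped in here for illustration and not confirmed as the authors' method), the Python below smooths each axis of a noisy 3D trajectory by minimizing ||x - y||^2 + lam * ||D x||^2, where D is the second-difference (discrete acceleration) operator. The weight `lam`, all helper names, and the `max_speed` plausibility check are hypothetical.

```python
from typing import List, Sequence, Tuple

Point3 = Tuple[float, float, float]


def second_difference_normal_matrix(n: int, lam: float) -> List[List[float]]:
    """Build A = I + lam * D^T D, where D is the (n-2) x n second-difference operator."""
    a = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    coeffs = (1.0, -2.0, 1.0)
    for r in range(n - 2):  # row r of D touches columns r, r+1, r+2
        for p in range(3):
            for q in range(3):
                a[r + p][r + q] += lam * coeffs[p] * coeffs[q]
    return a


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Dense Gaussian elimination with partial pivoting (O(n^3); fine for short clips)."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix [A | b]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = s / m[r][r]
    return x


def smooth_trajectory(points: Sequence[Point3], lam: float = 10.0) -> List[Point3]:
    """Per-axis penalized least squares: solve (I + lam * D^T D) x = y."""
    n = len(points)
    if n < 3:  # no second differences exist; nothing to smooth
        return [tuple(p) for p in points]
    a = second_difference_normal_matrix(n, lam)
    axes = [solve(a, [p[k] for p in points]) for k in range(3)]
    return list(zip(*axes))


def max_speed(points: Sequence[Point3], dt: float) -> float:
    """Largest finite-difference speed, e.g. to flag kinematically implausible jumps."""
    best = 0.0
    for p, q in zip(points, points[1:]):
        d = sum((qi - pi) ** 2 for pi, qi in zip(p, q)) ** 0.5
        best = max(best, d / dt)
    return best


if __name__ == "__main__":
    # Hypothetical noisy track: straight-line motion with zig-zag jitter in z.
    noisy = [(float(t), 0.5 * t, 10.0 + (0.3 if t % 2 else -0.3)) for t in range(10)]
    smoothed = smooth_trajectory(noisy, lam=50.0)
    print(max_speed(noisy, dt=0.1), max_speed(smoothed, dt=0.1))
```

Because constant-velocity motion lies in the null space of D, a perfectly linear trajectory passes through unchanged and only accelerations are penalized. The dense solver is meant for short clips; a banded solver would be the natural choice for long sequences.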

Paper Structure (16 sections, 8 equations, 6 figures, 3 tables)

Introduction
Related works
Annotated Datasets for Anti-UAV Research
Dataset-Driven Trajectory Estimation Approaches
Proposed Framework
Language-driven Data Acquisition
Training-free Cross-modal Label Generation
Physics-informed Refinement
Experiments and Performance
Implementation Details
Zero-shot Transfer on the Well-annotated Dataset
Performance on the MMAUD Dataset
Generalization and Ablation Study
Data Scaling Behavior
Parameter Sensitivity Analysis
...and 1 more section

Figures (6)

Figure 1: Comparison between the acquisition process of existing anti-UAV datasets and our proposed method for anti-UAV dataset acquisition.
Figure 2: The overall framework consists of three main parts. The top part is the Language-driven Data Acquisition module, which collects high-quality anti-UAV data. The bottom parts are the training-free cross-modal label generation module for 3D UAV trajectory estimation and classification, and the physics-informed refinement module. The final output is the 3D trajectory of the UAV; the LiDAR point cloud is used for visualization only, not for processing.
Figure 3: Red bounding boxes show the output of the Anti-UAV benchmark solution [anti-UAV_dataset], blue boxes show the Grounded SAM output [groundedsam], and yellow boxes show the output of a lightweight UAV detector [drone_yolov8].
Figure 4: Visualization of the comparison results on the DJI Pham4 sequence in MMAUD.
Figure 5: Scaling behavior of zero-shot transfer performance.
...and 1 more figure
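The Figure 2 and Figure 3 captions indicate that 2D UAV detections are turned into a 3D trajectory, but they do not state how. One common training-free way to lift a detection to 3D, assumed here only for illustration and not confirmed as the authors' approach, combines a similar-triangles depth cue (an assumed physical UAV width) with pinhole back-projection of the box center. The `Intrinsics` and `Box` types, the helper names, and all numeric values are hypothetical.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Intrinsics:
    """Pinhole camera intrinsics in pixels (hypothetical; Internet videos rarely ship calibration)."""
    fx: float
    fy: float
    cx: float
    cy: float


@dataclass(frozen=True)
class Box:
    """Axis-aligned 2D detection box: (x1, y1) top-left, (x2, y2) bottom-right, in pixels."""
    x1: float
    y1: float
    x2: float
    y2: float


def depth_from_width(box: Box, k: Intrinsics, real_width_m: float) -> float:
    """Similar-triangles depth cue: Z = fx * W / w, with W an assumed physical UAV width."""
    w = box.x2 - box.x1
    if w <= 0:
        raise ValueError("box width must be positive")
    return k.fx * real_width_m / w


def back_project(u: float, v: float, depth: float, k: Intrinsics) -> Tuple[float, float, float]:
    """Invert u = fx*X/Z + cx and v = fy*Y/Z + cy for a known depth Z."""
    x = (u - k.cx) * depth / k.fx
    y = (v - k.cy) * depth / k.fy
    return (x, y, depth)


def project(p: Tuple[float, float, float], k: Intrinsics) -> Tuple[float, float]:
    """Forward pinhole projection, used here to sanity-check the round trip."""
    x, y, z = p
    return (k.fx * x / z + k.cx, k.fy * y / z + k.cy)


def box_to_point(box: Box, k: Intrinsics, real_width_m: float) -> Tuple[float, float, float]:
    """Lift a box center to a camera-frame 3D point in meters."""
    u = 0.5 * (box.x1 + box.x2)
    v = 0.5 * (box.y1 + box.y2)
    return back_project(u, v, depth_from_width(box, k, real_width_m), k)


if __name__ == "__main__":
    cam = Intrinsics(fx=1000.0, fy=1000.0, cx=640.0, cy=360.0)  # hypothetical values
    det = Box(722.5, 350.0, 757.5, 370.0)
    print(box_to_point(det, cam, real_width_m=0.35))  # 0.35 m is a hypothetical size prior
```

Because the intrinsics of Internet videos are usually unknown, the focal length and size prior would themselves be estimates, so the resulting points are only camera-frame hypotheses that still need temporal refinement.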
