CognitiveDrone: A VLA Model and Evaluation Benchmark for Real-Time Cognitive Task Solving and Reasoning in UAVs

Artem Lykov; Valerii Serpiva; Muhammad Haris Khan; Oleg Sautenkov; Artyom Myshlyaev; Grik Tadevosyan; Yasheerah Yaqoot; Dzmitry Tsetserukou

TL;DR

CognitiveDrone addresses the absence of open benchmarks for cognitive UAVs by introducing a 7B-parameter Vision-Language-Action (VLA) model trained on over 8,000 simulated trajectories, paired with CognitiveDroneBench, a Gazebo-based benchmark that embeds cognitive tasks into a drone-racing track. The authors also augment the base VLA with a slower VLM reasoning module (CognitiveDrone-R1) that disambiguates instructions before control, raising success rates across the Human Recognition, Symbol Understanding, and Reasoning categories. On the benchmark, RaceVLA excels at low-level flight but struggles with cognition (31.3% overall success rate), the base CognitiveDrone model reaches 59.6%, and CognitiveDrone-R1 delivers the best overall performance at 77.2%, demonstrating the value of explicit reasoning in real-time UAV control. The work releases open-source datasets, the benchmark, model weights, and training/inference code, providing a dedicated way to evaluate cognitive capabilities in UAVs and supporting broader research in cognitive robotics.

Abstract

This paper introduces CognitiveDrone, a novel Vision-Language-Action (VLA) model tailored for complex Unmanned Aerial Vehicle (UAV) tasks that demand advanced cognitive abilities. Trained on a dataset of over 8,000 simulated flight trajectories spanning three key categories (Human Recognition, Symbol Understanding, and Reasoning), the model generates real-time 4D action commands from first-person visual inputs and textual instructions. To further enhance performance in intricate scenarios, we propose CognitiveDrone-R1, which integrates an additional Vision-Language Model (VLM) reasoning module that simplifies task directives prior to high-frequency control. Experimental evaluations on our open-source benchmark, CognitiveDroneBench, show that a racing-oriented model (RaceVLA) achieves an overall success rate of 31.3%, the base CognitiveDrone model reaches 59.6%, and CognitiveDrone-R1 attains 77.2%. These results demonstrate improvements of up to 30% in critical cognitive tasks, underscoring the effectiveness of incorporating advanced reasoning capabilities into UAV control systems. Our contributions include a state-of-the-art VLA model for UAV control and the first dedicated benchmark for assessing cognitive tasks in drone operations. The complete repository is available at cognitivedrone.github.io.
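The two-stage arrangement described for CognitiveDrone-R1, a slower reasoning module that simplifies the directive and a fast policy that emits 4D actions, can be illustrated with a minimal sketch. This is not the authors' implementation: the class name `DualRateController`, the callables `reasoner` and `policy`, the `reason_every` ratio, and the choice to refresh the directive periodically are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

# A 4D action command; what each component means is an assumption here.
Action = Tuple[float, float, float, float]


@dataclass
class DualRateController:
    """Hypothetical wrapper pairing a slow reasoner with a fast policy."""

    # Stand-in for the VLM reasoning module: (frame, instruction) -> simplified directive.
    reasoner: Callable[[object, str], str]
    # Stand-in for the VLA model: (frame, directive) -> 4D action.
    policy: Callable[[object, str], Action]
    # Assumed number of control steps between reasoning updates.
    reason_every: int = 10

    def run(self, frames: Sequence[object], instruction: str) -> List[Action]:
        directive = instruction  # used until the reasoner first runs
        actions: List[Action] = []
        for step, frame in enumerate(frames):
            # Slow path: refresh the simplified directive only occasionally.
            if step % self.reason_every == 0:
                directive = self.reasoner(frame, instruction)
            # Fast path: every frame produces one action from the current directive.
            actions.append(self.policy(frame, directive))
        return actions
```

In this sketch the policy never sees the original instruction, only the simplified directive, which mirrors the idea of simplifying task directives before high-frequency control; the real system's update schedule and interfaces may differ.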
