CA-YOLO: Cross Attention Empowered YOLO for Biomimetic Localization
Zhen Zhang, Qing Zhao, Xiuhe Li, Cheng Wang, Guoqiang Zhu, Yu Zhang, Yining Huo, Hongyi Yu, Yi Zhang
TL;DR
This work tackles the challenge of accurate and robust localization of small, dynamic targets in complex environments. It couples CA-YOLO (a recognition backbone augmented with Multi-Head Self-Attention, a dedicated small-target detection head, and the CFAM fusion module) with a biomimetic pan-tilt tracking system inspired by the vestibulo-ocular reflex (VOR). CA-YOLO improves multi-scale detection, particularly for small targets. The Bio-Pan-Tilt module keeps the target centered and stably tracked through four mechanisms: center positioning, stability optimization via a decision boundary, an adaptive control coefficient, and an intelligent recapture strategy. Experiments on COCO, VisDrone, and custom AGV/UAV datasets show that CA-YOLO achieves higher accuracy than the original YOLO model while remaining feasible for real-time deployment, with notable gains in small-target detection and robust tracking under variable speeds. The integrated system shows practical potential for time-sensitive localization tasks in robotics and surveillance. Future work includes extending it to mobile carriers and unknown targets.
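The TL;DR states that the CA-YOLO backbone is augmented with Multi-Head Self-Attention but does not say where or how it is wired in. As a generic reference only, the following sketch shows standard scaled dot-product multi-head self-attention applied to a feature map flattened into H·W tokens of C channels, so that every spatial position can attend to every other one. The helper names (`flatten_feature_map`, `multi_head_self_attention`), the pure-Python list representation, and the random weights are illustrative assumptions. This is not the authors' module, and CFAM is not modeled.

```python
import math
import random
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    # (n x m) @ (m x p) -> (n x p)
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax_rows(m: Matrix) -> Matrix:
    # Row-wise softmax; subtracting the row max keeps exp() numerically stable.
    out = []
    for row in m:
        peak = max(row)
        exps = [math.exp(v - peak) for v in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


def flatten_feature_map(fmap: List[Matrix]) -> Matrix:
    # fmap[c][y][x] with shape C x H x W  ->  H*W tokens, each a C-dim vector.
    channels, height, width = len(fmap), len(fmap[0]), len(fmap[0][0])
    return [[fmap[c][y][x] for c in range(channels)]
            for y in range(height) for x in range(width)]


def multi_head_self_attention(x: Matrix, wq: Matrix, wk: Matrix, wv: Matrix,
                              wo: Matrix, num_heads: int) -> Matrix:
    # Standard scaled dot-product MHSA over n tokens of dimension d.
    d = len(x[0])
    if d % num_heads != 0:
        raise ValueError("feature dimension must be divisible by num_heads")
    dh = d // num_heads
    q, k, v = matmul(x, wq), matmul(x, wk), matmul(x, wv)
    heads = []
    for h in range(num_heads):
        cols = slice(h * dh, (h + 1) * dh)  # this head's slice of the channels
        qh = [row[cols] for row in q]
        kh = [row[cols] for row in k]
        vh = [row[cols] for row in v]
        scores = [[sum(a * b for a, b in zip(qi, kj)) / math.sqrt(dh) for kj in kh]
                  for qi in qh]
        # Each token's output is an attention-weighted mix of all tokens' values.
        heads.append(matmul(softmax_rows(scores), vh))
    # Concatenate head outputs per token, then apply the output projection.
    concat = [[val for head in heads for val in head[i]] for i in range(len(x))]
    return matmul(concat, wo)


# Usage: a hypothetical 4-channel, 3x3 feature map with random projection weights.
random.seed(0)
C, H, W = 4, 3, 3
fmap = [[[random.random() for _ in range(W)] for _ in range(H)] for _ in range(C)]
tokens = flatten_feature_map(fmap)
wq, wk, wv, wo = ([[random.gauss(0, 0.5) for _ in range(C)] for _ in range(C)]
                  for _ in range(4))
out = multi_head_self_attention(tokens, wq, wk, wv, wo, num_heads=2)
```

Splitting the channels across heads lets each head learn a different notion of which positions are related at no extra projection cost, which is the usual motivation for using several heads rather than one wide one.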
Abstract
In modern complex environments, accurate and efficient target localization is essential in numerous fields. However, existing systems are often limited in both localization accuracy and the ability to recognize small targets. In this study, we propose a bionic stabilized localization system based on CA-YOLO, designed to improve both target localization accuracy and small-target recognition. Acting as the "brain" of the system, the target detection algorithm emulates the visual focusing mechanism of animals by integrating bionic modules into the YOLO backbone network: a small-target detection head and a Characteristic Fusion Attention Mechanism (CFAM). Furthermore, drawing inspiration from the human Vestibulo-Ocular Reflex (VOR), we develop a bionic pan-tilt tracking control strategy that incorporates central positioning, stability optimization, adaptive control coefficient adjustment, and an intelligent recapture function. Experimental results show that CA-YOLO outperforms the original model on standard datasets (COCO and VisDrone), improving average accuracy metrics by 3.94% and 4.90%, respectively. Further time-sensitive target localization experiments validate the effectiveness and practicality of this bionic stabilized localization system.
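The pan-tilt control strategy is described only at the level of its four functions. A minimal sketch shows how such functions could fit together in one loop. It assumes a detector that returns one bounding box per frame (or `None` when the target is missed) and a pan-tilt unit that accepts relative angle commands in degrees. In the sketch, the normalized offset of the box center drives center positioning, a dead zone plays the role of the decision boundary, the gain grows with the error as an adaptive control coefficient, and the last motion command is re-issued after a run of missed frames as a simple recapture rule. The class `CenterTrackingController`, every parameter in `PanTiltConfig`, the linear gain schedule, the sign conventions, and the recapture rule are hypothetical choices, not the authors' design.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

BBox = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels


@dataclass
class PanTiltConfig:
    # Hypothetical placeholder values, not settings reported by the authors.
    frame_w: int = 640
    frame_h: int = 480
    fov_pan_deg: float = 60.0    # horizontal field of view
    fov_tilt_deg: float = 45.0   # vertical field of view
    dead_zone: float = 0.05      # decision boundary on the normalized center error
    k_min: float = 0.2           # gain applied just outside the dead zone
    k_max: float = 1.0           # gain applied at the frame edge
    lost_limit: int = 15         # consecutive missed frames before recapture


class CenterTrackingController:
    """Illustrative center-positioning pan-tilt loop (hypothetical, not the authors' code)."""

    def __init__(self, cfg: Optional[PanTiltConfig] = None):
        self.cfg = cfg if cfg is not None else PanTiltConfig()
        self.lost_frames = 0
        self.last_cmd = (0.0, 0.0)  # last nonzero (pan, tilt) command in degrees

    def _axis_command(self, err: float, half_fov: float) -> float:
        # Stability optimization: ignore errors inside the decision boundary.
        if abs(err) <= self.cfg.dead_zone:
            return 0.0
        # Adaptive coefficient: gain rises linearly with |err| from k_min to k_max.
        k = self.cfg.k_min + (self.cfg.k_max - self.cfg.k_min) * min(abs(err), 1.0)
        return k * err * half_fov

    def step(self, bbox: Optional[BBox]) -> Tuple[str, Tuple[float, float]]:
        if bbox is None:
            self.lost_frames += 1
            if self.lost_frames >= self.cfg.lost_limit:
                # Recapture: re-issue the last motion to sweep where the target went.
                return "recapture", self.last_cmd
            return "hold", (0.0, 0.0)
        self.lost_frames = 0
        x1, y1, x2, y2 = bbox
        half_w, half_h = self.cfg.frame_w / 2, self.cfg.frame_h / 2
        # Center positioning: normalized offset of the box center, roughly in [-1, 1].
        ex = ((x1 + x2) / 2 - half_w) / half_w  # positive = right of center
        ey = ((y1 + y2) / 2 - half_h) / half_h  # positive = below center
        cmd = (self._axis_command(ex, self.cfg.fov_pan_deg / 2),
               self._axis_command(ey, self.cfg.fov_tilt_deg / 2))
        if cmd != (0.0, 0.0):
            self.last_cmd = cmd
        return "track", cmd


ctrl = CenterTrackingController()
print(ctrl.step((300, 220, 340, 260)))  # a centered box produces no motion
```

The dead zone trades responsiveness for stability: small jitters of the detected box near the frame center produce no motion, while the rising gain lets the head catch up quickly when the target drifts toward the frame edge.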
