Lightweight Full-Convolutional Siamese Tracker
Yunfeng Li, Bo Wang, Xueyi Wu, Zhuoyan Liu, Ye Li
TL;DR
LightFC is a lightweight full-convolutional Siamese tracker designed for resource-limited platforms. It introduces an Efficient Cross-Correlation Module (ECM) and an Efficient Rep-Center Head (ERH), which enhance feature representation without the overhead of attention-heavy architectures. Extensive experiments show that LightFC and its variant LightFC-vit achieve state-of-the-art performance among lightweight trackers on LaSOT, TrackingNet, TNL2K, and other benchmarks, while using substantially fewer parameters and Flops and running faster on CPUs. The work demonstrates that nonlinear fusion, feature reuse, and reparameterization can close the gap with larger models without sacrificing efficiency, making real-time tracking practical on edge devices.
Abstract
Although single object trackers have achieved advanced performance, their large-scale models hinder their application on resource-limited platforms. Moreover, existing lightweight trackers balance only two or three of the four criteria: parameters, performance, Flops, and FPS. To achieve the optimal balance among all four, this paper proposes a lightweight full-convolutional Siamese tracker called LightFC. LightFC employs a novel efficient cross-correlation module (ECM) and a novel efficient rep-center head (ERH) to improve the feature representation of the convolutional tracking pipeline. The ECM adopts an attention-like module design that applies spatial and channel linear fusion to the fused features and enhances their nonlinearity. Additionally, drawing on factors that have proven successful in current lightweight trackers, it introduces skip connections and reuses search area features. The ERH reparameterizes the feature dimensional stage of the standard center head and introduces channel attention to optimize the bottleneck of key feature flows. Comprehensive experiments show that LightFC achieves the optimal balance between performance, parameters, Flops, and FPS. The precision score of LightFC outperforms MixFormerV2-S on LaSOT and TNL2K by 3.7% and 6.5%, respectively, while using 5x fewer parameters and 4.6x fewer Flops. In addition, LightFC runs 2x faster than MixFormerV2-S on CPUs. A higher-performance version, LightFC-vit, is also proposed by replacing the backbone with a more powerful network. The code and raw results can be found at https://github.com/LiYunfengLYF/LightFC.
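
The reparameterization idea mentioned for the ERH can be illustrated with a generic sketch. The abstract does not describe the ERH's actual branch layout, so the following assumes a common structural-reparameterization setup: a 3x3 convolution, a 1x1 convolution, and an identity branch are summed at training time, then folded into a single 3x3 kernel for inference. All function names (`conv2d_same`, `merge_branches`, `multi_branch_forward`, `max_abs_diff`) and all numeric values are hypothetical and chosen only for illustration; the example is single-channel and uses the Python standard library only.

```python
def conv2d_same(image, kernel):
    """Zero-padded 'same' 2D cross-correlation (single channel, odd square kernel)."""
    h, w = len(image), len(image[0])
    k = len(kernel)
    pad = k // 2
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for i in range(k):
                for j in range(k):
                    yy, xx = y + i - pad, x + j - pad
                    if 0 <= yy < h and 0 <= xx < w:  # outside the image counts as zero
                        acc += image[yy][xx] * kernel[i][j]
            out[y][x] = acc
    return out


def multi_branch_forward(image, kernel3x3, weight1x1):
    """Training-time form: 3x3 branch + 1x1 branch + identity branch, summed."""
    y3 = conv2d_same(image, kernel3x3)
    return [
        [y3[r][c] + weight1x1 * image[r][c] + image[r][c] for c in range(len(image[0]))]
        for r in range(len(image))
    ]


def merge_branches(kernel3x3, weight1x1):
    """Inference-time form: fold the 1x1 and identity branches into the 3x3 centre tap."""
    merged = [row[:] for row in kernel3x3]  # copy so the original kernel is untouched
    merged[1][1] += weight1x1 + 1.0         # 1x1 weight plus identity (weight 1)
    return merged


def max_abs_diff(a, b):
    """Largest element-wise absolute difference between two 2D lists."""
    return max(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))


# Hypothetical toy input and weights.
image = [
    [1.0, 2.0, 0.0, -1.0],
    [0.5, -2.0, 3.0, 1.0],
    [4.0, 0.0, -1.5, 2.0],
    [-3.0, 1.0, 0.5, 0.0],
]
kernel3x3 = [[0.5, -0.25, 0.0], [1.0, 0.75, -0.5], [0.25, 0.0, -1.0]]
weight1x1 = 0.5

train_out = multi_branch_forward(image, kernel3x3, weight1x1)
deploy_out = conv2d_same(image, merge_branches(kernel3x3, weight1x1))
print("branches match merged kernel:", max_abs_diff(train_out, deploy_out) < 1e-9)
```

Because convolution is linear, the merged kernel reproduces the multi-branch output while requiring only one convolution at inference, which is why this kind of reparameterization adds capacity during training without adding inference-time parameters or Flops. A real head would also fold normalization layers and handle multiple channels, which this sketch omits.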
