YOLO-Ant: A Lightweight Detector via Depthwise Separable Convolutional and Large Kernel Design for Antenna Interference Source Detection

Xiaoyu Tang, Xingming Chen, Jintao Cheng, Jin Wu, Rui Fan, Chengxi Zhang, Zebo Zhou

TL;DR

This work tackles the practical problem of detecting antenna interference sources with UAVs. It introduces YOLO-Ant, a lightweight CNN–transformer detector built on depthwise separable convolutions and large kernels to improve small-object recognition in cluttered aerial scenes. The key innovations are the DSLK-Block, which expands the receptive field efficiently, and the DSLKVit-Block, which injects transformer-based global context into the neck while keeping the model lightweight. Through a pruning-inspired redesign of the backbone and neck and targeted use of transformer modules, YOLO-Ant achieves strong accuracy on the antenna dataset and competitive results on COCO and VisDrone, with superior small-object performance and favorable parameter counts and GFLOPs. The approach shows practical potential for real-time UAV inspection and lays groundwork for integration with spectrum analysis tools in intelligent aerial inspection systems.

Abstract

In the era of 5G communication, removing interference sources that affect communication is a resource-intensive task. The rapid development of computer vision has enabled unmanned aerial vehicles to perform various high-altitude detection tasks. Because object detection for antenna interference sources has not been fully explored, the industry lacks dedicated learning samples and detection models for this task. In this article, an antenna dataset is created to address important antenna interference source detection issues and to serve as the basis for subsequent research. We introduce YOLO-Ant, a lightweight CNN and transformer hybrid detector designed specifically for antenna interference source detection. We first formulate a lightweight design for the network depth and width, ensuring that subsequent investigations are conducted within a lightweight framework. Then, we propose a DSLK-Block module based on depthwise separable convolution and large convolution kernels to enhance the network's feature extraction ability, effectively improving small object detection. To address challenges such as complex backgrounds and large interclass differences in antenna detection, we construct DSLKVit-Block, a powerful feature extraction module that combines DSLK-Block and transformer structures. Considering both its lightweight design and accuracy, our method not only achieves optimal performance on the antenna dataset but also yields competitive results on public datasets.
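
To make concrete why depthwise separable convolution pairs well with large kernels, the sketch below counts weights for a standard convolution and for a depthwise convolution followed by a 1 x 1 pointwise convolution. It is a generic illustration and not the DSLK-Block itself: the function names, the channel width of 256, and the kernel sizes are assumptions chosen for the example, and biases are ignored.

```python
def standard_conv_params(c_in, c_out, k):
    """Weights in a standard k x k convolution (bias omitted)."""
    return k * k * c_in * c_out


def depthwise_separable_params(c_in, c_out, k):
    """Weights in a k x k depthwise conv plus a 1 x 1 pointwise conv (bias omitted)."""
    depthwise = k * k * c_in   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1 x 1 conv that mixes channels
    return depthwise + pointwise


# Hypothetical layer sizes for illustration only; not taken from the paper.
channels = 256
for k in (3, 7, 13):
    std = standard_conv_params(channels, channels, k)
    dws = depthwise_separable_params(channels, channels, k)
    print(f"k={k:2d}  standard={std:>10,}  separable={dws:>8,}  ratio={std / dws:.1f}x")
```

In the separable form only the depthwise term depends on the kernel size, and it scales with the number of input channels rather than with the product of input and output channels, so enlarging the kernel adds comparatively few weights.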

Paper Structure

This paper contains 14 sections, 15 equations, 8 figures, 8 tables.

Figures (8)

  • Figure 1: The process of 5G communication in the CBN-U-H5H-0713 area is shown in the figure. Two antenna interference source signals appear in it. The gNB (gNodeB) denotes a 5G base station. The UE (User Equipment) denotes the terminal equipment that users use to access the wireless network.
  • Figure 2: The YOLO-Ant model can be roughly divided into three parts, including the backbone, which consists of DSLKNet, and the neck, an FPN+PAN structure built from DSLK-Layer and DSLKVit-Block.
  • Figure 3: DSLK-Block & DSLK-Layer
  • Figure 4: DSLKVit-Block
  • Figure 6: (a) shows an example of an input antenna image and its detection result, (b) compares the performance of the baseline model after channel pruning on the neck, and (c) visualizes the P4 feature layer of the baseline model. The visualization reveals considerable information redundancy among the 256 channels, characterized by pronounced repetition of feature maps. Two groups are highlighted as examples.
  • ...and 3 more figures
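
The channel redundancy described in the Figure 6 caption can be illustrated with a simple similarity check between feature maps. The sketch below is one possible way to flag near-duplicate channels, assuming cosine similarity on flattened maps and a 0.95 threshold. The function names, the threshold, and the toy data are all hypothetical, and this is not presented as the analysis used in the paper.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two flattened feature maps."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat an all-zero map as dissimilar to everything
    return dot / (norm_a * norm_b)


def redundant_pairs(channels, threshold=0.95):
    """Return index pairs (i, j), i < j, whose maps are nearly identical in direction."""
    pairs = []
    for i in range(len(channels)):
        for j in range(i + 1, len(channels)):
            if cosine_similarity(channels[i], channels[j]) >= threshold:
                pairs.append((i, j))
    return pairs


# Toy "feature maps" (flattened); hypothetical values for illustration only.
toy = [
    [1.0, 2.0, 3.0, 4.0],
    [2.0, 4.0, 6.0, 8.0],    # scaled copy of channel 0
    [4.0, -3.0, 2.0, -1.0],  # orthogonal to channel 0
]
print(redundant_pairs(toy))
```

A layer in which many channel pairs exceed the threshold would be a natural candidate for width reduction, which is the intuition behind pruning-inspired lightweight design.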