DS MYOLO: A Reliable Object Detector Based on SSMs for Driving Scenarios

Yang Li, Jianli Xiao

TL;DR

DS MYOLO is a novel object detector that captures global feature information through a simplified selective scanning fusion block (SimVSS Block) and effectively integrates the network's deep features. It also introduces an efficient channel attention convolution (ECAConv) that enhances cross-channel feature interaction while maintaining low computational complexity.

Abstract

Accurate real-time object detection enhances the safety of advanced driver-assistance systems, making it an essential component in driving scenarios. With the rapid development of deep learning technology, CNN-based YOLO real-time object detectors have gained significant attention. However, the local focus of CNNs results in performance bottlenecks. To further enhance detector performance, researchers have introduced Transformer-based self-attention mechanisms to leverage global receptive fields, but their quadratic complexity incurs substantial computational costs. Recently, Mamba, with its linear complexity, has made significant progress through global selective scanning. Inspired by Mamba's outstanding performance, we propose a novel object detector: DS MYOLO. This detector captures global feature information through a simplified selective scanning fusion block (SimVSS Block) and effectively integrates the network's deep features. Additionally, we introduce an efficient channel attention convolution (ECAConv) that enhances cross-channel feature interaction while maintaining low computational complexity. Extensive experiments on the CCTSDB 2021 and VLD-45 driving scenarios datasets demonstrate that DS MYOLO exhibits significant potential and competitive advantage among similarly scaled YOLO series real-time object detectors.
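To make the quadratic-versus-linear contrast in the abstract concrete, the following is a minimal counting sketch, not the authors' cost model. It assumes a feature map flattened into H x W tokens, counts one interaction per token pair for full self-attention, and counts one state update per token per scan direction for a selective scan. The function names and the default of four scan directions are illustrative assumptions and are not taken from the paper.

```python
def attention_pair_count(height: int, width: int) -> int:
    """Token-pair interactions for full self-attention over an H x W map.

    Every token attends to every token, so the count is (H*W)^2.
    """
    tokens = height * width
    return tokens * tokens


def scan_step_count(height: int, width: int, directions: int = 4) -> int:
    """Sequential state updates for a selective scan over an H x W map.

    Assumption: one update per token per scan direction; the default of
    four directions is illustrative, not a value stated in the paper.
    """
    return directions * height * width


if __name__ == "__main__":
    for side in (20, 40, 80):
        print(
            side,
            attention_pair_count(side, side),
            scan_step_count(side, side),
        )
```

Under these assumptions, doubling the side length of the feature map multiplies the attention count by 16 but the scan count by only 4, which is why the gap widens quickly on high-resolution inputs.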

Paper Structure (16 sections, 8 equations, 5 figures, 4 tables)

Figures (5)

  • Figure 1: Overall architecture of DS MYOLO.
  • Figure 2: Detailed structure of the SimVSS Block: (a) component modules of the SimVSS Block; (b) key internal architecture of the VSS module.
  • Figure 3: Key architectures and components of ECAConv and ECACSP: (a) basic architecture of ECAConv; (b) detailed structure of ECACSP.
  • Figure 4: Trends in validation metrics for DS MYOLO-N across epochs: (a) results on CCTSDB 2021 [zhang2022cctsdb]; (b) results on VLD-45 [yang2021vld].
  • Figure 5: CAM visualization results for YOLOv5 [yolov5v6.1], YOLOv8 [ultralytics2023], YOLOv10 [wang2024yolov10], and our DS MYOLO-N on CCTSDB 2021 [zhang2022cctsdb].