
CSPNet: A New Backbone that can Enhance Learning Capability of CNN

Chien-Yao Wang, Hong-Yuan Mark Liao, I-Hau Yeh, Yueh-Hua Wu, Ping-Yang Chen, Jun-Wei Hsieh

TL;DR

CSPNet introduces a Cross Stage Partial Network that splits feature maps to reduce duplicate gradient information, boosting gradient diversity and cutting computation by about 20% without sacrificing accuracy. The Exact Fusion Model (EFM) further improves multi-scale feature fusion and reduces memory bandwidth via Maxout-based compression, enabling efficient one-stage detectors. Across ResNet, ResNeXt, and DenseNet backbones, CSPNet achieves substantial FLOPs reductions with maintained or improved ImageNet accuracy and superior COCO AP50 performance, and it reaches real-time speeds on GPUs, CPUs, and edge devices. The work emphasizes hardware utilization and memory efficiency, making advanced CNN architectures more accessible in resource-constrained environments.
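The TL;DR credits part of EFM's memory savings to Maxout-based compression. The sketch below shows one way channel-wise Maxout can shrink a feature map. It assumes a (C, H, W) layout and that each output channel is the element-wise maximum over a group of k consecutive input channels; the grouping and the function name `maxout_compress` are illustrative choices, not taken from the paper or its repository.

```python
import numpy as np

def maxout_compress(x: np.ndarray, k: int = 2) -> np.ndarray:
    """Channel-wise Maxout (sketch): shrink C channels to C // k.

    Assumes x has shape (C, H, W) and that each output channel is the
    element-wise maximum over a group of k consecutive input channels.
    """
    c, h, w = x.shape
    if c % k != 0:
        raise ValueError(f"channel count {c} is not divisible by k={k}")
    # (C, H, W) -> (C // k, k, H, W), then take the max across each group.
    return x.reshape(c // k, k, h, w).max(axis=1)

x = np.arange(16, dtype=np.float32).reshape(4, 2, 2)  # 4 channels of 2x2
y = maxout_compress(x, k=2)
print(y.shape)  # (2, 2, 2): half the channels, so half the bytes to move
```

With k = 2 the compressed map occupies half the memory of the original, which illustrates the kind of bandwidth saving the TL;DR attributes to EFM; how EFM actually groups channels should be checked against the source code.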

Abstract

Neural networks have enabled state-of-the-art approaches to achieve incredible results on computer vision tasks such as object detection. However, such success relies heavily on costly computational resources, which keeps people with inexpensive devices from benefiting from the technology. In this paper, we propose Cross Stage Partial Network (CSPNet) to mitigate, from the network architecture perspective, the heavy inference computation that previous works require. We attribute the problem to duplicate gradient information within network optimization. The proposed networks respect the variability of the gradients by integrating feature maps from the beginning and the end of a network stage. In our experiments, this reduces computation by 20% with equivalent or even superior accuracy on the ImageNet dataset, and the networks significantly outperform state-of-the-art approaches in terms of AP50 on the MS COCO object detection dataset. CSPNet is easy to implement and general enough to cope with architectures based on ResNet, ResNeXt, and DenseNet. Source code is at https://github.com/WongKinYiu/CrossStagePartialNetworks.
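The abstract's core idea, integrating the feature map from the beginning of a stage with the one produced at its end, can be sketched as a channel split followed by a concatenation. This is a minimal NumPy illustration, not the authors' implementation: the names `csp_stage` and `conv1x1`, the even 4/4 channel split, and the toy 1x1-convolution block are assumptions standing in for real dense or residual blocks. It follows a transition then concatenation ordering; per Figure 3(b), CSPDenseNet applies a further transition after the merge.

```python
import numpy as np

def conv1x1(x: np.ndarray, weight: np.ndarray) -> np.ndarray:
    """Toy 1x1 convolution: weight (C_out, C_in) applied to x (C_in, H, W)."""
    return np.einsum("oc,chw->ohw", weight, x)

def csp_stage(x, block, transition, part1_channels):
    """Cross Stage Partial stage (sketch).

    x: base-layer feature map, shape (C, H, W).
    block: stand-in for the stage's dense/residual block.
    transition: layer applied to the block output before merging.
    """
    part1 = x[:part1_channels]   # skips the block, reaches the end of the stage unchanged
    part2 = x[part1_channels:]   # the only part the block computes on
    processed = transition(block(part2))
    # Merge the feature map from the beginning of the stage with the one from its end.
    return np.concatenate([part1, processed], axis=0)

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 4, 4))
w_block = rng.standard_normal((6, 4))   # 4 -> 6 channels inside the block
w_trans = rng.standard_normal((4, 6))   # 6 -> 4 channels in the transition
out = csp_stage(
    x,
    block=lambda t: np.maximum(conv1x1(t, w_block), 0.0),  # 1x1 conv + ReLU
    transition=lambda t: conv1x1(t, w_trans),
    part1_channels=4,
)
print(out.shape)  # (8, 4, 4)
```

Because `part1` never enters the block, the block's cost scales only with the channels in `part2`, and gradients flowing back to `part1` do not pass through the block.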

Paper Structure

This paper contains 12 sections, 8 figures, 5 tables.

Figures (8)

  • Figure 1: The proposed CSPNet can be applied to ResNet he2016deep, ResNeXt xie2017aggregated, DenseNet huang2017densely, etc. It not only reduces the computation cost and memory usage of these networks but also benefits inference speed and accuracy.
  • Figure 2: Illustrations of (a) DenseNet and (b) our proposed Cross Stage Partial DenseNet (CSPDenseNet). CSPNet splits the feature map of the base layer into two parts: one part goes through a dense block and a transition layer, while the other part is combined with the transmitted feature map and passed to the next stage.
  • Figure 3: Different kinds of feature fusion strategies: (a) single-path DenseNet, (b) proposed CSPDenseNet: transition $\rightarrow$ concatenation $\rightarrow$ transition, (c) concatenation $\rightarrow$ transition, and (d) transition $\rightarrow$ concatenation.
  • Figure 4: Effect of truncating the gradient flow to maximize the difference between gradient combinations.
  • Figure 5: Applying CSPNet to ResNe(X)t.
  • ...and 3 more figures
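In Figure 2 only part of the base-layer feature map enters the dense block, which is where the computation saving comes from. The rough count below makes this concrete for a toy dense block. The even split, the growth rate, the depth, and the function name `dense_block_macs` are illustrative assumptions, and the count ignores transition layers, so it does not reproduce the paper's reported network-level reduction.

```python
def dense_block_macs(c_in: int, growth: int, layers: int, h: int, w: int, kernel: int = 3) -> int:
    """Multiply-accumulates of a toy dense block: layer i sees c_in + i * growth
    input channels and emits `growth` channels with a kernel x kernel conv."""
    total = 0
    for i in range(layers):
        channels_seen = c_in + i * growth
        total += h * w * kernel * kernel * channels_seen * growth
    return total

c0, k, n, h, w = 64, 32, 4, 1, 1                # per-pixel count (h = w = 1)
plain = dense_block_macs(c0, k, n, h, w)        # whole base layer enters the block
csp = dense_block_macs(c0 // 2, k, n, h, w)     # only half the channels enter the block
print(plain, csp, f"{1 - csp / plain:.1%} fewer MACs in the block")
# prints: 129024 92160 28.6% fewer MACs in the block
```

With these toy numbers the CSP variant does 5/7 of the plain block's work. The relative saving shrinks as the block gets deeper, because later layers are dominated by channels the block produces itself rather than by the split input.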