PP-LCNet: A Lightweight CPU Convolutional Neural Network

Cheng Cui, Tingquan Gao, Shengyu Wei, Yuning Du, Ruoyu Guo, Shuilong Dong, Bin Lu, Ying Zhou, Xueying Lv, Qiwen Liu, Xiaoguang Hu, Dianhai Yu, Yanjun Ma

TL;DR

The paper tackles the challenge of achieving high accuracy with lightweight CNNs on CPU hardware accelerated with MKLDNN. It proposes PP-LCNet, an architecture built on depthwise-separable convolutions and enhanced by four targeted techniques: a better activation function (H-Swish), SE modules placed toward the tail of the network, larger 5x5 kernels in selected layers, and a 1280-channel 1x1 convolution after global average pooling (GAP), all implemented in PaddlePaddle. In experiments on ImageNet, COCO, and Cityscapes, PP-LCNet shows competitive or superior accuracy at low latency compared with other lightweight models, and ablations validate these contributions. The work also outlines practical design rules for CPU-focused CNN development and suggests using NAS to discover even faster models for CPU platforms.
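As a small illustration of the first technique, the sketch below evaluates H-Swish on scalars using its commonly cited definition, x * ReLU6(x + 3) / 6. The formula is not spelled out in this summary, so treat it as an assumption about the activation the paper names. The function names `relu6` and `h_swish` are illustrative and are not taken from PaddleClas.

```python
def relu6(x: float) -> float:
    """Clamp x to the range [0, 6]."""
    return min(max(x, 0.0), 6.0)


def h_swish(x: float) -> float:
    """Hard-Swish under the assumed definition x * ReLU6(x + 3) / 6.

    It uses only a clamp, an add, and a multiply, so it avoids the
    exponential needed by the sigmoid in the original Swish.
    """
    return x * relu6(x + 3.0) / 6.0


if __name__ == "__main__":
    for value in (-4.0, -1.0, 0.0, 1.0, 4.0):
        print(f"h_swish({value:+.1f}) = {h_swish(value):+.4f}")
```

Under this definition the function is zero for inputs at or below -3 and equals the identity for inputs at or above 3, with a smooth-looking transition in between.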

Abstract

We propose a lightweight CPU network based on the MKLDNN acceleration strategy, named PP-LCNet, which improves the performance of lightweight models on multiple tasks. This paper lists techniques that improve network accuracy while keeping latency almost constant. With these improvements, the accuracy of PP-LCNet greatly surpasses that of previous network structures at the same inference time for classification. As shown in Figure 1, it outperforms most state-of-the-art models. It also performs very well on downstream computer vision tasks such as object detection and semantic segmentation. All our experiments are implemented in PaddlePaddle. Code and pretrained models are available at PaddleClas.

Paper Structure

This paper contains 14 sections, 2 figures, 9 tables.

Figures (2)

  • Figure 1: Accuracy-latency comparison of different mobile-series models. Latency is tested on an Intel$^\circledR$ Xeon$^\circledR$ Gold 6148 processor with a batch size of 1, MKLDNN enabled, and 10 threads.
  • Figure 2: A detailed view of PP-LCNet. The dotted box represents optional modules. The stem uses a standard $3 \times 3$ convolution. DepthSepConv means depth-wise separable convolution, DW means depth-wise convolution, PW means point-wise convolution, and GAP means Global Average Pooling.
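
To make the DepthSepConv terminology in Figure 2 concrete, the sketch below counts weights for a standard convolution and for a depth-wise (DW) plus point-wise (PW) pair. It uses the textbook parameter-count formulas and ignores biases, batch normalization, and any SE module. The channel sizes (64 in, 128 out) are hypothetical values chosen for illustration and are not taken from the paper.

```python
def standard_conv_params(kernel: int, c_in: int, c_out: int) -> int:
    """Weights in a standard kernel x kernel convolution (no bias)."""
    return kernel * kernel * c_in * c_out


def depth_sep_conv_params(kernel: int, c_in: int, c_out: int) -> int:
    """Weights in a depth-wise separable convolution (no bias).

    DW: one kernel x kernel filter per input channel.
    PW: a 1x1 convolution that mixes c_in channels into c_out channels.
    """
    depthwise = kernel * kernel * c_in
    pointwise = c_in * c_out
    return depthwise + pointwise


if __name__ == "__main__":
    c_in, c_out = 64, 128  # hypothetical channel sizes
    for k in (3, 5):
        std = standard_conv_params(k, c_in, c_out)
        sep = depth_sep_conv_params(k, c_in, c_out)
        print(f"{k}x{k}: standard={std}, depth-separable={sep}")
```

With these example sizes, moving from a 3x3 to a 5x5 kernel adds weights only to the depth-wise part (576 to 1600), while the point-wise part stays at 8192. This gives a rough sense of why a larger kernel is comparatively cheap in a depth-wise separable block, although parameter count alone does not determine measured CPU latency.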