Backbones-Review: Feature Extraction Networks for Deep Learning and Deep Reinforcement Learning Approaches

Omar Elharrouss, Younes Akbari, Noor Almaadeed, Somaya Al-Maadeed

TL;DR

This survey catalogs major backbone networks (e.g., VGG, ResNet, DenseNet, Inception, EfficientNet, HRNet) and maps their usage across core CV tasks including image classification, object detection, crowd counting, video summarization, action and face recognition, COVID-19 detection, and panoptic segmentation. It compares performance and efficiency through representative metrics on standard datasets (ImageNet, MS COCO, Cityscapes, SumMe/TVSum, LFW/YTF, etc.), highlighting how backbone choice drives accuracy–compute trade-offs and task-specific gains. The authors discuss evaluation trends, limitations, and practical considerations for backbone selection, and offer future directions such as data augmentation and DRL-based data labeling to address data and annotation bottlenecks. Overall, the review provides a consolidated reference of backbones, their task associations, and key performance patterns to guide researchers and developers in CV and DRL applications.

Abstract

Artificial Intelligence (AI) is currently the most widely used technique for understanding the real world from various types of data, and its main task is finding patterns within the analyzed data. This is done through a feature extraction step, traditionally carried out with statistical algorithms or specific filters. However, selecting useful features from large-scale data has been a crucial challenge. With the development of convolutional neural networks (CNNs), feature extraction has become more automatic and easier. CNNs can work on large-scale data and cover different scenarios for a specific task. In computer vision, convolutional networks are used to extract features and also serve in other parts of a deep learning (DL) model. Selecting a suitable network for feature extraction, or for the other parts of a DL model, is not arbitrary: it depends on the target task as well as on the computational complexity of the network. Many networks have been proposed and have become widely used across DL models and AI tasks. When such a network is used for feature extraction, or at the beginning of a DL model, it is called a backbone. A backbone is a known network that has previously been trained on many other tasks and has demonstrated its effectiveness. This paper gives an overview and detailed description of existing backbones, e.g., VGGs, ResNets, and DenseNet. It also discusses several computer vision tasks, reviewing each with regard to the backbones used, and compares performance according to the backbone used for each task.
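
The backbone idea described in the abstract can be illustrated with a toy sketch: one shared feature extractor feeds several task-specific heads. Everything here is an assumption made for illustration and does not come from the paper: the names `ToyBackbone` and `LinearHead` are hypothetical, the kernels are set by hand rather than learned, and a real backbone such as VGG or ResNet would stack many learned convolutional layers.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def conv2d_valid(image: Matrix, kernel: Matrix) -> Matrix:
    """Slide the kernel over the image with no padding and stride 1."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(
                image[i + di][j + dj] * kernel[di][dj]
                for di in range(kh)
                for dj in range(kw)
            )
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def relu(fmap: Matrix) -> Matrix:
    """Element-wise max(0, x)."""
    return [[max(0.0, v) for v in row] for row in fmap]


def global_avg_pool(fmap: Matrix) -> float:
    """Collapse one feature map to a single number."""
    count = len(fmap) * len(fmap[0])
    return sum(sum(row) for row in fmap) / count


class ToyBackbone:
    """Hypothetical stand-in for a backbone: one conv layer, ReLU, pooling."""

    def __init__(self, kernels: Sequence[Matrix]) -> None:
        self.kernels = list(kernels)

    def __call__(self, image: Matrix) -> List[float]:
        # One pooled feature per kernel.
        return [
            global_avg_pool(relu(conv2d_valid(image, k))) for k in self.kernels
        ]


class LinearHead:
    """Hypothetical task-specific head that consumes backbone features."""

    def __init__(self, weights: Matrix, bias: Sequence[float]) -> None:
        self.weights = weights
        self.bias = list(bias)

    def __call__(self, features: Sequence[float]) -> List[float]:
        return [
            sum(w * f for w, f in zip(row, features)) + b
            for row, b in zip(self.weights, self.bias)
        ]


# Hand-set kernels (an assumption; real backbones learn these from data).
backbone = ToyBackbone(
    kernels=[
        [[-1.0, 1.0]],    # responds to left-to-right intensity increases
        [[-1.0], [1.0]],  # responds to top-to-bottom intensity increases
    ]
)

# A 3x3 image with a vertical edge in its last column.
image = [
    [0.0, 0.0, 1.0],
    [0.0, 0.0, 1.0],
    [0.0, 0.0, 1.0],
]

# The same features are reused by two different heads.
features = backbone(image)
two_class_head = LinearHead(weights=[[1.0, 0.0], [0.0, 1.0]], bias=[0.0, 0.0])
regression_head = LinearHead(weights=[[2.0, 2.0]], bias=[0.1])
class_scores = two_class_head(features)
regression_output = regression_head(features)
```

The design point is the reuse: `features` is computed once and passed to both heads, which is the sense in which a single backbone can serve several tasks.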

Paper Structure (49 sections, 9 figures, 10 tables)

Figures (9)

  • Figure 1: VGG and ResNet architectures.
  • Figure 2: GoogleNet and DenseNet architectures.
  • Figure 3: DetNet and ShuffleNet architectures.
  • Figure 4: MobileNet-V2 and SqueezeNet architectures.
  • Figure 5: WideResNet and EfficientNet architectures.
  • ...and 4 more figures