A Comprehensive Survey of Convolutions in Deep Learning: Applications, Challenges, and Future Trends

Abolfazl Younesi, Mohsen Ansari, MohammadAmin Fazli, Alireza Ejlali, Muhammad Shafique, Jörg Henkel

TL;DR

This survey analyzes the breadth of convolutional techniques in deep learning, focusing on 1D/2D/3D convolutions, dilated and grouped architectures, and advanced variants such as depthwise separable and attention-enabled convolutions, together with related topics including NAS, GANs, and vision transformers. It maps architectural innovations to concrete applications (image recognition, object detection, segmentation, NLP, medical imaging) and discusses performance, efficiency, and deployment on edge devices, including compression and pruning. It proposes a taxonomy of CNN architectures alongside a synthesis of frameworks, datasets, and research trends, and highlights interpretability, robustness, and multimodal integration as key future directions. The paper emphasizes practical impact through guidelines for benchmarking, hardware-aware design, and the integration of uncertainty estimation and domain knowledge to enable reliable, scalable vision systems in real-world settings.

Abstract

Convolutional Neural Networks (CNNs), a subset of Deep Learning (DL) models, are widely used for computer vision tasks such as image classification, object detection, and image segmentation. Numerous types of CNNs have been designed to meet specific needs, including 1D, 2D, and 3D CNNs, as well as dilated, grouped, attention, and depthwise convolutions, and NAS, among others. Each type has its own structure and characteristics, making it suitable for specific tasks. A thorough comparative analysis of these CNN types is crucial for understanding their strengths and weaknesses. Furthermore, studying the performance, limitations, and practical applications of each type can aid the development of new and improved architectures. We also examine, from various perspectives, the platforms and frameworks that researchers use for research and development. Additionally, we explore the main research fields of CNNs, such as 6D vision, generative models, and meta-learning. This survey provides a comprehensive examination and comparison of various CNN architectures, highlighting their architectural differences and emphasizing their respective advantages, disadvantages, applications, challenges, and future trends.

Paper Structure

This paper contains 80 sections, 1 equation, 20 figures, 10 tables.

Figures (20)

  • Figure 1: The section-by-section structure of the paper, which provides a clear and organized framework for presenting the research findings
  • Figure 2: A text-based visual reading map that helps readers navigate and comprehend the paper
  • Figure 3: A graphical representation of CNN architectures from 1998 to 2023
  • Figure 4: The progression of CNN architectures from 1998 to 2020 with their pros and cons, showing that each CNN model is suited to a specific application
  • Figure 5: A graphical representation of Section 3
  • ...and 15 more figures