A Comprehensive Survey on Applications of Transformers for Deep Learning Tasks

Saidul Islam, Hanae Elmekki, Ahmed Elsebai, Jamal Bentahar, Najat Drawel, Gaith Rjoub, Witold Pedrycz

TL;DR

This survey compiles and classifies transformer-based deep learning work across five major domains (NLP, computer vision, multi-modality, audio/speech, and signal processing) from 2017 to 2022. It introduces a taxonomy that maps transformer models to concrete tasks within each domain and highlights influential models, datasets, and practical implications. The work identifies gaps, such as cross-domain integration and data-efficient training, and discusses future directions including reinforcement learning integration, medical image/signal processing, and cloud/edge computing. Overall, it serves as a cross-domain reference to accelerate research and deployment of transformer architectures across diverse AI tasks.

Abstract

The transformer is a deep neural network that employs a self-attention mechanism to capture the contextual relationships within sequential data. Unlike conventional neural networks or updated versions of Recurrent Neural Networks (RNNs) such as Long Short-Term Memory (LSTM), transformer models excel at handling long-range dependencies between input sequence elements and enable parallel processing. As a result, transformer-based models have attracted substantial interest among researchers in the field of artificial intelligence. This can be attributed to their immense potential and remarkable achievements, not only in Natural Language Processing (NLP) tasks but also in a wide range of domains, including computer vision, audio and speech processing, healthcare, and the Internet of Things (IoT). Although several survey papers have been published highlighting the transformer's contributions in specific fields, architectural differences, or performance evaluations, a comprehensive survey covering its major applications across various domains is still lacking. Therefore, we set out to fill this gap by conducting an extensive survey of transformer models proposed from 2017 to 2022. Our survey identifies the top five application domains for transformer-based models, namely: NLP, Computer Vision, Multi-Modality, Audio and Speech Processing, and Signal Processing. We analyze the impact of highly influential transformer-based models in these domains and subsequently classify them based on their respective tasks using a proposed taxonomy. Our aim is to shed light on the existing potential and future possibilities of transformers for enthusiastic researchers, thus contributing to the broader understanding of this groundbreaking technology.
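
As an illustration of the self-attention mechanism the abstract refers to, the following is a minimal sketch of scaled dot-product attention, softmax(QK^T / sqrt(d_k)) V, in plain Python. It is not taken from the survey: the helper names (`softmax`, `matmul`, `transpose`) and the toy input are hypothetical, and setting Q = K = V to the raw embeddings (no learned projections, a single head) is a simplifying assumption.

```python
import math

def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def transpose(a):
    """Transpose a matrix given as a list of rows."""
    return [list(col) for col in zip(*a)]

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def scaled_dot_product_attention(q, k, v):
    """Return (output, weights) for softmax(Q K^T / sqrt(d_k)) V."""
    d_k = len(k[0])
    scores = matmul(q, transpose(k))
    # Each row of weights says how strongly one token attends to every token.
    weights = [softmax([s / math.sqrt(d_k) for s in row]) for row in scores]
    return matmul(weights, v), weights

# Toy self-attention over three tokens with 2-dimensional embeddings.
# Assumption of this sketch: Q = K = V = x, with no learned projections.
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
output, weights = scaled_dot_product_attention(x, x, x)
```

Every row of `weights` is computed independently of the others, which is what allows all positions to be processed in parallel, and each token can attend directly to any other token regardless of their distance in the sequence.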

Paper Structure

This paper contains 51 sections, 2 equations, 5 figures, 1 table.

Figures (5)

  • Figure 1: Multi-head attention & scaled dot-product attention
  • Figure 2: Transformer architecture
  • Figure 3: Methodology of the survey
  • Figure 4: Proportion of transformer applications in the top five fields
  • Figure 5: Application-based taxonomy of transformer models