A Survey of Transformers

Tianyang Lin, Yuxin Wang, Xiangyang Liu, Xipeng Qiu

Abstract

Transformers have achieved great success in many artificial intelligence fields, such as natural language processing, computer vision, and audio processing, and have therefore attracted considerable interest from academic and industry researchers. To date, a great variety of Transformer variants (a.k.a. X-formers) have been proposed; however, a systematic and comprehensive literature review of these variants is still missing. In this survey, we provide a comprehensive review of various X-formers. We first briefly introduce the vanilla Transformer and then propose a new taxonomy of X-formers. Next, we introduce the various X-formers from three perspectives: architectural modification, pre-training, and applications. Finally, we outline some potential directions for future research.

Paper Structure

This paper contains 67 sections, 39 equations, 13 figures, 2 tables.

Figures (13)

  • Figure 1: Overview of vanilla Transformer architecture.
  • Figure 2: Categorization of Transformer variants.
  • Figure 3: Taxonomy of Transformers.
  • Figure 4: Some representative atomic sparse attention patterns. A colored square means the corresponding attention score is calculated, and a blank square means the attention score is discarded.
  • Figure 5: Some representative compound sparse attention patterns. The red boxes indicate sequence boundaries.
  • ...and 8 more figures

Theorems & Definitions (1)

  • Definition 5.1: permutation equivariant function