An Overview on Language Models: Recent Developments and Outlook

Chengwei Wei; Yun-Cheng Wang; Bin Wang; C.-C. Jay Kuo

TL;DR

This survey traces the evolution from conventional, autoregressive language models (CLMs) to pre-trained language models (PLMs). It frames language modeling as estimating the distribution $P(u_1, u_2, \dots, u_t)$ over a sequence of linguistic units, using autoregressive conditionals such as $P(u_t|u_{<t})$ or their bidirectional counterparts. The survey covers five facets: linguistic units, architectures, training paradigms, evaluation, and applications. Along the way it details types of LMs (structural, bidirectional, permutation), tokenization strategies (characters, subwords, morphemes), core architectures (N-gram, maximum entropy, FFN, RNN, Transformer), and pre-training/fine-tuning pipelines, including adapters and prompts. It also discusses evaluation practices, decoding and generation techniques, efficiency considerations, and future directions such as knowledge-graph integration, incremental learning, and lightweight domain-specific models. Overall, it highlights the trade-offs among model capacity, data efficiency, interpretability, and deployment practicality, and underscores the need for reliable, explainable, and efficient LMs in real-world systems. The outlined directions point toward hybrid systems that combine structured knowledge with powerful language representations.
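
As a worked form of the factorization mentioned in the TL;DR (this is the standard chain rule of probability; the index $i$ is introduced here only for illustration), the joint probability of a sequence decomposes into autoregressive conditionals:

$$P(u_1, u_2, \dots, u_t) = \prod_{i=1}^{t} P(u_i \mid u_{<i}),$$

where $u_{<i}$ denotes $u_1, \dots, u_{i-1}$ and $P(u_1 \mid u_{<1}) = P(u_1)$. A bidirectional model instead conditions each unit on context from both sides, for example $P(u_i \mid u_{<i}, u_{>i})$. Unlike the autoregressive conditionals, these terms do not multiply into the joint probability by the chain rule.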

Abstract

Language modeling studies the probability distributions over strings of text. It is one of the most fundamental tasks in natural language processing (NLP). It has been widely used in text generation, speech recognition, machine translation, etc. Conventional language models (CLMs) aim to predict the probability of linguistic sequences in a causal manner, while pre-trained language models (PLMs) cover broader concepts and can be used in both causal sequential modeling and fine-tuning for downstream applications. PLMs have their own training paradigms (usually self-supervised) and serve as foundation models in modern NLP systems. This overview paper provides an introduction to both CLMs and PLMs from five aspects, i.e., linguistic units, architectures, training methods, evaluation methods, and applications. Furthermore, we discuss the relationship between CLMs and PLMs and shed light on the future directions of language modeling in the pre-trained era.

Paper Structure (44 sections, 16 equations, 11 figures, 2 tables)

Introduction
Types of Language Models
Structural LM
Bidirectional LM
Permutation LM
Linguistic Units
Characters
Words and Subwords
Statistics-based Subword Tokenizers
Linguistics-based Subword Tokenizers
Phrases
Sentences
Architecture of Language Models
N-gram Models
Maximum Entropy Models
...and 29 more sections

Figures (11)

Figure 1: An example of a dependency parse tree [mirowski2015dependency].
Figure 2: The use of different permutations in a natural sentence.
Figure 3: Illustration of the BPE merge operation conducted on the dictionary {"hug", "pug", "pun", "bun"}. The vocabulary is initialized with all characters. Then, a new subword is created by merging the most frequent pair.
Figure 4: The structure of FFN LMs, where $u_{t-N+1}, \dots, u_{t-1}$ denote the preceding context of $u_{t}$ in a fixed window, and $P$, $H$, and $O$ are the dimensions of the projection, the hidden layer, and the output layer, respectively.
Figure 5: The structure of RNN LMs.
...and 6 more figures
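
The BPE merge operation described in the Figure 3 caption can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function names (`count_pairs`, `merge_pair`, `learn_bpe`) are hypothetical, each word is assumed to occur once because the caption gives no frequencies, and ties between equally frequent pairs are broken lexicographically, which is an arbitrary choice made here for determinism.

```python
from collections import Counter


def count_pairs(corpus):
    """Count adjacent symbol pairs, weighted by word frequency."""
    pairs = Counter()
    for symbols, freq in corpus.items():
        for left, right in zip(symbols, symbols[1:]):
            pairs[(left, right)] += freq
    return pairs


def merge_pair(corpus, pair):
    """Replace every adjacent occurrence of `pair` with one merged symbol."""
    merged = pair[0] + pair[1]
    new_corpus = {}
    for symbols, freq in corpus.items():
        out, i = [], 0
        while i < len(symbols):
            if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
                out.append(merged)
                i += 2
            else:
                out.append(symbols[i])
                i += 1
        key = tuple(out)
        new_corpus[key] = new_corpus.get(key, 0) + freq
    return new_corpus


def learn_bpe(word_freqs, num_merges):
    """Learn up to `num_merges` merges, starting from a character vocabulary."""
    # Each word starts as a tuple of single characters.
    corpus = {tuple(word): freq for word, freq in word_freqs.items()}
    # The vocabulary is initialized with all characters in the dictionary.
    vocab = sorted({ch for word in word_freqs for ch in word})
    merges = []
    for _ in range(num_merges):
        pairs = count_pairs(corpus)
        if not pairs:
            break
        # Most frequent pair; ties broken lexicographically (assumption).
        best = min(pairs, key=lambda p: (-pairs[p], p))
        corpus = merge_pair(corpus, best)
        merges.append(best)
        vocab.append(best[0] + best[1])
    return vocab, merges, corpus


# Assumed unit frequencies for the dictionary in the Figure 3 caption.
word_freqs = {"hug": 1, "pug": 1, "pun": 1, "bun": 1}
vocab, merges, corpus = learn_bpe(word_freqs, num_merges=1)
print(vocab)
print(merges)
```

With unit frequencies, three pairs tie at a count of 2, so which one is merged first depends entirely on the tie-breaking rule. Passing real corpus frequencies in `word_freqs` removes most such ties.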
