
A Primer on Large Language Models and their Limitations

Sandra Johnson, David Hyland-Wood

TL;DR

The primer surveys large language models (LLMs) from foundational Transformer architectures to training paradigms. It outlines how decoder-, encoder-, and encoder-decoder-based models are pre-trained with self-supervised objectives and then adapted via in-context learning or fine-tuning. It emphasizes orchestrating LLMs with retrieval and domain knowledge, and discusses practical risks such as forgetting, content collapse, jailbreaks, and hallucinations, along with mitigations and governance considerations. The paper highlights evolving trends such as open-source versus closed-source dynamics and hardware-software co-design (e.g., LFMs and specialized LPUs), and underscores the need for continual evaluation and responsible deployment. Collectively, it provides a framework for selecting, customizing, integrating, and safely using LLMs across academic and industrial contexts, with actionable guidance on data curation, prompting, and system architecture.

Abstract

This paper provides a primer on Large Language Models (LLMs) and identifies their strengths, limitations, applications and research directions. It is intended for those in academia and industry who want to understand the key LLM concepts and technologies, and to apply this knowledge both in day-to-day tasks and in more complex scenarios where the technology can enhance current practices and processes.

Paper Structure

This paper contains 25 sections, 19 figures, 2 tables.

Figures (19)

  • Figure 1: Timeline of LLM releases: blue rounded rectangles are 'pre-trained' models, while orange rectangles are 'instruction-tuned' models. Models above the line indicate open-source availability, and those below the line are closed-source (image from Naveed, H., et al. naveed2024)
  • Figure 2: Generative AI use case categorisation
  • Figure 3: Technical evolution of the OpenAI GPT-series models zhao_survey_2023
  • Figure 4: Task solving zhao_survey_2023
  • Figure 5: (a) Attention framework (image by Vaswani, A., et al. vaswani_attention_2023) (b) Encoder Decoder components of transformer architecture (image by DeepLearning.AI barth_generative_nodate)
  • ...and 14 more figures