A Primer on Large Language Models and their Limitations
Sandra Johnson, David Hyland-Wood
TL;DR
This primer surveys large language models (LLMs), from the foundational Transformer architecture to training paradigms. It outlines how decoder-, encoder-, and encoder-decoder-based models are pre-trained with self-supervised objectives and then adapted through in-context learning or fine-tuning. It emphasizes orchestration with retrieval and domain knowledge, and discusses practical risks such as forgetting, content collapse, jailbreaks, and hallucinations, along with mitigations and governance considerations. The paper highlights evolving trends such as open-source versus closed-source dynamics and hardware-software co-design (e.g., LFMs and specialized LPUs), and underscores the need for continual evaluation and responsible deployment. Taken together, it provides a framework for selecting, customizing, integrating, and safely using LLMs across academic and industrial contexts, with actionable guidance on data curation, prompting, and system architecture.
Abstract
This paper provides a primer on Large Language Models (LLMs) and identifies their strengths, limitations, applications and research directions. It is intended for readers in academia and industry who want to understand the key LLM concepts and technologies, and to apply this knowledge both in day-to-day tasks and in more complex scenarios where the technology can enhance current practices and processes.
