The advantages of context specific language models: the case of the Erasmian Language Model
João Gonçalves, Nick Jelicic, Michele Murgia, Evert Stamhuis
TL;DR
The paper addresses the sustainability and governance issues of scaling large language models by proposing a context-specific alternative: the Erasmian Language Model (ELM), a 900M-parameter LLaMA 2-based system trained on Erasmus University Rotterdam data. Through domain-focused pretraining, instruction tuning, and RLHF with direct preference optimization, ELM demonstrates specialization in Erasmus-relevant social sciences, humanities, and medicine tasks while operating on limited hardware and with strong privacy controls. Evaluations include qualitative classroom use, a domain-tailored MMLU assessment, and the COMPASS trustworthiness framework, showing useful performance within its context and highlighting the benefits and trade-offs of context-specific approaches. The study argues that such context-bound models offer practical, governance-aligned alternatives for resource-constrained institutions, providing a replicable methodology for domain-focused AI while emphasizing iterative development and stakeholder engagement.
Abstract
The current trend in improving language model performance seems to rely on scaling up either the number of parameters (e.g. the state-of-the-art GPT-4 model has approximately 1.7 trillion parameters) or the amount of training data fed into the model. However, this comes at a significant cost in computational resources and energy, which compromises the sustainability of AI solutions, and it carries risks relating to privacy and misuse. In this paper we present the Erasmian Language Model (ELM), a small, context-specific, 900 million parameter model, pre-trained and fine-tuned by and for Erasmus University Rotterdam. We show that the model performs adequately in a classroom context for essay writing, and that it achieves superior performance in subjects that are part of its context. This has implications for a wide range of institutions and organizations, showing that context-specific language models may be a viable alternative for resource-constrained, privacy-sensitive use cases.
