ABC Align: Large Language Model Alignment for Safety & Accuracy

Gareth Seneque; Lap-Hang Ho; Ariel Kuperman; Nafise Erfanian Saeedi; Jeffrey Molendijk

TL;DR

ABC Align is an organisation-aware LLM alignment framework that operates in two settings: open-source models are aligned through supervised fine-tuning (SFT) and preference optimisation (PO), while frontier models are aligned through in-context alignment (ICA). It combines synthetic data generation, knowledge distillation, and preference optimisation, with data grounded in news content and the ABC AI Principles. The approach demonstrates substantial improvements on standard benchmarks such as TruthfulQA and BBQ while maintaining reasoning capability. The methodology emphasises efficiency (QLoRA quantisation, small but high-quality datasets) and provider independence, offering a blueprint for large organisations to align LLM outputs with internal policies without full pre-training. The work also highlights the potential of ICA and of data grounded through retrieval-augmented generation (RAG) to influence alignment, while acknowledging current limitations and outlining future directions in custom evaluations, multimodality, and broader deployment.

Abstract

Alignment of Large Language Models (LLMs) remains an unsolved problem. Human preferences are highly distributed and can be captured at multiple levels of abstraction, from the individual to diverse populations. Organisational preferences, represented by standards and principles, are defined to mitigate reputational risk or meet legislative obligations. In this paper, we present ABC Align, a novel alignment methodology for LLMs that enables integration of the standards and preferences of a large media organisation into the LLM itself. We combine a set of data and methods that build on recent breakthroughs in synthetic data generation, preference optimisation, and post-training model quantisation. Our unified approach mitigates bias and improves accuracy, while preserving reasoning capability, as measured against standard benchmarks.

Paper Structure (51 sections, 4 figures, 4 tables)

Introduction
Overview
Background
ABC Align
Related Work
Domain-Specific Large Language Models
LLM Alignment
Synthetic Data and Knowledge Distillation: ORCA
Preference Optimisation: ORPO
In-Context Alignment
Model Quantisation
Retrieval Augmented Generation & Grounding Synthetic Data
Information Theoretic Measures in NLP
Methodology
Synthetic Data Generation
...and 36 more sections

Figures (4)

Figure 1: Comparison of ABC Align SFT dataset against other methods, across arc-challenge (left), bbq-lite-json (middle) and truthfulqa_mc2 (right).
Figure 2: Comparison of ABC Align ORPO dataset against control datasets, across arc-challenge (left), bbq-lite-json (middle) and truthfulqa_mc2 (right).
Figure 3: Comparison between IFT and ABC Align models fine-tuned using SFT or ORPO, across arc-challenge (left), bbq-lite-json (middle) and truthfulqa_mc2 (right).
Figure 4: Evaluation of ICL Alignment using prompts augmented with material drawn from our internal RAG tool and the ABC AI Principles. Evaluation covers bbq-lite, together with BLEU and ROUGE scores drawn from truthfulqa_mc2.
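
The in-context alignment setting referred to in the Figure 4 caption can be pictured as prompt augmentation: principle text and retrieved passages are placed ahead of the user question before it is sent to a frontier model. The sketch below is only an illustration under stated assumptions, not the authors' implementation. The function name build_aligned_prompt, the prompt wording, and the example principles and passage are all hypothetical placeholders; they are not the ABC AI Principles and not output from the internal RAG tool.

```python
from typing import Sequence


def build_aligned_prompt(
    question: str,
    principles: Sequence[str],
    retrieved_passages: Sequence[str],
) -> str:
    """Assemble one prompt: principles first, then grounding passages, then the question.

    Hypothetical helper for illustration; the layout and wording are assumptions.
    """
    sections = []
    if principles:
        bullet_list = "\n".join(f"- {p}" for p in principles)
        sections.append("Follow these principles when answering:\n" + bullet_list)
    if retrieved_passages:
        numbered = "\n".join(
            f"[{i}] {text}" for i, text in enumerate(retrieved_passages, start=1)
        )
        sections.append("Use the following context where relevant:\n" + numbered)
    sections.append("Question: " + question)
    # Blank lines separate the sections so each part stays visually distinct.
    return "\n\n".join(sections)


# Placeholder inputs, invented for illustration only.
example_prompt = build_aligned_prompt(
    question="What is the capital of Australia?",
    principles=["Be accurate.", "Avoid biased assumptions."],
    retrieved_passages=["Canberra is the capital city of Australia."],
)
print(example_prompt)
```

Keeping the augmentation in a plain string-building step means the same prompt can be sent to any provider, which fits the provider independence the TL;DR emphasises. How the principles and passages are actually selected and phrased is not specified here.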
