
ABC Align: Large Language Model Alignment for Safety & Accuracy

Gareth Seneque, Lap-Hang Ho, Ariel Kuperman, Nafise Erfanian Saeedi, Jeffrey Molendijk

TL;DR

ABC Align presents an organisation-aware LLM alignment framework with two settings. It combines synthetic data, knowledge distillation, and preference optimisation (PO) to align open-source models through supervised fine-tuning (SFT) and PO, and frontier models through ICA. By grounding its data in news content and the ABC AI Principles, the approach yields substantial improvements on standard benchmarks such as TruthfulQA and BBQ while maintaining reasoning capability. The methodology emphasises efficiency (QLoRA quantisation, small but high-quality datasets) and provider independence, offering a blueprint for large organisations to align LLM outputs with internal policies without full pre-training. The work also highlights the potential of ICA and RAG-grounded data to influence alignment, while acknowledging current limitations and outlining future directions: custom evaluations, multimodality, and broader deployment.
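Frontier models are aligned here without weight updates, and Figure 4's caption describes this as using augmented prompts drawn from an internal RAG tool and the ABC AI Principles. The sketch below illustrates one plausible shape of such prompt augmentation, assuming alignment is achieved by prepending a numbered list of principles and retrieved passages to the user question. The `Passage` type, the `build_aligned_prompt` function, and the principle strings in the usage note are hypothetical; they do not reproduce the authors' prompts or the actual ABC AI Principles.

```python
from dataclasses import dataclass


@dataclass
class Passage:
    """A retrieved passage (hypothetical type standing in for RAG tool output)."""
    source: str
    text: str


def build_aligned_prompt(question: str, passages: list[Passage], principles: list[str]) -> str:
    """Assemble an in-context alignment prompt (hypothetical layout).

    Principles come first as a numbered list, then retrieved context
    tagged with its source, then the user question.
    """
    principle_block = "\n".join(f"{i}. {p}" for i, p in enumerate(principles, start=1))
    context_block = "\n\n".join(f"[{p.source}]\n{p.text}" for p in passages)
    return (
        "Follow these principles when answering:\n"
        f"{principle_block}\n\n"
        "Use the following retrieved context where relevant:\n"
        f"{context_block}\n\n"
        f"Question: {question}\nAnswer:"
    )
```

For example, `build_aligned_prompt("Who won?", [Passage("article-1", "...")], ["Placeholder principle A", "Placeholder principle B"])` returns a single string that can be sent to any chat or completion API, which keeps the approach provider-independent: only the prompt changes, not the model.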

Abstract

Alignment of Large Language Models (LLMs) remains an unsolved problem. Human preferences are highly distributed and can be captured at multiple levels of abstraction, from the individual to diverse populations. Organisational preferences, represented by standards and principles, are defined to mitigate reputational risk or meet legislative obligations. In this paper, we present ABC Align, a novel alignment methodology for LLMs that enables integration of the standards and preferences of a large media organisation into the LLM itself. We combine a set of data and methods that build on recent breakthroughs in synthetic data generation, preference optimisation, and post-training model quantisation. Our unified approach mitigates bias and improves accuracy, while preserving reasoning capability, as measured against standard benchmarks.
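The preference-optimisation step named in this paper's figure captions is ORPO (odds-ratio preference optimisation), which adds an odds-ratio penalty to the ordinary SFT loss so that no separate reference model is needed. The sketch below computes the standard ORPO objective for a single preference pair from per-token log-probabilities, assuming length-normalised likelihoods; the helper names and the default weight `lam=0.1` are illustrative assumptions, not the authors' training code or hyperparameters.

```python
import math


def mean_logprob(token_logprobs: list[float]) -> float:
    """Length-normalised log-likelihood of a response."""
    return sum(token_logprobs) / len(token_logprobs)


def log_odds(avg_logprob: float) -> float:
    """log(P / (1 - P)) with P = exp(avg_logprob); requires avg_logprob < 0."""
    return avg_logprob - math.log1p(-math.exp(avg_logprob))


def log_sigmoid(z: float) -> float:
    """Numerically stable log(sigmoid(z))."""
    return -math.log1p(math.exp(-z)) if z >= 0 else z - math.log1p(math.exp(z))


def orpo_loss(chosen: list[float], rejected: list[float], lam: float = 0.1) -> float:
    """ORPO loss = SFT negative log-likelihood on the chosen response
    + lam * odds-ratio penalty favouring chosen over rejected."""
    lp_w = mean_logprob(chosen)
    lp_l = mean_logprob(rejected)
    sft_term = -lp_w
    or_term = -log_sigmoid(log_odds(lp_w) - log_odds(lp_l))
    return sft_term + lam * or_term
```

With `lam=0.0` the loss reduces to plain SFT on the chosen response, so `orpo_loss([-0.1, -0.2], [-1.0, -2.0], lam=0.0)` equals the mean negative log-likelihood 0.15; increasing `lam` trades imitation against widening the gap between preferred and dispreferred responses.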

Paper Structure

This paper contains 51 sections, 4 figures, 4 tables.

Figures (4)

  • Figure 1: Comparison of ABC Align SFT dataset against other methods, across arc-challenge (left), bbq-lite-json (middle) and truthfulqa_mc2 (right).
  • Figure 2: Comparison of ABC Align ORPO dataset against control datasets, across arc-challenge (left), bbq-lite-json (middle) and truthfulqa_mc2 (right).
  • Figure 3: Comparison between IFT and ABC Align models fine-tuned using SFT or ORPO, across arc-challenge (left), bbq-lite-json (middle) and truthfulqa_mc2 (right).
  • Figure 4: Evaluation of ICL Alignment using augmented prompts drawn from our internal RAG tool and the ABC AI Principles, measured on bbq-lite and on BLEU and ROUGE scores from truthfulqa_mc2.