ALMA: Alignment with Minimal Annotation

Michihiro Yasunaga, Leonid Shamis, Chunting Zhou, Andrew Cohen, Jason Weston, Luke Zettlemoyer, Marjan Ghazvininejad

TL;DR

ALMA demonstrates that high-quality alignment of a base LLM can be achieved with dramatically less human annotation by leveraging self-bootstrapped data-generation techniques. By diversifying prompts, sampling responses from multiple model checkpoints, and enhancing the judge with real-valued scoring and self-distillation, ALMA sustains improvement over 10 rounds starting from only 9k labeled examples. The approach comes close to Llama3-Instruct on standard alignment benchmarks while using under 1% of the conventional annotation budget, suggesting that base models encode substantial latent alignment knowledge that synthetic data methods can expose. The work advocates a data-synthesis-centric path toward scalable, cost-effective LLM alignment and motivates future exploration of seed-data typologies and safety considerations.

Abstract

Recent approaches to large language model (LLM) alignment typically require millions of human annotations or rely on external aligned models for synthetic data generation. This paper introduces ALMA: Alignment with Minimal Annotation, demonstrating that effective alignment can be achieved using only 9,000 labeled examples (less than 1% of what conventional approaches use). ALMA generates large amounts of high-quality synthetic alignment data through new techniques: diverse prompt synthesis via few-shot learning, diverse response generation with multiple model checkpoints, and judge (reward model) enhancement through score aggregation and self-distillation. Using only a pretrained Llama3 base model, 5,000 SFT examples, and 4,000 judge annotations, ALMA achieves performance close to Llama3-Instruct across diverse alignment benchmarks (e.g., a 0.1% difference in AlpacaEval 2.0 score). This is achieved with a multi-round, self-bootstrapped data synthesis and training recipe that continues to improve for 10 rounds, surpassing the typical 3-round ceiling of previous methods. These results suggest that base models already possess sufficient knowledge for effective alignment, and that synthetic data generation methods can expose it.
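The abstract names "score aggregation" as one way the judge is enhanced but does not spell out the procedure. The sketch below assumes one plausible reading, which is not confirmed by this summary: a stochastic LLM judge is queried several times for a discrete score, and the samples are averaged into a real-valued score that breaks ties and dampens judge noise. Those scores then pick a (chosen, rejected) pair for preference training. The names `aggregated_score`, `preference_pair`, and the `JudgeFn` interface are hypothetical, introduced only for illustration.

```python
from typing import Callable, Sequence, Tuple

# Hypothetical judge interface (an assumption, not ALMA's API): one stochastic
# call returns a discrete quality score, e.g. an integer from 1 to 10.
JudgeFn = Callable[[str, str], int]


def aggregated_score(judge: JudgeFn, prompt: str, response: str, k: int = 8) -> float:
    """Query the judge k times and average the discrete scores into a real value.

    ASSUMPTION: plain averaging of repeated samples; ALMA's actual aggregation
    may differ.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    return sum(judge(prompt, response) for _ in range(k)) / k


def preference_pair(
    judge: JudgeFn, prompt: str, responses: Sequence[str], k: int = 8
) -> Tuple[str, str]:
    """Return (chosen, rejected): the highest- and lowest-scoring responses."""
    if len(responses) < 2:
        raise ValueError("need at least two responses to form a pair")
    # sorted() calls the key once per response, so each response is judged k times.
    ranked = sorted(responses, key=lambda r: aggregated_score(judge, prompt, r, k))
    return ranked[-1], ranked[0]


if __name__ == "__main__":
    # Toy judge that returns fixed scores in turn, standing in for a sampled LLM judge.
    noisy_scores = iter([7, 8, 8, 9])
    print(aggregated_score(lambda p, r: next(noisy_scores), "prompt", "response", k=4))  # 8.0
```

With a single sample, two responses that both receive an 8 are indistinguishable; averaging several samples yields values such as 8.0 versus 7.75 that can still be ranked.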

Paper Structure

This paper contains 33 sections, 3 figures, 7 tables.

Figures (3)

  • Figure 1: Overview of our approach ALMA: Alignment with Minimal Annotation. Starting with only a pretrained base LLM (Llama3 Base) and minimal seed data (9k samples, less than 1% of conventional approaches), we align the model to achieve performance close to Llama3 Instruct (left panel). This is achieved through our new alignment techniques (right panel), which enhance each of the four key components of alignment: prompt synthesis, response synthesis, the judge, and model training, each detailed in its own method subsection.
  • Figure 2: Performance progression of our alignment training, from round 0 (base model) through round 1 (initial SFT) to round 11 (our final model). The evaluation metric is the Armo score across alignment benchmarks: LIMA, MT-Bench, SR, Arena Hard, and Alpaca. Starting from Llama3-Base (green line) and using fewer than 10k labeled seed examples, our model (blue line) steadily approaches the performance of Llama3-Instruct (red line).
  • Figure 3: Effect of response sampling size N on the quality of the best-of-N response. We evaluate the extrinsic reward score (Armo score) of the best response selected by our judge from N sampled responses and analyze how performance scales with N. While conventional alignment methods typically sample 10 to 20 responses, we find that the quality of the best-of-N response continues to improve beyond N=10 and is still improving at N=200. Based on this result, we set N=200 for response synthesis in our alignment recipe (see the method subsection on response synthesis).
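The best-of-N procedure in Figure 3 can be illustrated with a toy simulation. This is a minimal sketch under stated assumptions, not ALMA's code: `best_of_n` returns whichever candidate the judge scores highest, and the hypothetical helper `simulate_best_of_n` models each sampled response's true quality as a standard Gaussian and the judge's score as that quality plus Gaussian noise (`judge_noise`). It only shows why a reasonably accurate judge keeps finding better responses as N grows; it does not reproduce the paper's Armo-score measurements.

```python
import random
from statistics import mean
from typing import Callable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def best_of_n(candidates: Sequence[T], judge: Callable[[T], float]) -> T:
    """Return the candidate the judge scores highest (best-of-N selection)."""
    if not candidates:
        raise ValueError("best_of_n needs at least one candidate")
    return max(candidates, key=judge)


def simulate_best_of_n(
    n: int, trials: int = 500, judge_noise: float = 0.5, seed: int = 0
) -> float:
    """Toy model (ASSUMPTION, not the paper's data): each sampled response has a
    hidden true quality drawn from N(0, 1), and the judge observes that quality
    plus N(0, judge_noise^2) noise. Returns the mean true quality of the
    judge-selected response over many trials."""
    rng = random.Random(seed)
    picked: List[float] = []
    for _ in range(trials):
        # Each candidate is a (true_quality, judged_score) pair.
        candidates: List[Tuple[float, float]] = []
        for _ in range(n):
            quality = rng.gauss(0.0, 1.0)
            candidates.append((quality, quality + rng.gauss(0.0, judge_noise)))
        # The judge only sees the noisy score; we record the true quality it picked.
        chosen_quality, _judged = best_of_n(candidates, judge=lambda c: c[1])
        picked.append(chosen_quality)
    return mean(picked)


if __name__ == "__main__":
    for n in (1, 10, 50, 200):
        print(f"N={n:>3}: mean true quality of pick = {simulate_best_of_n(n):.2f}")
```

In this toy model the printed means rise with N, with shrinking increments, since the expected maximum of N Gaussian draws grows roughly like the square root of log N. Raising `judge_noise` weakens the gain because the judge's top choice correlates less with true quality; whether gains persist to N=200 in practice therefore depends on judge accuracy, which is what Figure 3 measures empirically.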