I-SHEEP: Self-Alignment of LLM from Scratch through an Iterative Self-Enhancement Paradigm

Yiming Liang, Ge Zhang, Xingwei Qu, Tianyu Zheng, Jiawei Guo, Xinrun Du, Zhenzhu Yang, Jiaheng Liu, Chenghua Lin, Lei Ma, Wenhao Huang, Jiajun Zhang

TL;DR

I-SHEEP tackles continuous self-alignment of LLMs from scratch by combining self-driven data synthesis with metacognitive self-assessment, filtering, and supervised fine-tuning to yield iterative improvements without external data or tools. The framework demonstrates significant gains across multiple model families and benchmarks, including up to 78.2% relative improvement on Alpaca Eval and notable gains in IFEval, code generation, and SQuAD, while highlighting the importance of metacognitive prompts and data filtering. Key contributions include the introduction of an explicit self-assessment mechanism, a detailed ablation study on data size, thresholds, and prompts, and evidence of generalization to other models like Llama-3, suggesting strong potential for resource-efficient, continuous self-improvement. Limitations include reliance on RLHF to realize final gains, potential synthetic-data biases, and the need for further work to fully mitigate incorrect cognitions and safety concerns.

Abstract

Large Language Models (LLMs) have achieved significant advancements; however, the common learning paradigm treats LLMs as passive information repositories, neglecting their potential for active learning and alignment. Some approaches train LLMs using their own generated synthetic data, exploring the possibility of active alignment. However, there is still a huge gap between these one-time alignment methods and the continuous automatic alignment of humans. In this paper, we introduce I-SHEEP, an Iterative Self-EnHancEmEnt Paradigm. This human-like paradigm enables LLMs to continuously self-align from scratch with nothing. Compared to the one-time alignment method Dromedary (Sun et al., 2023), which corresponds to the first iteration in this paper, I-SHEEP can significantly enhance capabilities on both Qwen and Llama models. I-SHEEP achieves a maximum relative improvement of 78.2% on Alpaca Eval, 24.0% on MT Bench, and an absolute increase of 8.88% in IFEval accuracy over subsequent iterations on the Qwen-1.5 72B model. Additionally, I-SHEEP surpasses the base model on various standard benchmark generation tasks, achieving an average improvement of 24.77% on code generation tasks, 12.04% on TriviaQA, and 20.29% on SQuAD. We also provide new insights based on the experimental results. Our code, datasets, and models are available at https://anonymous.4open.science/r/I-SHEEP.
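
The abstract mixes two kinds of gain: relative improvements (Alpaca Eval, MT Bench) and an absolute increase (IFEval). A minimal sketch of the difference is given below. The function names and the example scores are illustrative assumptions, not values from the paper.

```python
def relative_improvement(before: float, after: float) -> float:
    """Gain as a percentage of the starting score."""
    if before == 0:
        raise ValueError("relative improvement is undefined for a zero baseline")
    return 100.0 * (after - before) / before


def absolute_improvement(before: float, after: float) -> float:
    """Gain in raw score points (percentage points if scores are percentages)."""
    return after - before


# Hypothetical scores chosen only to illustrate the arithmetic.
first_iteration_score = 50.0
later_iteration_score = 60.0

rel = relative_improvement(first_iteration_score, later_iteration_score)
abs_gain = absolute_improvement(first_iteration_score, later_iteration_score)
print(f"relative: {rel:.1f}%  absolute: {abs_gain:.1f} points")
```

With these made-up scores, the same change reads as a 20% relative gain but only a 10-point absolute gain, which is why the two kinds of figures in the abstract are not directly comparable.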

Paper Structure (34 sections, 1 equation, 4 figures, 11 tables, 1 algorithm)

Figures (4)

  • Figure 1: Pipeline of I-SHEEP. The I-SHEEP framework takes the base model and a small seed dataset as input, iteratively aligns the base model from scratch without external help, and finally produces the self-enhanced models and high-quality synthetic datasets. The framework consists of four main components: the self-synthesis component generates instruction-output pair data, the self-assessment component scores the quality of the generated data, the filtering component removes low-quality data based on the self-assessment scores, and the training component integrates the high-quality data into the base model.
  • Figure 2: Ablation performance for the first three iterations across different thresholds and data sizes. In the threshold subfigure, a threshold of -1 means that the generated data is not filtered by heuristic rules, and a threshold of 0 means that the I-SHEEP process does not use the self-assessment phase. All other thresholds mean that low-quality data is filtered out by comparing the self-assessment score against the threshold. In the data-size subfigure, the values on the horizontal axis represent the amount of data generated (in thousands).
  • Figure 4: The proportion of high-quality data to the total generated data across different iterations. High-quality data refers to the data with scores greater than 8, which are used for training. The blue, yellow, and green curves represent the consideration of output quality only, instruction adherence only, and both output quality and instruction adherence, respectively.
  • Figure 5: The generated data projected onto the first two principal components of the OpenHermes-2.5 data, obtained with principal component analysis (PCA). Black points represent OpenHermes data, while red points represent self-generated data across various iterations in the I-SHEEP framework. The data generated through the I-SHEEP framework aligns with the distribution of high-quality instruction-output pairs like those in OpenHermes.
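
The four components named in the Figure 1 caption (self-synthesis, self-assessment, filtering, training) can be sketched as a simple loop. This is a rough illustration under stated assumptions, not the authors' implementation: `i_sheep_loop`, `IterationResult`, and the `toy_*` stand-ins are hypothetical names, the model is represented by a plain string, and the default threshold of 8 is taken from the Figure 4 caption ("scores greater than 8"). The choice to sample from the latest model while retraining the base model each round follows the Figure 1 wording and is also an assumption.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Tuple

Pair = Tuple[str, str]  # (instruction, output)


@dataclass
class IterationResult:
    model: Any          # model obtained at the end of this iteration
    kept: List[Pair]    # pairs that passed the self-assessment filter
    keep_ratio: float   # share of generated pairs that were kept


def i_sheep_loop(
    base_model: Any,
    seed: List[str],
    synthesize: Callable[[Any, List[str]], List[Pair]],
    assess: Callable[[Any, Pair], float],
    train: Callable[[Any, List[Pair]], Any],
    iterations: int = 3,
    threshold: float = 8.0,
) -> List[IterationResult]:
    """Run synthesize -> assess -> filter -> train for a fixed number of rounds."""
    current = base_model
    results: List[IterationResult] = []
    for _ in range(iterations):
        generated = synthesize(current, seed)                   # self-synthesis
        scored = [(pair, assess(current, pair)) for pair in generated]  # self-assessment
        kept = [pair for pair, score in scored if score > threshold]    # filtering
        ratio = len(kept) / len(generated) if generated else 0.0
        current = train(base_model, kept)                       # training on kept data
        results.append(IterationResult(current, kept, ratio))
    return results


# Toy stand-ins so the sketch runs without any real model (all hypothetical).
def toy_synthesize(model: Any, seed: List[str]) -> List[Pair]:
    return [(s, s.upper()) for s in seed]


def toy_assess(model: Any, pair: Pair) -> float:
    return 9.0 if len(pair[1]) > 3 else 5.0


def toy_train(base_model: Any, data: List[Pair]) -> Any:
    return f"{base_model}+sft({len(data)})"


results = i_sheep_loop("base", ["sum", "translate", "sort"],
                       toy_synthesize, toy_assess, toy_train, iterations=2)
for step, result in enumerate(results, start=1):
    print(step, result.model, f"{result.keep_ratio:.2f}")
```

The `keep_ratio` field mirrors the quantity plotted in Figure 4 (the proportion of high-quality data among all generated data), so the same loop can be used to track how that proportion changes across iterations.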