Iterative Graph Alignment

Fangyuan Yu, Hardeep Singh Arora, Matt Johnson

TL;DR

This work tackles representation gaps that impede reliable rule-based alignment in large language models. It introduces Iterative Graph Alignment (IGA), an annotation-free framework that combines Iterative Graph Prompting (IGP) for graph-based reasoning with Self-Aligned Incremental Learning (SAIL) for adaptive, diverse data augmentation and iterative fine-tuning. Using RuleAlign, a 1.5K-query dataset spanning five rule-based tasks, the authors report substantial gains: IGP yields up to 73.12% relative improvement in rule-based alignment on Claude Sonnet 3.5, and IGA fine-tuning of Llama3-8B-Instruct achieves up to 86.20% improvement, matching or exceeding proprietary baselines. The approach reduces reliance on human annotation, patches representation gaps, and offers a scalable path toward more robust, rule-consistent LLMs through a multi-agent curriculum learning paradigm.

Abstract

By compressing diverse narratives, LLMs go beyond memorization, achieving intelligence by capturing generalizable causal relationships. However, they suffer from local 'representation gaps' due to insufficient training data diversity, limiting their real-world utility, especially in tasks requiring strict alignment to rules. Traditional alignment methods relying on heavy human annotations are inefficient and unscalable. Recent self-alignment techniques also fall short, as they often depend on self-selection based prompting and memorization-based learning. To address these issues, we introduce Iterative Graph Alignment (IGA), an annotation-free rule-based alignment algorithm. A teacher model (VLM) employs Iterative Graph Prompting (IGP) to create logical graphs and reference answers. The student model (LLM) identifies local knowledge gaps by attempting to align its responses with these references, collaborating with helper models to generate diverse answers. These aligned responses are then used for iterative supervised fine-tuning (SFT). Our evaluations across five rule-based scenarios demonstrate IGP's effectiveness, with a 73.12% alignment improvement in Claude Sonnet 3.5, and Llama3-8B-Instruct achieving an 86.20% improvement, outperforming Claude Sonnet 3.5 in rule-based alignment.

Paper Structure

This paper contains 20 sections, 3 equations, 4 figures, 2 tables, 4 algorithms.

Figures (4)

  • Figure 1: Customer Roleplay Issue
  • Figure 2: Iterative Graph Alignment (IGA). A teacher model (VLM) iteratively generates logical graphs and reference answers using Iterative Graph Prompting (IGP). A student model (LLM) reviews its responses against these reference answers to identify hard cases where representation gaps exist. The student then collaborates with helper models to explore diverse ways to respond to these challenging queries, taking hints from the logical graphs and reference answers, before fine-tuning on the collected insights and proceeding to the next iteration.
  • Figure 3: Iterative Graph Prompting (IGP)
  • Figure 4: Self-Aligned Incremental Learning (SAIL)
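
The loop described in the abstract and in the Figure 2 caption can be illustrated with a minimal Python sketch. It is not the authors' implementation: the teacher, helper, and student are toy stand-ins (a fixed "refuse" rule, a canned list of candidate answers, and a lookup table updated in place of real SFT), and every name below (`Reference`, `ToyStudent`, `toy_teacher`, `toy_helper`, `is_aligned`, `iga_round`, `run_iga`) is hypothetical. Only the control flow follows the description: obtain a graph and reference answer, detect hard cases, collect aligned candidate answers, fine-tune, and repeat.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class Reference:
    """Teacher output for one query (assumed shape, not from the paper)."""
    query: str
    graph: str   # logical graph, reduced here to a text placeholder
    answer: str  # reference answer


class ToyStudent:
    """Stand-in for the student LLM: a lookup table updated by 'fine-tuning'."""

    def __init__(self) -> None:
        self.memory: Dict[str, str] = {}

    def respond(self, query: str) -> str:
        return self.memory.get(query, "unknown")

    def fine_tune(self, pairs: List[Tuple[str, str]]) -> None:
        # Placeholder for SFT: memorize the last aligned answer per query.
        for query, answer in pairs:
            self.memory[query] = answer


def toy_teacher(query: str) -> Reference:
    # Hypothetical rule: every request must be refused.
    return Reference(query, graph=f"rule -> refuse({query})", answer="refuse")


def toy_helper(query: str, ref: Reference) -> List[str]:
    # Proposes varied candidates; the last one violates the rule on purpose.
    return [ref.answer, f"{ref.answer} politely", "comply"]


def is_aligned(response: str, ref: Reference) -> bool:
    # Crude alignment check, assumed for illustration only.
    return response.startswith(ref.answer)


def iga_round(
    queries: List[str],
    student: ToyStudent,
    teacher: Callable[[str], Reference],
    helper: Callable[[str, Reference], List[str]],
    check: Callable[[str, Reference], bool],
) -> List[str]:
    """Run one iteration and return the queries that were hard cases."""
    hard_cases: List[str] = []
    sft_pairs: List[Tuple[str, str]] = []
    for query in queries:
        ref = teacher(query)
        if check(student.respond(query), ref):
            continue  # student already aligned: no gap detected here
        hard_cases.append(query)
        # Keep only helper candidates that agree with the reference.
        for candidate in helper(query, ref):
            if check(candidate, ref):
                sft_pairs.append((query, candidate))
    student.fine_tune(sft_pairs)
    return hard_cases


def run_iga(
    queries: List[str],
    student: ToyStudent,
    teacher: Callable[[str], Reference],
    helper: Callable[[str, Reference], List[str]],
    check: Callable[[str, Reference], bool],
    max_rounds: int = 3,
) -> List[int]:
    """Iterate until no hard cases remain; return hard-case counts per round."""
    history: List[int] = []
    for _ in range(max_rounds):
        hard = iga_round(queries, student, teacher, helper, check)
        history.append(len(hard))
        if not hard:
            break
    return history
```

Calling `run_iga(["q1", "q2"], ToyStudent(), toy_teacher, toy_helper, is_aligned)` returns the number of hard cases found in each round. In a real setting the alignment check, the helper prompts, and the fine-tuning step would each be model calls rather than string operations.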