A Strategic Coordination Framework of Small LLMs Matches Large LLMs in Data Synthesis

Xin Gao, Qizhi Pei, Zinan Tang, Yu Li, Honglin Lin, Jiang Wu, Lijun Wu, Conghui He

TL;DR

This paper introduces GRA, a peer-review-inspired framework that coordinates multiple small LLMs to perform data synthesis via three specialized roles: Generator, Reviewer, and Adjudicator. By decomposing tasks and enforcing iterative quality control with post-processing, GRA achieves data quality comparable to or surpassing single large LLM distillation while significantly reducing computational costs. Experiments across diverse domains demonstrate improved data diversity and difficulty, and ablations confirm the necessity of each component, particularly the adjudicator and multi-reviewer setup. The approach offers a scalable, sustainable alternative for high-quality data synthesis in instruction-tuning, with potential for broader domain applicability and multimodal extensions.

Abstract

While data synthesis and distillation are promising strategies to enhance small language models, current approaches rely heavily on Large Language Models (LLMs), which suffer from high computational costs, environmental inefficiency, and potential biases inherited from monolithic architectures. In contrast, smaller LLMs are more accessible and sustainable, but their individual capabilities often fall short in generating high-quality, diverse, and reliable data. Inspired by collaborative human processes (e.g., peer review), we propose GRA, a framework that aggregates specialized roles across multiple small LLMs to achieve the iterative refinement and quality control typically provided by a single large LLM. In this collaborative framework, multiple small LLMs assume distinct roles (Generator, Reviewer, and Adjudicator) to simulate a peer-review-inspired data synthesis pipeline. The Generator proposes initial data samples, the Reviewer critiques their quality and diversity, and the Adjudicator resolves conflicts to finalize the output. By decomposing the synthesis process into specialized sub-tasks, collaborative small LLMs can achieve data-level parity with large LLM-based distillation. Through experiments across multiple benchmarks, we demonstrate that GRA-produced data matches or exceeds the quality of single large LLM outputs, e.g., Qwen-2.5-72B-Instruct. Our results challenge the necessity of monolithic large models for high-quality data synthesis, advocating instead for strategic coordination of smaller agents. Our datasets, models, and code are publicly available at https://github.com/GX-XinGao/GRA.
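
The role split described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the 0-10 scoring scale, the `accept_at` and `max_gap` thresholds, and the toy stand-in "models" are all assumptions made for illustration. The sketch only shows the control flow: a Generator proposes a sample, several Reviewers score it, an Adjudicator is consulted when the Reviewers disagree, and exact duplicates are dropped as a simple post-processing step.

```python
from statistics import mean
from typing import Callable, List

# Hypothetical role signatures; in practice each would wrap a small LLM call.
GenerateFn = Callable[[str], str]
ReviewFn = Callable[[str], float]                 # assumed score in [0, 10]
AdjudicateFn = Callable[[str, List[float]], bool]


def synthesize(
    seeds: List[str],
    generate: GenerateFn,
    reviewers: List[ReviewFn],
    adjudicate: AdjudicateFn,
    accept_at: float = 7.0,   # assumed acceptance threshold on the mean score
    max_gap: float = 3.0,     # assumed score spread that counts as a conflict
) -> List[str]:
    """Run one Generator -> Reviewers -> Adjudicator pass over the seeds."""
    kept: List[str] = []
    for seed in seeds:
        sample = generate(seed)
        scores = [review(sample) for review in reviewers]
        if max(scores) - min(scores) > max_gap:
            # Reviewers disagree: defer the decision to the Adjudicator.
            accepted = adjudicate(sample, scores)
        else:
            accepted = mean(scores) >= accept_at
        # Post-processing stand-in: drop exact duplicates.
        if accepted and sample not in kept:
            kept.append(sample)
    return kept


# Toy stand-ins for the three roles (no real models involved).
def toy_generate(seed: str) -> str:
    return f"Explain {seed} step by step."


def length_reviewer(sample: str) -> float:
    return min(10.0, len(sample) / 4)


def strict_reviewer(sample: str) -> float:
    return 2.0 if "recursion" in sample else 8.0


def toy_adjudicate(sample: str, scores: List[float]) -> bool:
    return mean(scores) >= 5.0


demo = synthesize(
    ["sorting", "recursion", "sorting"],
    toy_generate,
    [length_reviewer, strict_reviewer],
    toy_adjudicate,
)
print(demo)
```

In this toy run the two reviewers agree on the "sorting" sample, disagree on the "recursion" sample (which the adjudicator then rejects), and the repeated "sorting" sample is removed as a duplicate. Passing the roles as plain callables keeps the coordination logic independent of whichever small models fill them.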

Paper Structure

This paper contains 39 sections, 8 figures, 14 tables.

Figures (8)

  • Figure 1: Average performance of GRA data, the vanilla seed dataset, and large-LLM-distilled data with the Qwen-2.5-7B base model.
  • Figure 2: Overview of GRA's architecture, highlighting its four key modules: (a) The Generator creates domain-specific samples, (b) followed by collaborative evaluation by Reviewers, (c) The Adjudicator resolves conflicts, and (d) Post-Processing refines the results by removing redundancies.
  • Figure 3: Performance across data iterations with the Qwen-2.5-7B-Base model.
  • Figure 4: Comparison across different reviewer and adjudicator settings, with Alpaca as the seed dataset and Llama-3.1-8B as the base model.
  • Figure 5: Data coverage comparison between vanilla seed dataset and GRA synthetic data.
  • ...and 3 more figures