DIVE: Diversified Iterative Self-Improvement

Yiwei Qin; Yixiu Liu; Pengfei Liu

TL;DR

DIVE (Diversified Iterative Self-Improvement) is a framework that counters the loss of output diversity caused by iterative self-improvement on self-generated data. It has two key components: Sample Pool Expansion for broader solution exploration, and Data Selection for balancing diversity and quality in preference pairs.

Abstract

Recent advances in large language models (LLMs) have demonstrated the effectiveness of Iterative Self-Improvement (ISI) techniques. However, continuous training on self-generated data leads to reduced output diversity, a limitation particularly critical in reasoning tasks where diverse solution paths are essential. We present DIVE (Diversified Iterative Self-Improvement), a novel framework that addresses this challenge through two key components: Sample Pool Expansion for broader solution exploration, and Data Selection for balancing diversity and quality in preference pairs. Experiments on MATH and GSM8k datasets show that DIVE achieves a 10% to 45% relative increase in output diversity metrics while maintaining performance quality compared to vanilla ISI. Our ablation studies confirm both components' significance in achieving these improvements. Code is available at https://github.com/qinyiwei/DIVE.
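The Data Selection component is described above only as balancing diversity and quality. As a rough illustration of what a greedy selection of that kind can look like, the sketch below uses a maximal-marginal-relevance-style rule. This rule, the `alpha` trade-off weight, the token-set Jaccard similarity, and the function names are all assumptions made for illustration; they are not taken from the paper or its repository.

```python
from typing import Callable, List, Sequence


def jaccard_similarity(a: str, b: str) -> float:
    """Token-set Jaccard similarity in [0, 1] (a stand-in for any similarity measure)."""
    ta, tb = set(a.split()), set(b.split())
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)


def greedy_select(
    responses: Sequence[str],
    quality: Sequence[float],
    k: int,
    alpha: float = 0.5,
    similarity: Callable[[str, str], float] = jaccard_similarity,
) -> List[int]:
    """Return indices of up to k responses, picked one at a time.

    Hypothetical scoring rule: each step picks the remaining response that
    maximizes alpha * quality - (1 - alpha) * (max similarity to picked ones).
    """
    selected: List[int] = []
    remaining = list(range(len(responses)))
    while remaining and len(selected) < k:

        def score(i: int) -> float:
            # Redundancy is 0 while nothing has been selected yet.
            redundancy = max(
                (similarity(responses[i], responses[j]) for j in selected),
                default=0.0,
            )
            return alpha * quality[i] - (1.0 - alpha) * redundancy

        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected
```

With `alpha=1.0` the rule reduces to picking the highest-quality responses; lower values penalize near-duplicates of responses already chosen.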

Paper Structure (38 sections, 3 equations, 6 figures, 2 tables)

Introduction
Methodology
Iterative Self-Improvement
Direct Preference Optimization (DPO) (Rafailov et al., 2024)
Iterative Training
Diversified Iterative Self-Improvement
Sample Pool Expansion
Increased Sampling per Question
Global Data Usage
Data Selection
Greedy Selection Method
Balancing Quality and Diversity
Experiment
Experimental Settings
Datasets
...and 23 more sections

Figures (6)

Figure 1: Overview of the Diversified Iterative Self-Improvement (DIVE) framework. At each iteration $t$, the process includes response generation, pool expansion through correct and incorrect response collection, data selection for balancing quality and diversity, and model refinement through preference learning, producing an improved model $M^{t+1}$ for the next iteration.
Figure 2: Evolution of diversity metrics and model performance across iterations (M0-M6) for both GSM8k and MATH datasets. Each subplot shows a different evaluation metric: Distinct-N for positive and negative examples, SentBERT embedding similarity, and accuracy measures. Solid and dashed lines in different colors represent different sampling settings and methods.
Figure 3: Comparison of different sampling strategies for GSM8k dataset.
Figure 4: Diversity trends across different difficulty levels (Level 1-5) for positive and negative examples. The plots demonstrate how question difficulty influences output diversity during the ISI process.
Figure 5: Results of different diversity metrics for both the GSM8k and MATH datasets. Only the results from the iteration with the highest accuracy are shown, while the results for all iterations are provided in the appendix.
...and 1 more figure
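The Distinct-N metric named in the Figure 2 caption is commonly defined as the number of unique n-grams divided by the total number of n-grams over a set of texts. The sketch below follows that common definition; whitespace tokenization and pooling n-grams across all responses are assumptions, and the paper's exact computation may differ.

```python
from typing import Iterable, List, Tuple


def distinct_n(texts: Iterable[str], n: int) -> float:
    """Ratio of unique n-grams to total n-grams, pooled over all texts."""
    ngrams: List[Tuple[str, ...]] = []
    for text in texts:
        tokens = text.split()  # assumption: simple whitespace tokenization
        # Texts shorter than n tokens contribute no n-grams.
        ngrams.extend(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    if not ngrams:
        return 0.0
    return len(set(ngrams)) / len(ngrams)
```

A value near 1 means almost every n-gram is unique across the sampled responses, while a value near 0 indicates heavy repetition.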
