Counteracting Matthew Effect in Self-Improvement of LVLMs through Head-Tail Re-balancing

Xin Guo, Zhiheng Xi, Yiwen Ding, Yitao Zhai, Xiaowei Shi, Xunliang Cai, Tao Gui, Qi Zhang, Xuanjing Huang

TL;DR

The paper addresses a bottleneck in self-improvement for large vision-language models (LVLMs): a Matthew effect in which simple head data dominate the successful trajectories while complex tail data remain underexplored. It proposes four strategies (threshold clipping, repeat-based padding, adaptive-weighted resampling, and guided resampling), grouped under two perspectives, distribution reshaping and trajectory resampling, to re-balance data during exploration and learning. Experiments on Qwen2-VL-7B-Instruct and InternVL2.5-4B show that head-tail re-balancing consistently improves visual reasoning and mitigates the performance bottleneck that emerges over iterations, outperforming vanilla self-improvement by 3.86 points on average and often achieving larger tail improvements with greater efficiency. The work suggests that reframing self-improvement as an efficient sampling problem and enriching tail trajectories can enhance robustness and scalability in LVLMs, with practical implications for deploying visual reasoning systems in more challenging domains.

Abstract

Self-improvement has emerged as a mainstream paradigm for advancing the reasoning capabilities of large vision-language models (LVLMs), where models explore and learn from successful trajectories iteratively. However, we identify a critical issue during this process: the model excels at generating high-quality trajectories for simple queries (i.e., head data) but struggles with more complex ones (i.e., tail data). This leads to an imbalanced optimization that drives the model to prioritize simple reasoning skills, while hindering its ability to tackle more complex reasoning tasks. Over iterations, this imbalance becomes increasingly pronounced--a dynamic we term the "Matthew effect"--which ultimately hinders further model improvement and leads to performance bottlenecks. To counteract this challenge, we introduce four efficient strategies from two perspectives: distribution-reshaping and trajectory-resampling, to achieve head-tail re-balancing during the exploration-and-learning self-improvement process. Extensive experiments on Qwen2-VL-7B-Instruct and InternVL2.5-4B models across visual reasoning tasks demonstrate that our methods consistently improve visual reasoning capabilities, outperforming vanilla self-improvement by 3.86 points on average.
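As a rough illustration of the distribution-reshaping perspective, the sketch below shows one way threshold clipping and repeat-based padding could operate on the pool of successful trajectories. The strategy names come from the paper, but this page does not define them, so the function names, parameters (`max_per_query`, `target_per_query`), and the exact clipping and padding rules are assumptions made for illustration only.

```python
import random

def threshold_clip(trajs_by_query, max_per_query, rng):
    """Assumed rule: keep at most max_per_query successful trajectories per query,
    so easy (head) queries cannot dominate the training pool."""
    clipped = {}
    for query, trajs in trajs_by_query.items():
        if len(trajs) > max_per_query:
            clipped[query] = rng.sample(trajs, max_per_query)
        else:
            clipped[query] = list(trajs)
    return clipped

def repeat_pad(trajs_by_query, target_per_query):
    """Assumed rule: cyclically repeat a query's existing successes until it has
    target_per_query entries, so hard (tail) queries carry more weight.
    Queries with no successful trajectory cannot be padded and stay empty."""
    padded = {}
    for query, trajs in trajs_by_query.items():
        out = list(trajs)
        i = 0
        while trajs and len(out) < target_per_query:
            out.append(trajs[i % len(trajs)])
            i += 1
        padded[query] = out
    return padded

# Hypothetical pool: an easy query solved 4 times, a hard one solved once,
# and one query with no successful trajectory.
successes = {"easy": ["a1", "a2", "a3", "a4"], "hard": ["h1"], "unsolved": []}
rng = random.Random(0)
clipped = threshold_clip(successes, max_per_query=2, rng=rng)
padded = repeat_pad(clipped, target_per_query=2)
```

After both steps, the easy and hard queries each contribute two trajectories, while the unsolved query still contributes none. Filling that last gap requires drawing new samples, which is presumably the role of the trajectory-resampling strategies.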

Paper Structure

This paper contains 46 sections, 13 equations, 17 figures, 6 tables.

Figures (17)

  • Figure 1: Matthew effect in self-improvement of LVLMs over iterations and our re-balanced solution. Dark areas illustrate the imbalanced distribution in vanilla self-improvement, where the dominant head and narrow tail become more severe over iterations. Light areas depict re-balanced self-improvement: our methods counteract the Matthew effect by reducing the head and augmenting the tail.
  • Figure 2: Performance bottlenecks and distribution characteristics in self-improvement. (a) Phenomenon of performance bottlenecks under different sampling numbers $K$. (b) Comparison of difficulty level distributions between original and self-generated data, ranging from level 1 (easiest) to level 5 (most difficult). (c) Differences in response length distributions between original and self-generated data, with dashed lines indicating mean values.
  • Figure 3: Matthew effect over iterations. (a) Matthew effect in the distribution of data in $\mathcal{D}_{\textnormal{filter}}$ with different accuracy under the setting of $K=4$. (b) Trends in average response length (i.e., number of tokens) across iterations under different sampling numbers $K$. (c) Matthew effect in average response length for data of different difficulty levels across iterations under the setting of $K=8$, where difficulty increases progressively from level 1 to level 5.
  • Figure 4: Data distribution of difficulty levels (1=easiest, 5=most difficult) in successful trajectories under different strategies with Qwen2-VL-7B-Instruct at $K=16$.
  • Figure 5: Average performance comparison between vanilla and self-correction in visual self-improvement.
  • ...and 12 more figures