Empowering Small VLMs to Think with Dynamic Memorization and Exploration

Jiazhen Liu; Yuchuan Deng; Long Chen

TL;DR

We propose DyME, a novel training paradigm that Dynamically selects between Memorization (via SFT) and Exploration (via RLVR) at each optimization step, providing a robust, standalone strategy that stabilizes SVLM learning.

Abstract

Small-scale Vision-Language Models (SVLMs) are exceptionally well-suited for proprietary tasks. Equipping them with thinking capabilities is a critical step to enhance their performance and reliability in these specific domains. However, existing training paradigms, including Supervised Fine-Tuning (SFT) and Reinforcement Learning with Verifiable Reward (RLVR), impose substantial demands on the base VLM, exceeding the capacity of SVLMs. Consequently, directly applying these paradigms to SVLMs fails to instill the desired thinking abilities. A natural solution is to combine SFT and RLVR, leveraging their complementarity to reduce the dependence on model capacity. Yet the core challenge lies in managing the inherent trade-off: excessive reliance on SFT can force the model to memorize pseudo thinking traces, while over-emphasizing RLVR can lead to unstable exploration (i.e., advantage collapse). To address this, we propose DyME, a novel training paradigm that Dynamically selects between Memorization (via SFT) and Exploration (via RLVR) at each optimization step. By ensuring that every update contributes to the trade-off, DyME serves as a robust, standalone strategy that stabilizes SVLM learning. Complementing this paradigm, we further introduce a synergistic Visual Supervision mechanism (comprising a visual checker and refiner) designed to inject dynamically enhanced, image-grounded guidance during optimization. Extensive experiments across diverse domains demonstrate that DyME consistently achieves this balance, and thus delivers substantial performance improvements on specialized tasks. These results establish DyME as a practical and effective solution for empowering SVLMs with reliable thinking capabilities. GitHub: https://github.com/HKUST-LongGroup/DyME
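To make the per-step choice between Memorization and Exploration concrete, the following is a minimal sketch and not the authors' implementation. The abstract does not state the selection rule, so the rule used here is an assumption: fall back to an SFT update when every rollout in a group receives the same reward (so group-relative advantages vanish, one reading of "advantage collapse"), and take an RLVR update otherwise. The function names `group_advantages` and `select_mode`, and the group-relative advantage formula, are hypothetical illustrations.

```python
from statistics import mean, pstdev


def group_advantages(rewards, eps=1e-6):
    """Group-relative advantages, (r - mean) / (std + eps).

    Assumed formulation for illustration; the abstract does not specify it.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def select_mode(rewards, tol=1e-8):
    """Pick the update type for one optimization step (hypothetical rule).

    If all rollouts earned the same reward, the advantages are all zero and
    an RLVR update carries no signal, so use memorization (SFT) instead.
    """
    if pstdev(rewards) <= tol:
        return "memorization"
    return "exploration"


# Example: verifiable rewards for four rollouts of the same prompt.
collapsed = [0.0, 0.0, 0.0, 0.0]   # no rollout was rewarded
informative = [1.0, 0.0, 0.0, 1.0]  # mixed outcomes

mode_collapsed = select_mode(collapsed)
mode_informative = select_mode(informative)
```

Under this assumed rule, every step yields a usable gradient: either a supervised one from a reference thinking trace or a policy-gradient one from rollouts with non-zero advantages.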
