Boosting Private Domain Understanding of Efficient MLLMs: A Tuning-free, Adaptive, Universal Prompt Optimization Framework
Jiang Liu, Bolin Li, Haoyuan Li, Tianwei Lin, Wenqiao Zhang, Tao Zhong, Zhelun Yu, Jinghao Wei, Hao Cheng, Wanggui He, Fangxun Shu, Hao Jiang, Zheqi Lv, Juncheng Li, Siliang Tang, Yueting Zhuang
TL;DR
This work tackles on-device private-domain adaptation of efficient multimodal LLMs by proposing IDEALPrompt, a tuning-free, adaptive prompt optimization framework. IDEALPrompt operates in two stages—Reinforcement Warm-up Strategy to acquire general prompt optimization priors and Empirical Self-reflective Optimization to refine prompts using error analysis—without fine-tuning model parameters. The approach leverages a human-designed Strategy Pool and an RL-based search with memory to enable transfer across tasks and models, achieving strong performance on the Taobao-PDA private-domain benchmark while reducing adaptation costs. Empirical results show IDEALPrompt outperforms baselines, including some fine-tuning approaches, and provide insights into the effectiveness of the two-stage design and self-reflection for robust private-domain understanding.
Abstract
Efficient multimodal large language models (EMLLMs), in contrast to multimodal large language models (MLLMs), reduce model size and computational costs and are often deployed on resource-constrained devices. However, due to data privacy concerns, existing open-source EMLLMs rarely have access to private domain-specific data during pre-training, making them difficult to apply directly in device-specific domains, such as certain business scenarios. To address this weakness, this paper focuses on the efficient adaptation of EMLLMs to private domains, specifically in two areas: 1) how to reduce data requirements, and 2) how to avoid parameter fine-tuning. We propose a tunIng-free, aDaptivE, universAL Prompt Optimization Framework, abbreviated as IDEALPrompt, which consists of two stages: 1) Predefined Prompt, which uses a reinforcement search strategy to generate a prompt optimization strategy tree and thereby acquire optimization priors; 2) Prompt Reflection, which initializes the prompt from these optimization priors and then applies self-reflection to further search and refine the prompt. In this way, IDEALPrompt generates the "ideal prompts" for processing private domain-specific data. Our method requires no parameter fine-tuning and only a small amount of data to adapt quickly to the distribution of private data. Extensive experiments across multiple tasks demonstrate that IDEALPrompt significantly improves both efficiency and performance compared to baselines.
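
As a rough illustration of the two-stage idea described in the abstract, the Python sketch below first estimates how useful each prompt-editing strategy is (a stand-in for acquiring optimization priors), then initializes a prompt from the best strategy and refines it using hints derived from errors (a stand-in for self-reflection). It is not the authors' implementation. All names here (`STRATEGIES`, `warm_up`, `reflect_and_refine`, `toy_score`, `toy_find_errors`) are hypothetical, the epsilon-greedy search is a simplified substitute for the reinforcement search over a strategy tree, and the keyword-based scorer replaces what would in practice be running an EMLLM on a small labeled sample. No model parameters are touched at any point; only the prompt text changes.

```python
import random
from typing import Callable, Dict, List, Tuple

# Hypothetical strategy pool: each entry rewrites a prompt in one fixed way.
STRATEGIES: Dict[str, Callable[[str], str]] = {
    "add_role": lambda p: "You are an e-commerce expert. " + p,
    "add_steps": lambda p: p + " Think step by step.",
    "add_format": lambda p: p + " Answer with a single label.",
}


def warm_up(base_prompt: str, score: Callable[[str], float],
            rounds: int = 30, epsilon: float = 0.2,
            seed: int = 0) -> Dict[str, float]:
    """Stage 1 stand-in: epsilon-greedy search over the strategy pool.

    Returns the average reward observed for each strategy (the "priors").
    """
    rng = random.Random(seed)
    totals = {name: 0.0 for name in STRATEGIES}
    counts = {name: 0 for name in STRATEGIES}
    for _ in range(rounds):
        untried = [n for n in STRATEGIES if counts[n] == 0]
        if untried:                      # try every strategy at least once
            name = untried[0]
        elif rng.random() < epsilon:     # explore
            name = rng.choice(list(STRATEGIES))
        else:                            # exploit the best average so far
            name = max(STRATEGIES, key=lambda n: totals[n] / counts[n])
        totals[name] += score(STRATEGIES[name](base_prompt))
        counts[name] += 1
    return {n: totals[n] / counts[n] for n in STRATEGIES}


def reflect_and_refine(base_prompt: str, priors: Dict[str, float],
                       score: Callable[[str], float],
                       find_errors: Callable[[str], List[str]],
                       max_steps: int = 3) -> Tuple[str, float]:
    """Stage 2 stand-in: start from the best prior strategy, then append
    error-derived hints for as long as the score keeps improving."""
    best_name = max(priors, key=priors.get)
    prompt = STRATEGIES[best_name](base_prompt)
    best = score(prompt)
    for _ in range(max_steps):
        hints = find_errors(prompt)
        if not hints:
            break
        candidate = prompt + " " + hints[0]
        candidate_score = score(candidate)
        if candidate_score <= best:
            break
        prompt, best = candidate, candidate_score
    return prompt, best


def toy_score(prompt: str) -> float:
    """Toy scorer (assumption): fraction of helpful keywords in the prompt."""
    keywords = ["expert", "single label", "brand"]
    return sum(k in prompt for k in keywords) / len(keywords)


def toy_find_errors(prompt: str) -> List[str]:
    """Toy error analysis (assumption): suggest a hint if 'brand' is missing."""
    return [] if "brand" in prompt else ["Pay attention to the brand name."]


base = "Classify the product image."
priors = warm_up(base, toy_score)
final_prompt, final_score = reflect_and_refine(
    base, priors, toy_score, toy_find_errors)
print(priors)
print(final_prompt, final_score)
```

In a realistic setting, `score` would evaluate the candidate prompt on a handful of labeled private-domain samples and `find_errors` would ask a model to explain its mistakes; both are replaced here by deterministic toys so the control flow can be run and inspected.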
