Boosting Private Domain Understanding of Efficient MLLMs: A Tuning-free, Adaptive, Universal Prompt Optimization Framework

Jiang Liu, Bolin Li, Haoyuan Li, Tianwei Lin, Wenqiao Zhang, Tao Zhong, Zhelun Yu, Jinghao Wei, Hao Cheng, Wanggui He, Fangxun Shu, Hao Jiang, Zheqi Lv, Juncheng Li, Siliang Tang, Yueting Zhuang

TL;DR

This work tackles on-device private-domain adaptation of efficient multimodal LLMs by proposing IDEALPrompt, a tuning-free, adaptive prompt optimization framework. IDEALPrompt operates in two stages—Reinforcement Warm-up Strategy to acquire general prompt optimization priors and Empirical Self-reflective Optimization to refine prompts using error analysis—without fine-tuning model parameters. The approach leverages a human-designed Strategy Pool and an RL-based search with memory to enable transfer across tasks and models, achieving strong performance on the Taobao-PDA private-domain benchmark while reducing adaptation costs. Empirical results show IDEALPrompt outperforms baselines, including some fine-tuning approaches, and provide insights into the effectiveness of the two-stage design and self-reflection for robust private-domain understanding.

Abstract

Efficient multimodal large language models (EMLLMs), in contrast to multimodal large language models (MLLMs), reduce model size and computational costs and are often deployed on resource-constrained devices. However, due to data privacy concerns, existing open-source EMLLMs rarely have access to private domain-specific data during pre-training, which makes them difficult to apply directly in device-specific domains, such as certain business scenarios. To address this weakness, this paper focuses on the efficient adaptation of EMLLMs to private domains, specifically in two areas: 1) how to reduce data requirements, and 2) how to avoid parameter fine-tuning. To this end, we propose a tunIng-free, aDaptivE, universAL Prompt Optimization Framework, abbreviated as IDEALPrompt, which consists of two stages: 1) Predefined Prompt, which uses a reinforcement searching strategy to generate a prompt optimization strategy tree and thereby acquire optimization priors; 2) Prompt Reflection, which initializes the prompt based on the optimization priors and then applies self-reflection to further search and refine the prompt. By doing so, IDEALPrompt generates the "ideal prompts" for processing private domain-specific data. Note that our method requires no parameter fine-tuning and only a small amount of data to quickly adapt to the distribution of private data. Extensive experiments across multiple tasks demonstrate that IDEALPrompt significantly improves both efficiency and performance compared to baselines.
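
The two-stage idea described above can be illustrated with a minimal toy sketch. It is not the authors' implementation: the strategy pool entries, the `toy_score` stand-in for evaluating an EMLLM on a small labeled set, and the `toy_analyze_errors` stand-in for self-reflection are all hypothetical. Stage 1 is approximated with a flat epsilon-greedy search rather than the strategy tree the paper describes.

```python
import random
from typing import Callable, Dict, Optional

# Hypothetical strategy pool: each entry rewrites a prompt in one way.
STRATEGY_POOL: Dict[str, Callable[[str], str]] = {
    "add_role": lambda p: "You are an e-commerce assistant. " + p,
    "add_steps": lambda p: p + " Think step by step.",
    "add_format": lambda p: p + " Answer with a single label.",
}


def toy_score(prompt: str) -> float:
    """Stand-in for model accuracy on a small dev set (assumption)."""
    return 0.5 + 0.2 * ("step by step" in prompt) + 0.1 * ("single label" in prompt)


def toy_analyze_errors(prompt: str) -> Optional[str]:
    """Stand-in for self-reflection: suggest the next strategy, or None."""
    if "single label" not in prompt:
        return "add_format"
    if "step by step" not in prompt:
        return "add_steps"
    return None


def warm_up(prompt: str, score: Callable[[str], float],
            rounds: int = 30, epsilon: float = 0.2, seed: int = 0) -> Dict[str, float]:
    """Stage 1 (sketch): estimate each strategy's average gain by epsilon-greedy search."""
    rng = random.Random(seed)
    names = list(STRATEGY_POOL)
    totals = {n: 0.0 for n in names}
    counts = {n: 0 for n in names}
    base = score(prompt)
    for _ in range(rounds):
        untried = [n for n in names if counts[n] == 0]
        if untried:
            name = untried[0]                      # try every strategy once
        elif rng.random() < epsilon:
            name = rng.choice(names)               # explore
        else:
            name = max(names, key=lambda n: totals[n] / counts[n])  # exploit
        totals[name] += score(STRATEGY_POOL[name](prompt)) - base
        counts[name] += 1
    return {n: totals[n] / counts[n] for n in names}


def reflect_and_refine(prompt: str, priors: Dict[str, float],
                       score: Callable[[str], float],
                       analyze_errors: Callable[[str], Optional[str]],
                       max_steps: int = 5) -> str:
    """Stage 2 (sketch): initialize from the best prior, then keep only edits that help."""
    candidate = STRATEGY_POOL[max(priors, key=priors.get)](prompt)
    best = candidate if score(candidate) > score(prompt) else prompt
    for _ in range(max_steps):
        suggestion = analyze_errors(best)
        if suggestion is None:
            break
        candidate = STRATEGY_POOL[suggestion](best)
        if score(candidate) <= score(best):
            break                                  # stop when reflection no longer helps
        best = candidate
    return best


priors = warm_up("Classify the product image.", toy_score)
final_prompt = reflect_and_refine("Classify the product image.", priors,
                                  toy_score, toy_analyze_errors)
```

No model parameters are touched at any point: only the prompt string changes, and each change is accepted or rejected by its score on a small evaluation set. That is the sense in which the approach is tuning-free and data-light.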

Paper Structure (38 sections, 9 equations, 7 figures, 13 tables)

Figures (7)

  • Figure 1: (a) The gap between public-domain and private-domain data. (b) A simplified view of IDEALPrompt. (c) Compared with baseline methods, IDEALPrompt achieves superior performance on the Taobao-PDA benchmark.
  • Figure 2: The architecture of IDEALPrompt. It comprises a Strategy Pool, a Reinforcement Warm-up Strategy, and Empirical Self-reflective Optimization; it avoids parameter fine-tuning and requires only a small amount of data for efficient on-device adaptation.
  • Figure 3: Data characteristics of Taobao-PDA
  • Figure 4: (a) Performance comparison between a single strategy and the exploration-exploitation strategy. (b) Comparison of performance and search steps between brute-force search and exploration-exploitation strategy tree search. (c) and (d) Performance comparison when individual components of Empirical Self-reflective Optimization are removed.
  • Figure 5: A case of prompt optimization using IDEALPrompt.
  • ...and 2 more figures