DynaPrompt: Dynamic Test-Time Prompt Tuning

Zehao Xiao, Shilin Yan, Jack Hong, Jiayin Cai, Xiaolong Jiang, Yao Hu, Jiayi Shen, Qi Wang, Cees G. M. Snoek

TL;DR

DynaPrompt addresses distribution shift in vision-language models with a dynamic test-time prompt-tuning framework. It maintains an online prompt buffer and, for each test sample, selects which prompts to update using two metrics, prediction entropy and probability difference; when no buffered prompt is selected, it appends a new one. Selective updating limits the error accumulation that causes prompt collapse while still exploiting relevant information from earlier test samples. With a bounded buffer size and a policy that appends new prompts and deletes inactive ones, the method improves results on domain generalization and cross-dataset benchmarks and can be combined with existing prompt-tuning methods, at a computational cost that stays manageable.

Abstract

Test-time prompt tuning enhances zero-shot generalization of vision-language models but tends to ignore the relatedness among test samples during inference. Online test-time prompt tuning provides a simple way to leverage the information in previous test samples, albeit with the risk of prompt collapse due to error accumulation. To enhance test-time prompt tuning, we propose DynaPrompt, short for dynamic test-time prompt tuning, which exploits relevant data distribution information while reducing error accumulation. Built on an online prompt buffer, DynaPrompt adaptively selects and optimizes the relevant prompts for each test sample during tuning. Specifically, we introduce a dynamic prompt selection strategy based on two metrics: prediction entropy and probability difference. To handle test data whose information is not yet captured by the buffer, we develop dynamic prompt appending, which allows the buffer to append new prompts and delete the inactive ones. By doing so, the prompts are optimized to exploit beneficial information on specific test data, while alleviating error accumulation. Experiments on fourteen datasets demonstrate the effectiveness of dynamic test-time prompt tuning.
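
The two selection metrics can be illustrated with a small sketch. The abstract names the metrics but does not define them, so the definitions below are assumptions made for illustration: prediction entropy is taken as the entropy of the class distribution averaged over augmented views, and probability difference as the drop in the predicted class's probability between the original sample and its augmented views. The function names and the logits-as-lists interface are hypothetical.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of class logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (in nats) of a probability vector."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def mean_probs(prob_rows: Sequence[Sequence[float]]) -> List[float]:
    """Average several probability vectors class by class."""
    n = len(prob_rows)
    return [sum(col) / n for col in zip(*prob_rows)]


def prediction_entropy(view_logits: Sequence[Sequence[float]]) -> float:
    """ASSUMED definition: entropy of the prediction averaged over views."""
    return entropy(mean_probs([softmax(l) for l in view_logits]))


def probability_difference(orig_logits: Sequence[float],
                           view_logits: Sequence[Sequence[float]]) -> float:
    """ASSUMED definition: probability of the class predicted on the
    original sample, minus its averaged probability on augmented views."""
    p_orig = softmax(orig_logits)
    p_aug = mean_probs([softmax(l) for l in view_logits])
    k = max(range(len(p_orig)), key=p_orig.__getitem__)  # predicted class
    return p_orig[k] - p_aug[k]
```

Under these assumed definitions, a prompt would be scored per test sample by evaluating the model with that prompt on the sample and its augmentations, then passing the resulting logits to the two functions. How the scores are thresholded to form the selected subsets is not stated in this section.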

Paper Structure

This paper contains 14 sections, 9 equations, 7 figures, 9 tables, 1 algorithm.

Figures (7)

  • Figure 1: Illustrations of different test-time prompt tuning methods. Circles with specific colors denote learned prompts for individual test samples. (a) For a single test sample, test-time prompt tuning learns prompts from a shared initialization $\mathbf{v}_0$, ignoring the relatedness among test samples. (b) Online test-time prompt tuning incorporates information from previous test samples by using the prompt optimized on the previous sample as the starting point, which leads to error accumulation. (c) We propose dynamic test-time prompt tuning (top) to adaptively exploit relevant information from previous test samples and alleviate error accumulation, which is achieved by autonomously selecting, updating, appending, and deleting online prompts in a prompt buffer $\mathcal{V}$ (bottom).
  • Figure 2: Prompt collapse in online test-time prompt tuning. We measure online test-time accuracy for different methods on ImageNet-A, where the accuracy for each block is calculated on 200 samples. Test-time Prompt Tuning (TPT) achieves stable accuracy by tuning the prompts independently for each test sample. Online TPT suffers from severe error accumulation: its accuracy is competitive at the beginning but drops significantly as online learning proceeds. Oracle, which tunes prompts online only on correctly predicted samples, is more stable and achieves much better performance by incorporating relevant information. Our method aims to exploit relevant information from previous online samples automatically while reducing error accumulation.
  • Figure 3: The process of our dynamic test-time prompt tuning. (Left) In dynamic prompt selection, we select the relevant online prompts from the prompt buffer $\mathcal{V}$ for each test sample using the intersection of the prompt subsets obtained by entropy and probability difference metrics. The selected prompts are optimized by entropy minimization before making predictions. (Right) If no prompt is selected, our dynamic prompt appending strategy assigns a new prompt initialized by $\mathbf{v}_0$ for the test sample and appends it to the prompt buffer. We always append the new optimized prompt on top of the buffer, moving the inactive ones to the bottom, which we can remove directly when appending new prompts to the full buffer.
  • Figure 4: Top-1 Accuracy of different prompt buffer sizes.
  • Figure 5: Time costs with different prompt buffer sizes.
  • ...and 2 more figures
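
The buffer policy described in the Figure 3 caption can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `PromptBuffer`, `step`, and the callback names are hypothetical, the two metric tests are passed in as caller-supplied predicates because their thresholds are not given here, entropy minimization is replaced by a caller-supplied `optimize` function, and moving selected prompts to the top after updating is an assumed reading of "moving the inactive ones to the bottom".

```python
from dataclasses import dataclass, field
from typing import Callable, List

Prompt = List[float]  # stand-in for a learnable prompt tensor


@dataclass
class PromptBuffer:
    v0: Prompt                 # shared initial prompt
    max_size: int              # buffer capacity, must be >= 1
    prompts: List[Prompt] = field(default_factory=list)  # index 0 is the top

    def select(self,
               entropy_ok: Callable[[Prompt], bool],
               probdiff_ok: Callable[[Prompt], bool]) -> List[int]:
        """Indices of prompts passing BOTH tests (intersection of subsets)."""
        return [i for i, p in enumerate(self.prompts)
                if entropy_ok(p) and probdiff_ok(p)]

    def step(self,
             entropy_ok: Callable[[Prompt], bool],
             probdiff_ok: Callable[[Prompt], bool],
             optimize: Callable[[Prompt], Prompt]) -> List[Prompt]:
        """Process one test sample and return the prompts that were tuned."""
        chosen = set(self.select(entropy_ok, probdiff_ok))
        if chosen:
            # Update the selected prompts; the rest are inactive this step.
            active = [optimize(p) for i, p in enumerate(self.prompts)
                      if i in chosen]
            inactive = [p for i, p in enumerate(self.prompts)
                        if i not in chosen]
        else:
            # Nothing selected: start a new prompt from a copy of v0.
            active = [optimize(list(self.v0))]
            inactive = list(self.prompts)
            # If the buffer is full, drop from the bottom to make room.
            while inactive and len(inactive) >= self.max_size:
                inactive.pop()
        # Active prompts go on top, inactive ones sink toward the bottom.
        self.prompts = active + inactive
        return active
```

A caller would invoke `step` once per test sample, building the two predicates from that sample's metric values. Because prompts that are never selected drift to the bottom, the deletion on a full buffer removes the prompt that has been inactive the longest.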