AnchorOPT: Towards Optimizing Dynamic Anchors for Adaptive Prompt Learning
Zheng Li, Yibing Song, Xin Zhang, Lei Luo, Xiang Li, Jian Yang
TL;DR
AnchorOPT tackles the rigidity of fixed anchors in CLIP prompt learning by introducing dynamic anchors $t_{anc}$ and a learnable position matrix $W$ that adapt prompts to the task context. Training proceeds in two stages: Stage I optimizes $t_{anc}$ by aligning it with LLM-generated descriptions $t_d$, and Stage II freezes the anchors and jointly optimizes the soft tokens and $W$ (including a deep variant that preserves anchors across layers). Experiments on 11 datasets show gains in base-to-novel and cross-dataset generalization when AnchorOPT is integrated with strong baselines, often exceeding methods that add extra modules or regularization. The approach is plug-and-play, demonstrating that simple, dynamic prompt structures can yield strong generalization for CLIP.
Abstract
Existing prompt learning methods built upon CLIP models leverage textual tokens as anchors to guide the learnable soft tokens. This guidance improves CLIP's generalization. However, these anchors, static in both value and position, lack cross-task and stage-adaptive flexibility. To address this limitation, we propose AnchorOPT, a dynamic anchor-based prompt learning framework. Specifically, AnchorOPT introduces dynamism in two key dimensions: (i) anchor values eschew handcrafted explicit textual tokens (e.g., "shape", "color") and are instead learned dynamically from task-specific data; and (ii) the positional relationship between anchor and soft tokens is no longer fixed but is adaptively optimized via a learnable position matrix conditioned on the training stage and task context. Training occurs in two stages: we first learn the anchor tokens, then freeze them and transfer them to the second stage, where the soft tokens and the position matrix are optimized. Extensive experiments demonstrate that using only a simple learnable anchor and position matrix achieves performance comparable to or exceeding that of some methods that incorporate additional learnable modules or regularization techniques. As a plug-and-play module, AnchorOPT integrates seamlessly into existing frameworks, yielding consistent performance gains across diverse datasets. Code is publicly available at https://github.com/zhengli97/ATPrompt.
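To make the role of the position matrix and the two-stage schedule concrete, the following is a minimal sketch in plain Python, not the released implementation. It assumes one plausible reading of the abstract: the position matrix holds one row of logits per prompt position, each row is softmax-normalized, and each prompt position is the resulting weighted mix of the soft and anchor tokens. The names `arrange_prompt`, `trainable_parameters`, and `position_logits` are hypothetical, and the toy vectors stand in for real token embeddings.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    peak = max(row)
    exps = [math.exp(x - peak) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def arrange_prompt(soft_tokens, anchor_tokens, position_logits):
    """Place soft and anchor tokens into prompt positions.

    Assumption: position_logits has one row per prompt position and one
    column per token (soft tokens first, then anchor tokens). Each row is
    softmax-normalized, so a sharply peaked row selects a single token.
    """
    tokens = soft_tokens + anchor_tokens
    if any(len(row) != len(tokens) for row in position_logits):
        raise ValueError("each row needs one logit per token")
    weights = [softmax(row) for row in position_logits]
    dim = len(tokens[0])
    return [
        [sum(w * tok[d] for w, tok in zip(row, tokens)) for d in range(dim)]
        for row in weights
    ]


def trainable_parameters(stage):
    """Hypothetical schedule mirroring the two stages in the abstract."""
    if stage == 1:
        # Stage I: only the anchor tokens are learned.
        return {"anchor_tokens"}
    if stage == 2:
        # Stage II: anchors are frozen; soft tokens and positions are learned.
        return {"soft_tokens", "position_matrix"}
    raise ValueError("stage must be 1 or 2")


# Toy example: two soft tokens and one anchor token, each of dimension 2.
soft = [[1.0, 0.0], [0.0, 1.0]]
anchor = [[5.0, 5.0]]
# Sharply peaked logits that move the anchor to the first prompt position.
logits = [
    [0.0, 0.0, 50.0],
    [50.0, 0.0, 0.0],
    [0.0, 50.0, 0.0],
]
prompt = arrange_prompt(soft, anchor, logits)
```

With these peaked logits the anchor lands (up to a negligible softmax residue) in the first prompt position, followed by the two soft tokens. With flatter logits each position becomes a blend of tokens, which is what would let gradient-based training adjust the layout in this sketch.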
