CPT: Consistent Proxy Tuning for Black-box Optimization
Yuanyang He, Zitong Huang, Xinxing Xu, Rick Siow Mong Goh, Salman Khan, Wangmeng Zuo, Yong Liu, Chun-Mei Feng
TL;DR
CPT addresses the train-test mismatch in Proxy-tuning for black-box models by using a frozen large black-box model and a frozen white-box proxy while training a tunable white-box proxy. The method introduces a logit-offset scheme with a train-time factor $\alpha_{train}$ and a test-time factor $\alpha_{test}$; the training objective and the inference form coincide when $\alpha_{train}=\alpha_{test}$. Empirically, CPT improves mean accuracy over Proxy-tuning on both LLM and VLM tasks across multiple datasets and model scales, and it remains robust in ablations over white-box tuning strategies and model sizes. The approach is model-agnostic and plug-and-play for logit-based tuning, offering a practical way to improve black-box models without access to their internal parameters.
Abstract
Black-box tuning has attracted recent attention because the structure and internal parameters of advanced proprietary models are not accessible. Proxy-tuning provides a test-time output adjustment for tuning black-box language models: it applies the difference between the output logits of a smaller white-box "proxy" model after and before tuning to the output of the black-box model. However, this technique serves only as a decoding-time algorithm, which leads to an inconsistency between training and testing that potentially limits overall performance. To address this problem, we introduce Consistent Proxy Tuning (CPT), a simple yet effective black-box tuning method. Unlike Proxy-tuning, CPT additionally exploits the frozen large black-box model and another frozen small white-box model during training, ensuring consistency between the training-stage optimization objective and the test-time proxies. This consistency benefits Proxy-tuning and enhances model performance. Note that our method operates solely on logit-level computation, which makes it model-agnostic and applicable to any task involving logit classification. Extensive experimental results demonstrate the superiority of CPT in black-box tuning of both Large Language Models (LLMs) and Vision-Language Models (VLMs) across various datasets. The code is available at https://github.com/chunmeifeng/CPT.
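To make the logit-level idea concrete, the following is a minimal sketch and not the released implementation. It assumes that the same ensemble of logits is used both inside the training loss and at test time, and that the factor $\alpha$ scales the offset between the black-box model and the frozen small model; the exact placement of $\alpha$, the function names `ensemble_logits` and `cross_entropy`, and the toy logits are all illustrative assumptions. With $\alpha = 1$ the ensemble equals the Proxy-tuning form, i.e. black-box logits plus the tuned-minus-untuned proxy difference.

```python
import math


def ensemble_logits(tuned, black_box, untuned, alpha=1.0):
    """Return tuned + alpha * (black_box - untuned), elementwise per class.

    Hypothetical helper: the placement of alpha is an assumption.
    For alpha = 1 this equals black_box + (tuned - untuned).
    """
    return [t + alpha * (b - u) for t, b, u in zip(tuned, black_box, untuned)]


def cross_entropy(logits, target):
    """Negative log-likelihood of class `target` under softmax(logits)."""
    m = max(logits)  # subtract the max for numerical stability
    log_z = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_z - logits[target]


# Toy logits over three classes for a single input (made-up values).
black_box = [2.0, 1.0, 0.0]      # frozen large black-box model
proxy_untuned = [1.0, 1.0, 0.0]  # frozen small white-box model
proxy_tuned = [0.5, 2.0, 0.0]    # tunable small white-box model

# Training: the loss is computed on the ensemble, so only proxy_tuned
# would receive gradients; the other two models stay frozen.
alpha_train = 1.0
train_logits = ensemble_logits(proxy_tuned, black_box, proxy_untuned, alpha_train)
loss = cross_entropy(train_logits, target=1)

# Test time: the same ensemble form with alpha_test = alpha_train,
# so the quantity optimized in training is the one used for prediction.
alpha_test = 1.0
test_logits = ensemble_logits(proxy_tuned, black_box, proxy_untuned, alpha_test)
prediction = max(range(len(test_logits)), key=test_logits.__getitem__)
```

In a real setup the lists would be tensors from an autodiff framework and `loss` would be backpropagated through the tunable proxy only; the black-box model contributes logits but no gradients.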
