Prime Once, then Reprogram Locally: An Efficient Alternative to Black-Box Service Model Adaptation

Yunbei Zhang, Chengyi Cai, Feng Liu, Jihun Hamm

Abstract

Adapting closed-box service models (i.e., APIs) for target tasks typically relies on reprogramming via Zeroth-Order Optimization (ZOO). However, this standard strategy is known for extensive, costly API calls and often suffers from slow, unstable optimization. Furthermore, we observe that this paradigm faces new challenges with modern APIs (e.g., GPT-4o). These models can be less sensitive to the input perturbations ZOO relies on, thereby hindering performance gains. To address these limitations, we propose an Alternative efficient Reprogramming approach for Service models (AReS). Instead of direct, continuous closed-box optimization, AReS initiates a single-pass interaction with the service API to prime an amenable local pre-trained encoder. This priming stage trains only a lightweight layer on top of the local encoder, making it highly receptive to the subsequent glass-box (white-box) reprogramming stage performed directly on the local model. Consequently, all subsequent adaptation and inference rely solely on this local proxy, eliminating all further API costs. Experiments demonstrate AReS's effectiveness where prior ZOO-based methods struggle: on GPT-4o, AReS achieves a +27.8% gain over the zero-shot baseline, a setting in which ZOO-based methods provide little to no improvement. Broadly, across ten diverse datasets, AReS outperforms state-of-the-art methods (+2.5% for VLMs, +15.6% for standard VMs) while reducing API calls by over 99.99%. AReS thus provides a robust and practical solution for adapting modern closed-box models.
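The abstract describes a two-stage recipe: a single-pass priming of a lightweight layer on top of a frozen local encoder using the service API's outputs, followed by fully local, gradient-based reprogramming. Below is a minimal PyTorch-style sketch of that flow; the names (`service_api`, `PrimedLocalModel`, `prime_once`, `reprogram_locally`), the KL-matching priming loss, and the additive visual-prompt parameterization are illustrative assumptions, not the paper's exact implementation.

```python
# Minimal sketch of the two-stage AReS recipe described in the abstract (assumed details).
import torch
import torch.nn as nn
import torch.nn.functional as F


class PrimedLocalModel(nn.Module):
    """Frozen local pre-trained encoder plus a lightweight trainable head."""

    def __init__(self, encoder: nn.Module, feat_dim: int, num_classes: int):
        super().__init__()
        self.encoder = encoder.eval()
        for p in self.encoder.parameters():
            p.requires_grad_(False)               # only the head is primed
        self.head = nn.Linear(feat_dim, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.head(self.encoder(x))


def prime_once(model: PrimedLocalModel, images: torch.Tensor, service_api,
               epochs: int = 10, lr: float = 1e-3) -> PrimedLocalModel:
    """Stage 1: one API call per training image, then fit the head locally."""
    with torch.no_grad():
        # Single-pass interaction: service_api(img) returns the API's class probabilities.
        teacher = torch.stack([service_api(img) for img in images])
    opt = torch.optim.Adam(model.head.parameters(), lr=lr)
    for _ in range(epochs):
        log_probs = F.log_softmax(model(images), dim=-1)
        loss = F.kl_div(log_probs, teacher, reduction="batchmean")
        opt.zero_grad()
        loss.backward()
        opt.step()
    return model


def reprogram_locally(model: PrimedLocalModel, images: torch.Tensor,
                      labels: torch.Tensor, epochs: int = 50,
                      lr: float = 1e-2) -> torch.Tensor:
    """Stage 2: glass-box visual reprogramming on the local proxy; no further API calls."""
    delta = torch.zeros_like(images[:1], requires_grad=True)  # additive visual prompt
    opt = torch.optim.Adam([delta], lr=lr)
    for _ in range(epochs):
        loss = F.cross_entropy(model(images + delta), labels)
        opt.zero_grad()
        loss.backward()
        opt.step()
    return delta.detach()
```

In this sketch the service API is queried exactly once per training image; all later optimization and inference touch only the local proxy, which is the source of the API-call savings reported in the abstract.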

Paper Structure

This paper contains 38 sections, 3 theorems, 19 equations, 6 figures, and 19 tables.

Key Result

Lemma 1

(Lipschitz Continuity of Cross-Entropy Loss with respect to Logits). The cross-entropy loss function $\ell(z, y) = -\sum_{j=1}^{K^T} y_j \log(p_j(z))$, where $p_j(z)$ are softmax probabilities derived from logits $z \in \mathbb{R}^{K^T}$ and $y$ is a one-hot true label vector, is Lipschitz continuous with respect to the logits for any two logit vectors $z_1, z_2$. The associated Lipschitz constants do not explicitly depend on the number of classes $K^T$.
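For intuition, here is a minimal sketch of the standard argument (the paper's own proof and constants may differ). The gradient of $\ell$ with respect to the logits is $\nabla_z \ell(z, y) = p(z) - y$; since $\|p(z) - y\|_1 \le 2$ and $\|p(z) - y\|_\infty \le 1$,

$$\|\nabla_z \ell(z, y)\|_2 \le \sqrt{\|p(z) - y\|_1 \,\|p(z) - y\|_\infty} \le \sqrt{2},$$

so by the mean value theorem $|\ell(z_1, y) - \ell(z_2, y)| \le \sqrt{2}\,\|z_1 - z_2\|_2$, a bound that does not grow with $K^T$.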

Figures (6)

  • Figure 1: (a) On the real-world GPT-4o API, ZOO-based methods show limited effectiveness, providing little to no improvement over zero-shot performance while incurring high total (training and inference) costs. (b, c) On CLIP ViT-B/16, these methods require $\sim 10^8$ API calls and over 32 hours of training on Flowers102, yet still underperform our method, AReS, which uses only $\sim 10^3$ API calls and 0.4 hours.
  • Figure 2: (a) Previous closed-box methods use Zeroth-Order Optimization (ZOO), which estimates gradients by querying the API multiple times for the same training image with random perturbations. This results in continuous API dependency: numerous calls during training and one API call per image during inference. (b) Our AReS approach performs a single-pass priming, requiring only one API call per training image to prepare a local model. This enables efficient, gradient-aware VR locally, eliminating all subsequent API costs during inference.
  • Figure 3: Ablation studies for AReS on the EuroSAT dataset: impact on accuracy (%) of (a) the training-data fraction (for VMs), (b) the few-shot sample size (for VLMs), (c) the priming loss function, and (d) the amount of extra unlabeled data used for priming. Unless otherwise indicated in a subplot, the default setting uses 16-shot learning, a CLIP ViT-B/16 service model, and a ViT-B/16 local encoder.
  • Figure 4: Illustration of Zeroth-Order Optimization (ZOO) techniques commonly used in closed-box model reprogramming. (a) BAR with the Randomized Gradient-Free (RGF) method estimates gradients by querying the model with random directional perturbations. (b) BlackVIP with Simultaneous Perturbation Stochastic Approximation with Gradient Correction (SPSA-GC) approximates gradients using only two model queries with a randomly generated perturbation vector. Both approaches suffer from high query complexity and noisy gradient estimates, leading to unstable and computationally intensive optimization, especially for high-dimensional prompts (a minimal sketch of such a two-query estimator follows this figure list).
  • Figure 5: Comparison of closed-box visual reprogramming approaches. (a) BAR and (b) BlackVIP rely on Zeroth-Order Optimization (ZOO) by repeatedly querying the service model with perturbed inputs (e.g., using random directions) to estimate gradients for updating the visual prompt. These methods suffer from high API call costs and potentially unstable optimization. (c) Our AReS method performs a one-time priming from the service model to a local model. Subsequent visual prompt optimization occurs efficiently on this local model using glass-box gradients, eliminating further API calls and enabling stable, cost-effective adaptation.
  • ...and 1 more figure
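For context on the ZOO baselines in Figures 4 and 5, the following is a minimal sketch of a two-query simultaneous-perturbation gradient estimate of the kind SPSA-GC builds on. The Rademacher perturbation directions, the perturbation scale, and the omission of the gradient-correction (GC) term are illustrative assumptions, not BlackVIP's exact procedure.

```python
# Hedged sketch of a two-query SPSA-style gradient estimate used by ZOO baselines.
import torch


def spsa_gradient(query_loss, params: torch.Tensor, c: float = 0.01) -> torch.Tensor:
    """Estimate d(loss)/d(params) from two closed-box queries.

    query_loss: calls the service API with the given parameters and returns a scalar loss.
    params:     current reprogramming/prompt parameters.
    c:          perturbation magnitude.
    """
    # Random +/-1 (Rademacher) direction, one sample per estimate.
    delta = (torch.rand_like(params) < 0.5).to(params.dtype) * 2 - 1
    loss_plus = query_loss(params + c * delta)   # query 1
    loss_minus = query_loss(params - c * delta)  # query 2
    # Simultaneous-perturbation estimate; for +/-1 entries, dividing element-wise
    # by delta is the same as multiplying by delta.
    return (loss_plus - loss_minus) / (2 * c) * delta


# Illustrative single ZOO update (lr and query_service_loss are hypothetical):
# params = params - lr * spsa_gradient(query_service_loss, params)
```

Because every such update requires fresh API queries, training alone can account for the $\sim 10^8$ API calls reported in Figure 1, which is the cost AReS avoids by reprogramming on the primed local model instead.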

Theorems & Definitions (8)

  • Remark: Priming vs. Knowledge Distillation
  • Definition 1
  • Lemma 1
  • Proof
  • Lemma 2
  • Proof
  • Theorem 3
  • Proof