BBox-Adapter: Lightweight Adapting for Black-Box Large Language Models

Haotian Sun; Yuchen Zhuang; Wei Wei; Chao Zhang; Bo Dai

TL;DR

BBox-Adapter tackles the challenge of adapting black-box LLMs, which expose neither internal parameters nor output probabilities, by training a lightweight adapter with 0.1B–0.3B parameters within an energy-based framework. It uses a ranking-based Noise Contrastive Estimation (NCE) loss and an online adaptation loop. The loop updates the adapter with positives drawn from ground truth or from AI/human feedback and with negatives drawn from its own earlier adapted inferences. At inference time, outputs from the black-box LLM are combined with adapter scores in a sentence-level beam search, enabling task-specific adaptation without fine-tuning through the API. Empirically, it delivers accuracy gains of up to 6.77% across four domains and reduces training and inference costs by up to 31.30x and 1.84x, respectively. It also supports plug-and-play transfer to other black-box LLMs and preserves privacy and transparency.
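The sentence-level beam search described above can be illustrated with a minimal Python sketch. It is not the authors' implementation. `propose` stands in for sampling candidate next sentences from the black-box LLM API, and `score` stands in for the adapter's score. Both are hypothetical placeholders, replaced here by toy deterministic functions so the sketch runs offline. The sketch also assumes that the adapter re-scores the whole partial answer at each step and that an `<END>` marker signals a finished answer.

```python
from typing import Callable, List, Tuple

# Hypothetical callables (placeholders, not a real API):
#   propose(question, partial) -> candidate next sentences (black-box LLM side)
#   score(question, partial)   -> adapter score of a partial answer (higher = better)
Proposer = Callable[[str, List[str]], List[str]]
Scorer = Callable[[str, List[str]], float]

END = "<END>"  # assumed marker that a candidate answer is complete


def sentence_beam_search(question: str, propose: Proposer, score: Scorer,
                         beam_size: int = 3, max_steps: int = 8) -> List[str]:
    beams: List[Tuple[float, List[str]]] = [(0.0, [])]
    for _ in range(max_steps):
        candidates: List[Tuple[float, List[str]]] = []
        for beam_score, partial in beams:
            if partial and partial[-1] == END:
                candidates.append((beam_score, partial))  # finished: carry over
                continue
            for sentence in propose(question, partial):
                extended = partial + [sentence]
                # Assumption: the adapter scores the whole partial answer.
                candidates.append((score(question, extended), extended))
        # Keep the beam_size highest-scoring partial answers (stable on ties).
        candidates.sort(key=lambda c: c[0], reverse=True)
        beams = candidates[:beam_size]
        if all(p and p[-1] == END for _, p in beams):
            break
    return beams[0][1]


# Toy stand-ins so the sketch runs offline (purely illustrative).
def toy_propose(question: str, partial: List[str]) -> List[str]:
    step = len(partial)
    return [END] if step >= 2 else [f"step{step}-good", f"step{step}-bad"]


def toy_score(question: str, partial: List[str]) -> float:
    return float(sum("good" in s for s in partial))


if __name__ == "__main__":
    # prints ['step0-good', 'step1-good', '<END>']
    print(sentence_beam_search("toy question", toy_propose, toy_score))
```

In this sketch the LLM only proposes candidates and never ranks them, and the adapter only ranks and never generates. This division of labor lets a small scorer steer a model whose internals are inaccessible.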

Abstract

Adapting state-of-the-art Large Language Models (LLMs) such as GPT-4 and Gemini to specific tasks is challenging. Because their parameters, embeddings, and even output probabilities are opaque, existing fine-tuning methods are inapplicable. Consequently, these black-box LLMs can only be adapted through their API services, which raises concerns about transparency, privacy, and cost. To address these challenges, we introduce BBox-Adapter, a novel lightweight adapter for black-box LLMs. BBox-Adapter distinguishes target-domain from source-domain data by treating target data as positive and source data as negative. It employs a ranking-based Noise Contrastive Estimation (NCE) loss to raise the likelihood of target-domain data while penalizing that of source-domain data. Furthermore, it features an online adaptation mechanism that samples positive data in real time from ground-truth, human, or AI feedback and pairs it with negative data from previous adaptations. Extensive experiments demonstrate BBox-Adapter's effectiveness and cost efficiency: it improves model performance by up to 6.77% across diverse tasks and domains while reducing training and inference costs by 31.30x and 1.84x, respectively.
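The ranking-based NCE objective and the online pairing of positives with negatives can be illustrated with a small, dependency-free Python sketch. It assumes a common ranking-NCE form rather than reproducing the paper's exact equation. Under that assumption, the adapter assigns a scalar score to one positive (target-domain) answer and to K negatives, and the loss is the negative log-softmax probability of the positive among all K+1 candidates. The helper names (`ranking_nce_loss`, `ranking_nce_grad`, `build_online_pair`) are hypothetical. The gradient is taken with respect to the scores, not the adapter's parameters.

```python
import math
from typing import List, Sequence, Tuple


def _log_sum_exp(values: Sequence[float]) -> float:
    # Shift by the max for numerical stability.
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def ranking_nce_loss(pos_score: float, neg_scores: Sequence[float]) -> float:
    """Assumed form: -log softmax of the positive among the K+1 candidates."""
    scores = [pos_score, *neg_scores]
    return _log_sum_exp(scores) - pos_score


def ranking_nce_grad(pos_score: float, neg_scores: Sequence[float]) -> List[float]:
    """d loss / d [pos_score, *neg_scores] = softmax(scores) - one_hot(0)."""
    scores = [pos_score, *neg_scores]
    log_z = _log_sum_exp(scores)
    grad = [math.exp(s - log_z) for s in scores]
    grad[0] -= 1.0
    return grad


def build_online_pair(positive: str,
                      previous_outputs: Sequence[str]) -> Tuple[str, List[str]]:
    """Hypothetical helper for one online step: the positive comes from ground
    truth or human/AI feedback; negatives are earlier adapted outputs that
    differ from it."""
    return positive, [y for y in previous_outputs if y != positive]


if __name__ == "__main__":
    pos, negs = build_online_pair("answer A", ["answer A", "answer B", "answer C"])
    print(pos, negs)
    # An adapter that cannot yet tell the candidates apart (all scores equal):
    print(ranking_nce_loss(0.0, [0.0, 0.0]))   # approximately log(3)
    print(ranking_nce_grad(0.0, [0.0, 0.0]))   # approximately [-2/3, 1/3, 1/3]
```

The gradient shows the intended push and pull. It is negative for the positive's score and positive for every negative's score, so a gradient-descent step raises the target-domain answer and lowers the earlier adapted outputs.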

Paper Structure (35 sections, 18 equations, 10 figures, 10 tables, 1 algorithm)

Introduction
Categorization of LLM Adaptation
Method
Black-Box LLM Adaptation as EBM
Adapter Update
Adapted Inference
Online Adaptation
Experiments
Experiment Setup
Main Results
Plug-and-Play Adaptation
Cost Analysis
Ablation Study: Effect of Ranking-based NCE Loss
Scale Analysis
Extension on White-box Adaptation
...and 20 more sections

Figures (10)

Figure 1: Illustration of white-box, grey-box, and black-box LLM adaptation. White-box adaptation has complete access to both model parameters and output probabilities, grey-box adaptation has access only to output probabilities, and black-box adaptation has access to neither. Separate icons mark models with trainable parameters and those with inaccessible, fixed parameters.
Figure 2: Overview of BBox-Adapter for black-box LLM adaptation from the source to the target domain. BBox-Adapter adopts an online adaptation framework, iteratively sampling from previous inferences and updating the adapter.
Figure 3: Scale analysis on StrategyQA with (a) different beam sizes and (b) different iterations of online adaptation. Both experiments are conducted with two-shot prompting.
Figure 4: Case study of BBox-Adapter on GSM8K. For the given question, the CoT solution from the original gpt-3.5-turbo is incorrect, while the model adapted with BBox-Adapter executes a logical, step-by-step search that ultimately yields the correct answer. For visualization, only the top-3 candidate answers at each step are displayed.
Figure 5: Loss curve of Azure-SFT on (a) StrategyQA, (b) TruthfulQA, and (c) ScienceQA datasets.
...and 5 more figures