BLADE: Enhancing Black-box Large Language Models with Small Domain-Specific Models

Haitao Li; Qingyao Ai; Jia Chen; Qian Dong; Zhijing Wu; Yiqun Liu; Chong Chen; Qi Tian

TL;DR

BLADE introduces a hybrid framework that augments black-box LLMs with a small domain-specific LM to better handle vertical domains such as law and medicine without fine-tuning the large model. The method combines three stages: Domain-specific Pre-training encodes domain knowledge in the small LM, Knowledge Instruction Tuning teaches it to generate question-specific knowledge on instruction, and Bayesian Prompted Optimization aligns its outputs with what the black-box LLM can use. Empirical results on legal and medical benchmarks show BLADE outperforms continuous pre-training and retrieval-augmented baselines across multiple models, with robustness across languages and task types. The work highlights a cost-effective, modular approach to domain adaptation that preserves the reasoning strengths of general LLMs while injecting precise, domain-specific knowledge.

Abstract

Large Language Models (LLMs) like ChatGPT and GPT-4 are versatile and capable of addressing a diverse range of tasks. However, general LLMs, which are developed on open-domain data, may lack the domain-specific knowledge essential for tasks in vertical domains such as law and medicine. To address this issue, previous approaches either conduct continuous pre-training with domain-specific data or employ retrieval augmentation to support general LLMs. Unfortunately, the former is cost-intensive and the latter is unreliable in practical applications. To this end, we present a novel framework named BLADE, which enhances Black-box LArge language models with small Domain-spEcific models. BLADE consists of a black-box LLM and a small domain-specific LM. The small LM preserves domain-specific knowledge and offers specialized insights, while the general LLM contributes robust language comprehension and reasoning capabilities. Specifically, our method involves three steps: 1) pre-training the small LM with domain-specific data, 2) fine-tuning this model using knowledge instruction data, and 3) joint Bayesian optimization of the general LLM and the small LM. Extensive experiments conducted on public legal and medical benchmarks reveal that BLADE significantly outperforms existing approaches. This shows the potential of BLADE as an effective and cost-efficient solution for adapting general LLMs to vertical domains.
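The division of labor described in the abstract can be illustrated with a minimal sketch of the inference path: the small LM writes knowledge for a question, and the black-box LLM answers with that knowledge in its prompt. The function name `answer_with_domain_knowledge`, the prompt wording, and both toy models are assumptions made for illustration; they are not taken from the paper.

```python
from typing import Callable

# Any callable that maps a prompt string to generated text.
TextModel = Callable[[str], str]


def answer_with_domain_knowledge(
    question: str,
    small_lm: TextModel,
    black_box_llm: TextModel,
) -> str:
    """Two-stage inference: the small LM supplies knowledge, the LLM answers."""
    # Stage 1: ask the small domain-specific LM for question-specific knowledge.
    # The prompt template here is hypothetical.
    knowledge = small_lm(
        f"Provide background knowledge for the question.\nQuestion: {question}"
    )
    # Stage 2: place the generated knowledge in the black-box LLM's context.
    prompt = f"Knowledge: {knowledge}\nQuestion: {question}\nAnswer:"
    return black_box_llm(prompt)


def toy_small_lm(prompt: str) -> str:
    """Stand-in for a domain-specific LM; always returns one fixed fact."""
    return "A contract requires offer, acceptance, and consideration."


def toy_black_box_llm(prompt: str) -> str:
    """Stand-in for an API-only LLM; answers only if the fact is in context."""
    return "yes" if "consideration" in prompt else "unknown"


if __name__ == "__main__":
    print(
        answer_with_domain_knowledge(
            "Is a promise without consideration a contract?",
            toy_small_lm,
            toy_black_box_llm,
        )
    )
```

Because both models are passed in as plain callables, the black-box LLM is only ever touched through its text interface, which matches the setting where its weights are unavailable.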

Paper Structure (25 sections, 8 equations, 4 figures, 7 tables)

Introduction
Related Work
Large Language Models
Domain adaptation of LLMs
Method
Overview
Domain-specific Pre-training (DP)
Knowledge Instruction Tuning (KIT)
Bayesian Prompted Optimization (BPO)
Experiment
Datasets and Metrics
Baselines
General LLMs
Legal-specific LLMs
Medical-specific LLMs
...and 10 more sections

Figures (4)

Figure 1: Comparison of the workflow of BLADE with existing domain adaptation methods. There are three steps in BLADE: (1) Domain-specific Pre-training imparts domain knowledge to the small LM. (2) Knowledge Instruction Tuning enhances the small LM's ability to follow instructions, thereby sharpening its capacity to produce precise, question-specific knowledge. (3) Bayesian Prompted Optimization aligns the output of the small LM with the comprehension of the black-box LLM.
Figure 2: The process of generating data for Knowledge Instruction Tuning. Only knowledge that helps the black-box LLM answer a question correctly is retained.
Figure 3: Illustration of Bayesian Prompted Optimization, where only the soft embeddings are trainable. $F(\boldsymbol{p})$ is the objective score corresponding to soft embedding $\boldsymbol{p}$. In each iteration, the derivative-free optimizer explores a new soft embedding based on previous evaluation scores. The knowledge prompt is consistent with the instruction used in the Prompt-based Knowledge Generation stage.
Figure 4: Comparison of retrieved knowledge with that generated by BLADE.
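
The loop described in the Figure 3 caption, where a derivative-free optimizer proposes a new soft embedding based on earlier scores, can be sketched as follows. This is a rough illustration only: a simple keep-the-best random search stands in for whatever optimizer the authors use, and `toy_objective` is a made-up replacement for $F(\boldsymbol{p})$, which in BLADE would require querying the black-box LLM.

```python
import random
from typing import Callable, List, Tuple

Vector = List[float]


def random_search(
    objective: Callable[[Vector], float],
    dim: int,
    iterations: int = 200,
    sigma: float = 0.3,
    seed: int = 0,
) -> Tuple[Vector, float]:
    """Maximize `objective` without gradients by perturbing the best point so far."""
    rng = random.Random(seed)
    best = [0.0] * dim  # initial soft embedding (assumed to start at zero)
    best_score = objective(best)
    for _ in range(iterations):
        # Propose a new embedding near the current best one.
        candidate = [x + rng.gauss(0.0, sigma) for x in best]
        score = objective(candidate)
        # Keep the candidate only if its evaluation score improves.
        if score > best_score:
            best, best_score = candidate, score
    return best, best_score


def toy_objective(p: Vector) -> float:
    """Hypothetical stand-in for F(p): higher is better, peak at a fixed target."""
    target = [1.0, -0.5, 0.25]
    return -sum((a - b) ** 2 for a, b in zip(p, target))


if __name__ == "__main__":
    embedding, score = random_search(toy_objective, dim=3)
    print(embedding, score)
```

Only objective values are used, never gradients, which is what makes this style of search compatible with a model that can be queried but not differentiated.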
