Self-Specialization: Uncovering Latent Expertise within Large Language Models

Junmo Kang; Hongyin Luo; Yada Zhu; Jacob Hansen; James Glass; David Cox; Alan Ritter; Rogerio Feris; Leonid Karlinsky

TL;DR

Self-Specialization demonstrates that latent domain expertise exists within general large language models and can be carved out with minimal supervision. Starting from a small set of domain-specific seed demonstrations, the base model generates domain-tailored instructions and responses and is then fine-tuned on this synthetic data with LoRA. The resulting domain-specialized models outperform their base counterparts, and even larger, generally aligned baselines, on biomedical and financial tasks. The method maintains cross-task generalization, requires only a small synthetic dataset (around 5K examples), and can optionally use retrieval to inject external domain knowledge. While the approach shows limitations on some tasks, the overall results indicate a practical, data- and compute-efficient path to domain specialization in LLMs, with implications for rapid deployment across expert domains.
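The generation stage summarized above can be sketched as follows. This is a minimal, hypothetical illustration, not the authors' code: the prompt templates, the `Instruction:`/`Input:`/`Output:` format, the number of in-context seeds `k`, exact-match deduplication, and the `generate` and `retrieve` callables (standing in for the base model and an optional retriever) are all assumptions. The subsequent (Q)LoRA fine-tuning step is not shown.

```python
import random
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Example:
    instruction: str
    input: str
    output: str


# Hypothetical interfaces (assumptions): the base LLM maps prompt -> completion,
# and an optional retriever maps query -> domain knowledge snippet.
Generate = Callable[[str], str]
Retrieve = Callable[[str], str]


def instruction_prompt(domain: str, demos: List[Example]) -> str:
    """Few-shot prompt asking the base model for a new (instruction, input) pair."""
    parts = [f"Write a new {domain} task in the same format as these examples.\n"]
    for d in demos:
        parts.append(f"Instruction: {d.instruction}\nInput: {d.input}\n")
    parts.append("Instruction:")
    return "\n".join(parts)


def response_prompt(domain: str, demos: List[Example], instruction: str,
                    inp: str, context: Optional[str] = None) -> str:
    """Few-shot prompt asking the base model to answer a generated task."""
    parts = [f"Solve the {domain} task, following these examples.\n"]
    for d in demos:
        parts.append(f"Instruction: {d.instruction}\nInput: {d.input}\nOutput: {d.output}\n")
    if context:
        parts.append(f"Background knowledge: {context}\n")  # optional retrieved knowledge
    parts.append(f"Instruction: {instruction}\nInput: {inp}\nOutput:")
    return "\n".join(parts)


def parse_task(completion: str) -> Optional[Tuple[str, str]]:
    """Split a completion into (instruction, input) at the first 'Input:' line; None if malformed."""
    head, sep, tail = completion.partition("\nInput:")
    if not sep or not head.strip():
        return None
    return head.strip(), tail.strip()


def self_specialize_data(generate: Generate, seeds: List[Example], domain: str,
                         n: int, k: int = 3, retrieve: Optional[Retrieve] = None,
                         seed: int = 0, max_tries: Optional[int] = None) -> List[Example]:
    """Collect up to n synthetic domain examples generated by the model itself."""
    rng = random.Random(seed)
    limit = max_tries if max_tries is not None else 10 * n
    data: List[Example] = []
    seen = set()
    for _ in range(limit):
        if len(data) >= n:
            break
        demos = rng.sample(seeds, min(k, len(seeds)))  # fresh seed demos each round
        task = parse_task(generate(instruction_prompt(domain, demos)))
        if task is None or task[0] in seen:
            continue  # drop malformed or duplicate instructions
        instruction, inp = task
        context = retrieve(f"{instruction} {inp}") if retrieve else None
        output = generate(response_prompt(domain, demos, instruction, inp, context)).strip()
        seen.add(instruction)
        data.append(Example(instruction, inp, output))
    return data
```

In practice `generate` would wrap the base model's sampling call and `retrieve` a search over a domain corpus; the collected `Example` records would then form the synthetic training set for specialization. For an iterative variant, the response step could call a previously self-specialized model instead of the base model.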

Abstract

Recent works have demonstrated the effectiveness of self-alignment, in which a large language model is aligned to follow general instructions using instructional data generated by the model itself, starting from a handful of human-written seeds. Instead of general alignment, in this work we focus on self-alignment for expert domain specialization (e.g., biomedicine, finance). As a preliminary, we quantitatively show the marginal effect that generic instruction-following training has on downstream performance in expert domains. To remedy this, we propose self-specialization, which allows effective model specialization while achieving cross-task generalization by leveraging only a few labeled seeds. Self-specialization offers a data- and parameter-efficient way of "carving out" an expert model from a generalist pre-trained LLM. Exploring a variety of popular open large models as bases for specialization, our experimental results in both the biomedical and financial domains show that our self-specialized models outperform their base models by a large margin, and even outperform larger models that are generally instruction-tuned or that have been adapted to the target domain by other means.
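To make "parameter-efficient" concrete, the sketch below assumes standard LoRA, where a frozen weight matrix $W$ is augmented with a trainable low-rank update scaled by $\alpha/r$, so the layer computes $h = Wx + \frac{\alpha}{r} B A x$. The matrix sizes and rank used below are illustrative assumptions, not settings reported here; QLoRA additionally stores the frozen $W$ in 4-bit precision, which this pure-Python sketch does not model.

```python
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]  # row-major: Matrix[i][j]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Multiply a row-major matrix by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def lora_forward(W: Matrix, A: Matrix, B: Matrix, x: Vector,
                 alpha: float, r: int) -> Vector:
    """h = W x + (alpha / r) * B (A x).

    W (d_out x d_in) stays frozen; only A (r x d_in) and B (d_out x r) are trained.
    """
    base = matvec(W, x)
    delta = matvec(B, matvec(A, x))  # low-rank path: project down to r dims, then back up
    scale = alpha / r
    return [b + scale * d for b, d in zip(base, delta)]


def trainable_params(d_in: int, d_out: int, r: int) -> Tuple[int, int]:
    """(full fine-tuning count, LoRA count) for one d_out x d_in weight matrix."""
    return d_in * d_out, r * (d_in + d_out)


if __name__ == "__main__":
    W = [[1.0, 0.0], [0.0, 1.0]]
    A = [[1.0, 1.0]]        # r = 1
    B0 = [[0.0], [0.0]]     # standard LoRA init: B = 0, so the adapter starts as a no-op
    x = [1.0, 2.0]
    print(lora_forward(W, A, B0, x, alpha=2.0, r=1))
    full, lora = trainable_params(4096, 4096, 8)
    print(full, lora, f"{100 * lora / full:.2f}%")
```

For a single 4096 × 4096 projection (the hidden size of LLaMA-2 7B) with an assumed rank of 8, `trainable_params(4096, 4096, 8)` returns `(16777216, 65536)`, so the adapter trains about 0.39% of that matrix's parameters.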

Paper Structure

This paper contains 58 sections, 2 equations, 8 figures, and 14 tables.

Introduction
Preliminaries: Benchmarking Existing Aligned Models
Self-Specialization
Seed Demonstrations
Domain-Specific Instruction Generation
Domain-Specific Response Generation
Triggering Specialization
Iterative Self-Specialization
Experimental Settings
Datasets.
Models.
Metrics.
Implementation Details.
Results and Analyses
Comparison with Baselines
...and 43 more sections

Figures (8)

Figure 1: Self-specialization concept. Expertise in various domains is mixed and latent within base LLMs, and can be carved out through self-specialization.
Figure 2: Self-Specialization overview. (a) We start with a small set of human-authored domain-specific seed instructions. The base model crafts synthetic instructions and corresponding input contexts tailored to that particular domain. Subsequently, during the response generation phase, responses are curated given the generated instruction and input pairs, optionally enhanced by infusing domain-relevant knowledge obtained via a retrieval component or iterative re-generation via our previous self-specialized model. Finally, in the specialization phase, the base model is tuned for specialization (w/ QLoRA) to uncover its target domain expertise. (b) Conceptually speaking, this process can be described as uncovering latent expertise within LLMs.
Figure 3: Comparison of our self-specialized MPT-30B model with 65B models in biomedicine ($F_1$ score, 5-shot).
Figure 4: Results in biomedicine using LLaMA-2 7B as the base model, compared with other baselines, including one pre-trained on a large domain-specific corpus. Scores are averaged over 10 datasets; when in-context examples are used, we run 5 different sets of demonstrations and report macro-averaged results, with standard deviations (SD) shown as error bars.
Figure 5: Effect of the number of self-generated training examples used for specialization. Averaged 0-shot results are shown for {0, 100, 500, 1000, 5000, 10000} generated examples.
...and 3 more figures
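The aggregation described in the Figure 4 caption can be sketched as below, assuming each demonstration set yields one score per dataset, the per-set score is an unweighted (macro) mean over datasets, and the error bar is the sample standard deviation across demonstration sets. The dataset keys are hypothetical placeholders, and whether the paper uses the sample or population SD is an assumption here.

```python
from statistics import mean, stdev
from typing import Dict, List, Tuple


def macro_average(scores_per_dataset: Dict[str, float]) -> float:
    """Unweighted mean over datasets, so each dataset counts equally."""
    return mean(scores_per_dataset.values())


def summarize_demo_sets(runs: List[Dict[str, float]]) -> Tuple[float, float]:
    """Mean and sample SD of the macro-averaged score across demonstration sets.

    runs[i] maps each (hypothetical) dataset name to the score obtained
    with the i-th set of in-context demonstrations.
    """
    if len(runs) < 2:
        raise ValueError("need at least two demonstration sets for an SD")
    per_set = [macro_average(run) for run in runs]
    return mean(per_set), stdev(per_set)


if __name__ == "__main__":
    runs = [
        {"dataset_a": 0.50, "dataset_b": 0.70},
        {"dataset_a": 0.60, "dataset_b": 0.80},
        {"dataset_a": 0.40, "dataset_b": 0.60},
    ]
    avg, sd = summarize_demo_sets(runs)
    print(f"{avg:.3f} +/- {sd:.3f}")
```

Averaging within each demonstration set first and only then across sets keeps the SD a measure of sensitivity to the choice of demonstrations rather than of variation between datasets.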
