A Comparative Analysis of LLM Adaptation: SFT, LoRA, and ICL in Data-Scarce Scenarios

Bernd Bohnet; Rumen Dangovski; Kevin Swersky; Sherry Moore; Arslan Chaudhry; Kathleen Kenealy; Noah Fiedel

TL;DR

Using a single base model, Gemma-3, this study benchmarks three LLM adaptation paradigms in data-scarce scenarios: In-Context Learning (ICL), Supervised Finetuning (SFT), and Low-Rank Adaptation (LoRA). It finds that ICL preserves existing knowledge but struggles with complex skills, SFT acquires skills rapidly yet suffers severe catastrophic forgetting, and LoRA strikes a practical balance by enabling skill learning while largely preserving prior knowledge. The work analyzes hyperparameter effects, especially LoRA rank and the amount of training data, and shows that LoRA updates are highly layer-specific, concentrating in upper layers to minimize disruption of pre-trained representations. The findings offer actionable guidance on choosing an adaptation strategy based on data availability and the importance of knowledge retention, with LoRA emerging as a robust middle ground for many real-world, data-limited tasks.

Abstract

The remarkable capabilities of Large Language Models (LLMs) often need to be tailored for specific applications, requiring the integration of new knowledge or the acquisition of new skills. While full fine-tuning is a powerful adaptation method, it is computationally expensive and can degrade general reasoning abilities, a phenomenon known as catastrophic forgetting. A range of alternative techniques exists, each with its own trade-offs. In-Context Learning (ICL) is fast but limited by context length, while Parameter-Efficient Fine-Tuning (PEFT) methods like Low-Rank Adaptation (LoRA) offer a middle ground by minimizing parameter changes. However, the challenge of catastrophic forgetting persists, raising questions about the best adaptation strategy for a given task. This paper presents a comparative analysis of Supervised Finetuning (SFT), LoRA, and ICL in data-scarce scenarios. We find that LoRA provides the most effective balance, successfully instilling new skills with minimal impact on the base model's general knowledge. In contrast, while SFT excels at skill acquisition, it is highly susceptible to catastrophic forgetting. ICL is effective for incorporating factual knowledge but struggles with complex skills. Our findings offer a practical framework for selecting an LLM adaptation strategy. We highlight the critical distinction between skill acquisition and knowledge integration, and we clarify the trade-offs between task-specific performance and the preservation of general capabilities.
