Can LLMs Learn New Concepts Incrementally without Forgetting?

Junhao Zheng, Shengjie Qiu, Qianli Ma

TL;DR

This work investigates whether large language models can learn new concepts incrementally without forgetting. It introduces Concept-1K, an instance-level, 1,023-concept benchmark designed to minimize data leakage and enable fine-grained analysis of memorization and generalization across many incremental steps. Through extensive experiments across backbones and IL strategies, it finds persistent catastrophic forgetting, limited efficacy of in-context learning and LoRA, and that data replay is the most effective mitigation, with larger models, bigger buffers, and more pretraining improving IL. The study provides a robust benchmark and practical guidance for advancing IL in LLMs, highlighting that concrete concepts are easier to learn than abstract ones.

Abstract

Large Language Models (LLMs) have achieved remarkable success across various tasks, yet their ability to learn incrementally without forgetting remains underexplored. Incremental learning (IL) is crucial as it enables models to acquire new knowledge while retaining previously learned information, akin to human learning. Existing benchmarks for IL are insufficient due to data leakage issues and the overqualification of LLMs. To address these challenges, we introduce Concept-1K, a novel dataset comprising 1,023 recently emerged concepts across diverse domains. The concepts in Concept-1K are discrete, interpretable units of knowledge that allow for fine-grained analysis of learning and forgetting processes. Using Concept-1K as a testbed, we aim to answer the question: "Can LLMs learn new concepts incrementally without forgetting like humans?" Our investigation reveals that LLMs still suffer from catastrophic forgetting and that LoRA, despite fine-tuning fewer parameters, may lead to more forgetting on training data. Additionally, we explore the roles of in-context learning, model scale, buffer size, and pretraining in IL performance. These findings highlight the strengths and limitations of LLMs in IL scenarios and provide a robust benchmark for future research.
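
To make the role of buffer size concrete, here is a minimal sketch of data replay across incremental steps. It is not the paper's implementation: the `ReplayBuffer` class, the `build_step_data` helper, the (question, answer) example format, and the choice of reservoir sampling to fill the buffer are all assumptions made for illustration.

```python
import random
from typing import List, Sequence, Tuple

# Hypothetical example format: a (question, answer) pair about one concept.
Example = Tuple[str, str]


class ReplayBuffer:
    """Fixed-capacity store of old examples, filled by reservoir sampling.

    Reservoir sampling is an illustrative choice, not the paper's method.
    """

    def __init__(self, capacity: int, seed: int = 0) -> None:
        self.capacity = capacity
        self.items: List[Example] = []
        self.seen = 0  # total number of examples offered so far
        self.rng = random.Random(seed)

    def add(self, example: Example) -> None:
        self.seen += 1
        if len(self.items) < self.capacity:
            self.items.append(example)
            return
        # Once full, keep the new example with probability capacity / seen.
        slot = self.rng.randrange(self.seen)
        if slot < self.capacity:
            self.items[slot] = example

    def sample(self, k: int) -> List[Example]:
        # Draw up to k stored examples without replacement.
        return self.rng.sample(self.items, min(k, len(self.items)))


def build_step_data(
    new_examples: Sequence[Example], buffer: ReplayBuffer, replay_k: int
) -> List[Example]:
    """Return training data for one incremental step.

    The step's new examples are mixed with up to replay_k examples replayed
    from earlier steps; the new examples are then offered to the buffer.
    """
    mixed = list(new_examples) + buffer.sample(replay_k)
    for example in new_examples:
        buffer.add(example)
    return mixed
```

With a capacity of 0 the helper returns only the new examples, which corresponds to plain sequential fine-tuning; a larger capacity lets more old examples reappear in later steps.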

Paper Structure

This paper contains 48 sections, 3 equations, 11 figures, 24 tables.

Figures (11)

  • Figure 1: An illustration of the proposed Concept-1K. LLMs suffer from catastrophic forgetting when learning new concepts, while humans do not.
  • Figure 2: The step-wise performance on Concept-1K. The backbone model is LLaMa-2-7B.
  • Figure 3: Comparison of the performance between full finetuning and LoRA on (a) the training set and (b) the test set. The height represents relative performance.
  • Figure 4: Analysis of memorization accuracy (top row) and generalization accuracy (bottom row) on Concept-1K. The backbone model is one of {Pythia-70M, 160M, 410M, 1B, 1.4B, 2.8B}. The pretraining step is one of {0, 16, 128, 1000, 10000, 143000 (final version)}. Each point represents the result of IL. Detailed results with standard deviations are provided in Tables \ref{tab:scale_buffer_train}, \ref{tab:scale_buffer_test}, \ref{tab:pretrain_scale_buf0_train}, \ref{tab:pretrain_scale_buf0_test}, \ref{tab:pretrain_scale_buf2000_train}, \ref{tab:pretrain_scale_buf2000_test}, \ref{tab:pretrain_scale_buf20000_train}, and \ref{tab:pretrain_scale_buf20000_test}.
  • Figure 5: The memorization accuracy and generalization accuracy of different concepts in Concept-1K. The concepts are sorted according to (a) memorization accuracy and (b) generalization accuracy respectively.
  • ...and 6 more figures