Can LLMs Learn New Concepts Incrementally without Forgetting?
Junhao Zheng, Shengjie Qiu, Qianli Ma
TL;DR
This work investigates whether large language models (LLMs) can learn new concepts incrementally without forgetting. It introduces Concept-1K, an instance-level benchmark of 1,023 concepts designed to minimize data leakage and to enable fine-grained analysis of memorization and generalization across many incremental steps. Extensive experiments across backbones and incremental learning (IL) strategies show that catastrophic forgetting persists, that in-context learning and LoRA have limited efficacy, and that data replay is the most effective mitigation. Larger models, bigger buffers, and more pretraining all improve IL performance. The study provides a robust benchmark and practical guidance for advancing IL in LLMs, and it highlights that concrete concepts are easier to learn than abstract ones.
Abstract
Large Language Models (LLMs) have achieved remarkable success across various tasks, yet their ability to learn incrementally without forgetting remains underexplored. Incremental learning (IL) is crucial as it enables models to acquire new knowledge while retaining previously learned information, akin to human learning. Existing benchmarks for IL are insufficient due to data leakage issues and the overqualification of LLMs. To address these challenges, we introduce Concept-1K, a novel dataset comprising 1,023 recently emerged concepts across diverse domains. The concepts in Concept-1K are discrete, interpretable units of knowledge that allow for fine-grained analysis of learning and forgetting processes. Using Concept-1K as a testbed, we aim to answer the question: "Can LLMs learn new concepts incrementally without forgetting like humans?" Our investigation reveals that LLMs still suffer from catastrophic forgetting and that LoRA, despite fine-tuning fewer parameters, may lead to more forgetting on training data. Additionally, we explore the roles of in-context learning, model scale, buffer size, and pretraining in IL performance. These findings highlight the strengths and limitations of LLMs in IL scenarios and provide a robust benchmark for future research.
