In-context Continual Learning Assisted by an External Continual Learner

Saleh Momeni, Sahisnu Mazumder, Zixuan Ke, Bing Liu

TL;DR

This work tackles the scalability and performance limitations of in-context learning (ICL) for class-incremental learning (CIL) in NLP by introducing InCA, which couples an External Continual Learner (ECL) with in-context prompts. The ECL builds a Gaussian representation for each class from SBERT embeddings of semantic tags and uses the Mahalanobis distance to select a compact top-$k$ set of candidate classes. The LLM then makes the final prediction from a prompt containing only the summaries of those candidate classes. InCA is replay-free and does not update any LLM parameters, which mitigates catastrophic forgetting while keeping the prompt short. Across four datasets, InCA outperforms traditional fine-tuning baselines and remains competitive in data-constrained scenarios, showing the practical value of pairing an external classifier that needs no gradient-based training with in-context learning for scalable CIL.

Abstract

Existing continual learning (CL) methods mainly rely on fine-tuning or adapting large language models (LLMs). They still suffer from catastrophic forgetting (CF). Little work has been done to exploit in-context learning (ICL) to leverage the extensive knowledge within LLMs for CL without updating any parameters. However, incrementally learning each new task in ICL necessitates adding training examples from each class of the task to the prompt, which hampers scalability as the prompt length increases. This issue not only leads to excessively long prompts that exceed the input token limit of the underlying LLM but also degrades the model's performance due to the overextended context. To address this, we introduce InCA, a novel approach that integrates an external continual learner (ECL) with ICL to enable scalable CL without CF. The ECL is built incrementally to pre-select a small subset of likely classes for each test instance. By restricting the ICL prompt to only these selected classes, InCA prevents prompt lengths from becoming excessively long, while maintaining high performance. Experimental results demonstrate that InCA significantly outperforms existing CL baselines, achieving substantial performance gains.
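The pre-selection step described above (one Gaussian per class, Mahalanobis distance, top-$k$ candidates) can be illustrated with a minimal sketch. This is not the authors' implementation. The class name `GaussianECL`, the `ridge` regularizer, the use of a single covariance matrix shared by all classes, and the random vectors standing in for SBERT tag embeddings are all assumptions made for illustration.

```python
import numpy as np


class GaussianECL:
    """Toy external continual learner (hypothetical, for illustration only).

    Assumption: every class keeps its own mean, and all classes share one
    covariance matrix built from accumulated within-class scatter.
    """

    def __init__(self, dim, ridge=1e-3):
        self.dim = dim
        self.ridge = ridge                      # keeps the covariance invertible
        self.means = {}                         # class label -> mean embedding
        self.scatter = np.zeros((dim, dim))     # summed within-class scatter
        self.total = 0                          # embeddings seen so far

    def add_class(self, label, embeddings):
        """Register a new class; statistics of earlier classes are not revisited."""
        x = np.asarray(embeddings, dtype=float)
        mean = x.mean(axis=0)
        centered = x - mean
        self.means[label] = mean
        self.scatter += centered.T @ centered
        self.total += len(x)

    def top_k(self, query, k):
        """Return the k class labels with the smallest squared Mahalanobis distance."""
        cov = self.scatter / max(self.total, 1) + self.ridge * np.eye(self.dim)
        precision = np.linalg.inv(cov)
        q = np.asarray(query, dtype=float)
        dists = {}
        for label, mean in self.means.items():
            diff = q - mean
            dists[label] = float(diff @ precision @ diff)
        return sorted(dists, key=dists.get)[:k]


# Stand-in data: random vectors around made-up class centers replace real
# SBERT embeddings of semantic tags.
rng = np.random.default_rng(0)
centers = {
    "billing": [5.0, 0.0, 0.0, 0.0],
    "shipping": [0.0, 5.0, 0.0, 0.0],
    "returns": [0.0, 0.0, 5.0, 0.0],
}
ecl = GaussianECL(dim=4)
for label, center in centers.items():           # classes arrive one at a time
    ecl.add_class(label, rng.normal(loc=center, scale=0.5, size=(20, 4)))

candidates = ecl.top_k([4.8, 0.2, 0.1, 0.0], k=2)
```

Only the classes in `candidates` would then contribute summaries to the ICL prompt, so the prompt length depends on $k$ rather than on the total number of classes learned so far. No training instances are stored and no gradient step is taken, which is consistent with the replay-free, parameter-free setting the abstract describes.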

Paper Structure

This paper contains 16 sections, 5 equations, 4 figures, 2 tables.

Figures (4)

  • Figure 1: Overview of the InCA framework. The diagram depicts the stages of generating semantic tags for the input, identifying the most similar classes via the ECL, and constructing the prediction prompt with class summaries, which together enable efficient in-context continual learning without retaining any training data.
  • Figure 2: Comparison of recall between our ECL and the text retriever (TR) at various values of $k$. The ECL operates without storing any replay data (buffer size = 0), while the TR maintains a buffer of instances to retrieve the most similar ones during inference. For the TR, we evaluate performance across different buffer sizes. When the TR's buffer size is zero, we store the embeddings of class summaries for retrieval, rather than training instances.
  • Figure 3: Accuracy of InCA and ECL recall at different $k$ values. The solid portion of each bar represents the accuracy of the model using in-context learning (ICL) with the top $k$ classes retrieved by the ECL. The leftmost column ($k$=1) represents the accuracy of ECL alone, where the most similar class is predicted without ICL. The dashed region indicates cases where the correct label is within the top $k$ classes retrieved by the ECL but the model's prediction is incorrect. Therefore, the total height of each bar (solid plus dashed) represents the ECL's recall of the correct classes at that $k$ value.
  • Figure 4: Performance comparison of InCA and JOINT fine-tuning across different data sizes. InCA demonstrates robust performance with limited data, particularly excelling over the fine-tuned model in data-constrained situations.
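
The bar decomposition described in the Figure 3 caption (solid = accuracy, dashed = correct class retrieved but mispredicted, total = ECL recall) can be made concrete with a small worked example. The function name `bar_segments` and the toy labels below are hypothetical. The sketch assumes the LLM only ever predicts one of the $k$ candidate classes it was shown.

```python
def bar_segments(gold, candidates, predictions):
    """Split ECL recall at k into the solid and dashed parts of a Figure 3 bar.

    gold:        true label per test instance
    candidates:  top-k classes returned by the ECL per instance
    predictions: final LLM prediction per instance
    """
    n = len(gold)
    solid = 0   # correct label retrieved and predicted
    dashed = 0  # correct label retrieved but prediction wrong
    for g, cands, pred in zip(gold, candidates, predictions):
        if g in cands:
            if pred == g:
                solid += 1
            else:
                dashed += 1
    accuracy = solid / n
    missed = dashed / n
    recall = (solid + dashed) / n
    return accuracy, missed, recall


# Toy data with k = 2 (made-up labels, not results from the paper).
gold = ["a", "b", "c", "d"]
candidates = [["a", "b"], ["a", "b"], ["a", "b"], ["d", "c"]]
predictions = ["a", "a", "a", "d"]

accuracy, missed, recall = bar_segments(gold, candidates, predictions)
```

In this toy case the ECL retrieves the correct class for three of four instances (recall 0.75), the LLM gets two of them right (accuracy 0.5), and the remaining 0.25 is the dashed region. The instance with gold label `"c"` can never be predicted correctly because the ECL did not retrieve it, which is why recall upper-bounds accuracy under the stated assumption.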