The Future of Continual Learning in the Era of Foundation Models: Three Key Directions

Jack Bell, Luigi Quarantiello, Eric Nuertey Coleman, Lanpei Li, Malio Li, Mauro Madeddu, Elia Piccoli, Vincenzo Lomonaco

TL;DR

The paper argues that continual learning remains essential in the era of foundation models (FMs) because it addresses knowledge staleness, domain adaptation, and long-term task composition. It outlines three directions as the path forward: Continual Pre-Training (CPT), Continual Fine-Tuning (CFT), and, most notably, Continual Compositionality & Orchestration (CCO). CCO is envisioned as a scalable, decentralised, modular paradigm that can dynamically orchestrate multiple agents and modules. The authors review current CPT and CFT approaches, their challenges, and prospective solutions, and advocate a shift toward a decentralised ecosystem of continually evolving models rather than a single monolithic FM. This paradigm has implications for sustainability, data privacy, and democratic access, and could enable more rapid and responsible AI development. Overall, continual learning is presented as foundational for the next AI paradigm, enabling resilient, adaptive, and human-aligned systems through modular collaboration and ongoing knowledge integration.

Abstract

Continual learning (the ability to acquire, retain, and refine knowledge over time) has always been fundamental to intelligence, both human and artificial. Historically, different AI paradigms have acknowledged this need, albeit with varying priorities: early expert and production systems focused on incremental knowledge consolidation, while reinforcement learning emphasised dynamic adaptation. With the rise of deep learning, deep continual learning has primarily focused on learning robust and reusable representations over time to solve sequences of increasingly complex tasks. However, the emergence of Large Language Models (LLMs) and foundation models has raised the question: Do we still need continual learning when centralised, monolithic models can tackle diverse tasks with access to internet-scale knowledge? We argue that continual learning remains essential for three key reasons: (i) continual pre-training is still necessary to ensure foundation models remain up to date, mitigating knowledge staleness and distribution shifts while integrating new information; (ii) continual fine-tuning enables models to specialise and personalise, adapting to domain-specific tasks, user preferences, and real-world constraints without full retraining, avoiding the need for computationally expensive long context windows; (iii) continual compositionality offers a scalable and modular approach to intelligence, enabling the orchestration of foundation models and agents to be dynamically composed, recombined, and adapted. While continual pre-training and fine-tuning are explored as niche research directions, we argue it is continual compositionality that will mark the rebirth of continual learning. The future of AI will not be defined by a single static model but by an ecosystem of continually evolving and interacting models, making continual learning more relevant than ever.

Paper Structure

This paper contains 18 sections, 1 equation, 1 figure.

Figures (1)

  • Figure 1: In (a), a base model pre-trained on video data is Continually Pre-trained on general corpora spanning different modalities (e.g. audio, images and text). In (b), this base model is then Continually Fine-tuned over time, resulting in specialised fine-tuning (FT) modules trained on domain-specific datasets, such as medical texts. Finally, in (c), at inference time, an orchestrator routes a user’s query through the appropriate FT modules and combines their outputs into a single response. User inputs may change over time, as may the configurations by which models are composed and the processes through which models evolve via Continual Fine-tuning, and new models may be introduced as they become available.
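
The routing step described in panel (c) of Figure 1 can be sketched in a few lines of Python. This is only an illustrative toy, not the authors' method: the names `FTModule`, `Orchestrator`, `register`, `route` and `answer` are hypothetical, keyword overlap stands in for whatever routing signal a real orchestrator would learn, and joining strings stands in for output combination.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class FTModule:
    """Hypothetical stand-in for one fine-tuned (FT) module."""
    name: str
    keywords: frozenset  # toy routing signal (assumption, not from the paper)
    respond: Callable[[str], str]  # placeholder for running the module


@dataclass
class Orchestrator:
    """Toy orchestrator: picks matching FT modules and merges their outputs."""
    modules: Dict[str, FTModule] = field(default_factory=dict)

    def register(self, module: FTModule) -> None:
        # New or updated modules can be added at any time, mirroring the
        # caption's point that the set of models changes over time.
        self.modules[module.name] = module

    def route(self, query: str) -> List[FTModule]:
        # Select every module whose keywords overlap the query's tokens.
        tokens = set(query.lower().split())
        return [m for m in self.modules.values() if tokens & m.keywords]

    def answer(self, query: str) -> str:
        selected = self.route(query)
        if not selected:
            # No specialised module applies: fall back to the base model.
            return "[base] " + query
        # Combine the selected modules' outputs into a single response.
        return " | ".join(m.respond(query) for m in selected)


if __name__ == "__main__":
    orch = Orchestrator()
    orch.register(FTModule("medical", frozenset({"dose", "symptom"}),
                           lambda q: "[medical] " + q))
    orch.register(FTModule("legal", frozenset({"contract", "liability"}),
                           lambda q: "[legal] " + q))
    print(orch.answer("what dose is safe"))
```

Registering a module under an existing name replaces the earlier version, which loosely mirrors a module being refreshed by Continual Fine-tuning while the orchestrator itself stays unchanged.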