Distributed Continual Learning
Long Le, Marcel Hussing, Eric Eaton
TL;DR
Distributed Continual Learning (DCL) studies a network of heterogeneous agents that each encounter tasks sequentially and must exchange knowledge subject to communication budgets and a network topology. The authors formalize DCL on a directed graph $\mathcal{G}$, with a cumulative objective of minimizing the total expected loss across tasks, while knowledge transfer is constrained by $b_{ij}$, $f_{ij}$, and a global clock $C$. They compare three sharing modes: data instances, full model parameters, and modular parameters, the last including a reusable-module approach called modmod. Empirical results on MNIST variants and CIFAR-100 show that modular parameter sharing accelerates early learning and reduces communication, that data sharing yields strong final accuracy on easier tasks, and that combining sharing modes delivers the best overall performance under realistic budgets. The work provides robust baselines, highlights the hidden costs of communication, and broadens the evaluation framework for DCL with heterogeneous agents, pointing toward extensions to reinforcement learning and more complex transfer strategies.
Abstract
This work studies the intersection of continual and federated learning, in which independent agents face unique tasks in their environments and incrementally develop and share knowledge. We introduce a mathematical framework capturing the essential aspects of distributed continual learning, including agent model and statistical heterogeneity, continual distribution shift, network topology, and communication constraints. Operating on the thesis that distributed continual learning enhances individual agent performance over single-agent learning, we identify three modes of information exchange: data instances, full model parameters, and modular (partial) model parameters. We develop algorithms for each sharing mode and conduct extensive empirical investigations across various datasets, topology structures, and communication limits. Our findings reveal three key insights: sharing parameters is more efficient than sharing data as tasks become more complex; modular parameter sharing yields the best performance while minimizing communication costs; and combining sharing modes can cumulatively improve performance.
